This article is intended for administrators wishing to create a 2x node + witness High Availability cluster utilizing Proxmox Virtual Environment (https://www.proxmox.com/en/proxmox-virtual-environment/overview) and StorMagic SvSAN (https://stormagic.com/svsan/).
Information
Guide Summary
This multipart guide will walk through the process to deploy 2x hyperconverged Proxmox VE nodes utilizing SvSAN virtualized block storage and a lightweight witness node, such as a Raspberry Pi.
Create an iSCSI login script
The following was gratefully shared by a StorMagic partner.
This setup ensures that iSCSI storage provided by the SvSAN virtual appliances is reconnected at boot, and that all required Proxmox HA resources are automatically returned to their started state in the event of a full cluster-down scenario.
Note: The following configuration needs to be done on every Proxmox VE cluster member.
The script below, /usr/local/bin/iscsi-reconnect.sh, waits for a list of SvSAN IPs to become reachable on the iSCSI port (3260).
Once all defined targets are online, it reconnects all known iSCSI sessions using iscsiadm and then calls a secondary script, ha-enable.sh, to tidy up the cluster HA resources (guest VMs).
The SvSAN IPs are defined in the script and MUST be adjusted to match the iSCSI IPs of the SvSAN VSAs in your environment. The script includes a global timeout to prevent indefinite waiting if the targets never become available.
nano /usr/local/bin/iscsi-reconnect.sh
#!/bin/bash
# ---------------------------------------------
# Configuration
# ---------------------------------------------
REQUIRED_PVE_VERSION="8.4.1" # Expected PVE version (for info only)
SVSAN_IPS=("192.168.1.3" "192.168.1.4") # List of SvSAN IPs to check
TIMEOUT=400 # Global timeout (seconds)
INTERVAL=5 # Retry interval (seconds)
HA_ENABLE_SCRIPT="/usr/local/bin/ha-enable.sh" # Path to HA script
START_TIME=$(date +%s) # Start timestamp
declare -A ready_ips # Readiness tracker
# ---------------------------------------------
# Version check (warning only)
# ---------------------------------------------
CURRENT_PVE_VERSION=$(pveversion | awk -F'/' '{print $2}')
if [ "$CURRENT_PVE_VERSION" != "$REQUIRED_PVE_VERSION" ]; then
    echo "[iSCSI-Reconnect] WARNING: This script was built for Proxmox $REQUIRED_PVE_VERSION."
    echo "[iSCSI-Reconnect] You are running $CURRENT_PVE_VERSION. Proceeding anyway..."
fi
# ---------------------------------------------
# iSCSI readiness function
# ---------------------------------------------
function is_iscsi_ready() {
    local ip="$1"
    # Attempt a TCP connection to the iSCSI port; succeeds only if the port is open
    timeout 1 bash -c "</dev/tcp/$ip/3260" &>/dev/null
    return $?
}
# Initialize readiness tracking
for ip in "${SVSAN_IPS[@]}"; do
    ready_ips["$ip"]=0
done
echo "[iSCSI-Reconnect] Waiting for all SvSAN IPs to be ready for iSCSI (port 3260)..."
# ---------------------------------------------
# Main check loop
# ---------------------------------------------
while true; do
    all_ready=true
    for ip in "${SVSAN_IPS[@]}"; do
        if [ "${ready_ips[$ip]}" -eq 0 ]; then
            if is_iscsi_ready "$ip"; then
                echo "[iSCSI-Reconnect] $ip is accepting iSCSI connections."
                ready_ips["$ip"]=1
            else
                all_ready=false
                echo "[iSCSI-Reconnect] $ip not ready yet..."
            fi
        fi
    done
    if $all_ready; then
        echo "[iSCSI-Reconnect] All SvSAN targets are ready. Logging in to all known iSCSI targets..."
        iscsiadm -m node --loginall=automatic
        echo "[iSCSI-Reconnect] Done."
        echo "[iSCSI-Reconnect] Calling HA enable script..."
        $HA_ENABLE_SCRIPT
        exit 0
    fi
    # Global timeout handling
    current_time=$(date +%s)
    elapsed=$((current_time - START_TIME))
    if [ "$elapsed" -ge "$TIMEOUT" ]; then
        echo "[iSCSI-Reconnect] Timeout reached. Not all SvSAN targets became ready."
        exit 1
    fi
    sleep $INTERVAL
done
Next, chmod the script file to enable the execute bit:
chmod +x /usr/local/bin/iscsi-reconnect.sh
The iSCSI reconnect script is managed by a systemd unit (iscsi-reconnect.service), which runs automatically at boot once the network is online.
Next, create the service that runs the script:
nano /etc/systemd/system/iscsi-reconnect.service
[Unit]
Description=Reconnect iSCSI targets after SvSAN VMs are online
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/iscsi-reconnect.sh
RemainAfterExit=true
[Install]
WantedBy=multi-user.target
Next, enable the service:
systemctl daemon-reexec
systemctl daemon-reload
systemctl enable iscsi-reconnect.service
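Optionally, confirm that the unit is registered and will run at the next boot:
systemctl is-enabled iscsi-reconnect.service
systemctl status iscsi-reconnect.service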
HA Cluster tidy up
Once the storage is reconnected, the script /usr/local/bin/ha-enable.sh is triggered to ensure that HA resources are returned to the started state and are ready to resume operation.
The script first checks whether the current node is the elected HA master.
Only the HA master is allowed to proceed with enforcing the HA state, avoiding conflicting changes during partial cluster reboots.
The script dynamically parses the Proxmox cluster-wide HA configuration file (/etc/pve/ha/resources.cfg) to identify all existing HA-managed VMs and containers.
This ensures that the HA recovery process reflects any live changes made via the GUI or CLI, without requiring updates to the script.
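For reference, entries in /etc/pve/ha/resources.cfg take roughly this form (the VMID, group name, and state values shown here are illustrative):
vm: 114
    state started
    group HA-Group1
ct: 120
    state started
Only the vm:/ct: header lines are needed to build the resource list; the indented option lines do not match the parsing pattern used by ha-enable.sh below and are ignored.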
Before proceeding with HA activation, the script can optionally wait for a specific system—typically a Domain Controller—to respond on the network.
This is done using a basic ping check to a defined IP address.
If enabled, this check ensures that core services such as DNS or LDAP are online before restarting dependent VMs or containers.
If the Domain Controller does not respond within the configured timeout, the script logs a warning and continues anyway. This prevents unnecessary delays while still prioritizing the availability of critical infrastructure.
Once ready, the script uses ha-manager to set each HA-managed resource to the started state.
This allows the HA manager to automatically restart these VMs and containers in accordance with defined HA rules (failover, restart, migration, etc.).
Although the script is safe to run on all nodes, only the HA master performs changes, guaranteeing consistency and avoiding duplicate operations.
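For example, a single resource can be set manually with the same command the script uses (VMID illustrative):
ha-manager set vm:114 --state started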
Why must HA be reconfigured after a full cluster shutdown?
When a Proxmox HA cluster is shut down cleanly (e.g., all nodes powered off for maintenance), the HA stack enters a safe "stopped" state.
This prevents unintended recovery actions before the cluster regains quorum and shared storage access. After reboot, the cluster configuration persists, but the HA manager does not automatically resume protected workloads.
Instead, each HA resource remains in the stopped state until explicitly marked as started. For this reason, the ha-enable.sh script is essential: once the cluster is operational and storage is online, it automatically restores the intended HA state.
This guarantees that protected workloads resume their expected behavior without manual intervention, even after a complete cluster shutdown and recovery.
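The current state can be inspected at any time with ha-manager status; after a full shutdown, protected resources appear as stopped until re-enabled, with output along these lines (node name and VMID illustrative):
master pve1 (active, ...)
service vm:114 (pve1, stopped)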
nano /usr/local/bin/ha-enable.sh
#!/bin/bash
# ---------------------------------------------
# Config: Optional domain controller check
# ---------------------------------------------
WAIT_FOR_DC=true
DC_HOST="10.231.15.240" # IP or hostname of the domain controller
DC_TIMEOUT=180 # Max time to wait (in seconds)
INTERVAL=5 # Ping interval (in seconds)
# ---------------------------------------------
# Step 0: Ensure this node is the HA master
# ---------------------------------------------
HA_MASTER=$(ha-manager status | grep '^master' | awk '{print $2}')
THIS_NODE=$(hostname)
if [ "$HA_MASTER" != "$THIS_NODE" ]; then
echo "[HA-Enable] This node is NOT the HA master ($HA_MASTER). Skipping HA config."
exit 0
fi
echo "[HA-Enable] This node is the HA master. Proceeding..."
# ---------------------------------------------
# Step 1: Wait for Domain Controller (optional)
# ---------------------------------------------
if $WAIT_FOR_DC; then
    echo "[HA-Enable] Waiting for Domain Controller at $DC_HOST to respond to ping..."
    elapsed=0
    while [ $elapsed -lt $DC_TIMEOUT ]; do
        if ping -c 1 -W 1 "$DC_HOST" &>/dev/null; then
            echo "[HA-Enable] Domain Controller is reachable."
            break
        fi
        sleep $INTERVAL
        elapsed=$((elapsed + INTERVAL))
    done
    if [ $elapsed -ge $DC_TIMEOUT ]; then
        echo "[HA-Enable] WARNING: Domain Controller not reachable after $DC_TIMEOUT seconds. Continuing anyway..."
    fi
fi
# ---------------------------------------------
# Step 2: Enforce 'started' state on HA resources
# ---------------------------------------------
HA_RESOURCE_FILE="/etc/pve/ha/resources.cfg"
if [ ! -f "$HA_RESOURCE_FILE" ]; then
    echo "[HA-Enable] ERROR: HA config file not found at $HA_RESOURCE_FILE."
    exit 1
fi
echo "[HA-Enable] Reading HA-configured resources from cluster..."
HA_RESOURCES=()
while IFS= read -r line; do
    ID=$(echo "$line" | awk '{print $1 $2}') # Join "vm:" and "114" → "vm:114"
    if [[ "$ID" =~ ^(vm|ct):[0-9]+$ ]]; then
        HA_RESOURCES+=("$ID")
    fi
done < "$HA_RESOURCE_FILE"
if [ ${#HA_RESOURCES[@]} -eq 0 ]; then
    echo "[HA-Enable] No valid HA-managed VMs or CTs found. Exiting."
    exit 0
fi
echo "[HA-Enable] Enforcing HA state 'started' for:"
for res in "${HA_RESOURCES[@]}"; do
    echo " - $res"
    ha-manager set "$res" --state started
done
echo "[HA-Enable] HA configuration complete."
exit 0
Again, chmod the file to add the execute bit:
chmod +x /usr/local/bin/ha-enable.sh
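Because only the HA master applies changes, the script can safely be test-run by hand on any node before relying on the boot-time automation:
/usr/local/bin/ha-enable.sh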
Debug
It is possible to debug the script with journalctl using the command below:
journalctl --since "20 minutes ago" | grep "iscsi-reconnect"
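Because ha-enable.sh is called from the same systemd unit, its output can be filtered in the same way, or the unit's full log for the current boot can be viewed directly:
journalctl --since "20 minutes ago" | grep "HA-Enable"
journalctl -u iscsi-reconnect.service -b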