An overview of troubleshooting techniques for diagnosing issues with a Smoothwall Failover or High Availability (HA) setup.
The Smoothwall Failover feature makes use of two Smoothwalls connected via a dedicated heartbeat interface where system health and availability are monitored.
One is always referred to as the “master”, the other the “failover”, with both using an active / passive mode:
- Active mode: All interfaces are active. The Smoothwall passes traffic and manages internet connections.
- Passive mode: Only the heartbeat interface is active. Used to monitor the status of the active system.
When a failover event occurs, the master connection to the failover is severed. The failover unit takes over by activating all configured network interfaces, and issuing a broadcast on every interface to let other systems on the same physical Ethernet segment know to remove the MAC address associated with the master’s IP address. Then all services are made active on the failover unit, turning the failover unit from passive mode to active mode. The entire switchover should take less than one minute.
See our help topic, Adding a connection to the heartbeat interface.
Things to Consider when Installing
- Configure the master first.
- Use the same software serial number for the failover unit as was used for the master.
- On the failover unit, use a temporary IP address to add the gateway and DNS settings before installing the failover archive.
- If configuring a failover setup with an existing live master unit, ensure the failover unit has the same update and release level as the master unit before installing the failover archive.
- If installing a failover unit at the same time as the master unit, you can install the archive at installation time.
- If using a USB stick to transfer the failover archive from master to failover, the USB stick must be formatted as FAT32.
- Before importing the settings from the failover archive, disconnect all network cards except for the heartbeat interface. This is the only network card that should be connected at import time. When the failover archive is imported, the network settings take effect and a reboot is required, so disconnecting all other cards is recommended to prevent duplicate IP addresses on the network.
Finalizing the install
While the failover system is rebooting, log into the administration UI on the master, go to Reports > Realtime > System and select Heartbeat from the Section drop-down list. As the failover system comes online, messages should appear in the heartbeat log showing that the master can communicate with the failover unit. Various messages will appear showing that settings have been transferred and that both master and failover have taken control of the resources used by the HA.
Once the failover unit is up, try to SSH to the failover unit from the master:
# ssh -p 222 10.99.0.2
This should complete normally and allow root login on the failover unit. Once logged in, issue the command:
# ip addr
and check that only the heartbeat interface is listed as being active. That shows that the failover system is communicating with the master and has gone into passive mode as the master is alive and well.
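The check above can also be scripted rather than read by eye. The following is a minimal sketch, assuming the failover unit's `ip` command supports the one-line `-o` output format of iproute2. The function name `up_ifaces` is hypothetical and not part of Smoothwall. It prints every non-loopback interface whose flags include UP, so on a passive unit the only name printed should be the heartbeat interface.

```shell
# Print every interface whose flag list contains UP, skipping loopback.
# Expects `ip -o link show` style output on stdin.
# up_ifaces is an illustrative helper, not a Smoothwall command.
up_ifaces() {
    awk -F': ' '{
        name = $2
        sub(/@.*/, "", name)       # drop any "@parent" suffix on the name
        flags = $3
        sub(/>.*/, "", flags)      # keep only the text inside <...>
        sub(/^</, "", flags)
        n = split(flags, f, ",")
        for (i = 1; i <= n; i++)
            if (f[i] == "UP" && name != "lo") print name
    }'
}

# Example, run from the master (same address and port as the SSH step):
#   ssh -p 222 10.99.0.2 ip -o link show | up_ifaces
```

If more than one interface name is printed, the failover unit may not be in passive mode.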
Once this is confirmed, cable the failover as the master has been cabled. A failover test can now be done.
Testing the Failover Unit
With both master and failover unit cabled and the failover unit confirmed to be in passive mode, a failover test can and should be done.
On the master unit, go to System > Hardware > Failover and click Enter standby. You will almost immediately lose connection to the administration UI as the master goes into passive mode.
Wait 10 seconds and try to access the administration UI again. You should now see the failover system administration UI. The failover system will always have a warning message showing that the system is the failover system and also showing a timestamp of the latest transfer of settings from the master.
Check that the failover unit is now passing traffic and all services are working as expected.
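The wait-and-retry step in this test can be sketched as a small shell helper, run from a client on the network. This is only an illustration: the function name `retry` and the `ADMIN_URL` placeholder are assumptions, and the attempt count and delay should be adjusted to suit your environment.

```shell
# retry ATTEMPTS DELAY COMMAND...
# Run COMMAND until it succeeds, sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 if every attempt fails.
# retry is an illustrative helper, not a Smoothwall command.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (ADMIN_URL is a placeholder for your administration UI address;
# -k skips certificate verification, which may or may not be appropriate):
#   retry 6 10 curl -k -s -o /dev/null --max-time 5 "$ADMIN_URL"
```

A successful return only shows that something answered at that address. Confirm in the browser that the UI displayed carries the failover warning message.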
Testing Failback
Once the failover has been tested and all services are running on the failover unit, it is time to fail back. First try to navigate to the master unit’s administration UI using port 440 (https://<smoothwall.domain.name>:440). Port 440 redirects over the heartbeat interface to the passive Smoothwall, which should currently be the master.
Once access to the master’s administration UI over port 440 has been confirmed, navigate back to the failover system, go to System > Hardware > Failover and click Enter standby.
Wait 10 seconds and refresh the page – the master administration UI should be the UI that is now being shown. Test that all services are running.
Note: The master is always the master, the failover is always the failover. Their status of active and passive can change – when the master is active, the failover is passive and vice versa. The passive unit will only have one enabled interface, the heartbeat interface. The active unit will have all configured interfaces enabled.
Issues: Split brain syndrome
Split brain syndrome occurs when both systems are in active mode. If both systems are in active mode the symptoms will be:
- Intermittent internet
- Accessing the administration UI alternates between master and failover unit on the same port number
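When both units are active, they may both answer for the same IP address, so a client on the same segment can see that address move between two MAC addresses. The sketch below is one possible way to look for this from a Linux client, assuming the iproute2 `ip neigh show` output format. The function name `ips_with_multiple_macs` is hypothetical. It reads one or more neighbour-table samples and prints any IP address seen with more than one distinct MAC address.

```shell
# Print each IP address that appears with more than one distinct MAC.
# Expects one or more `ip neigh show` samples concatenated on stdin.
# ips_with_multiple_macs is an illustrative helper, not a Smoothwall command.
ips_with_multiple_macs() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "lladdr") {
                key = $1 " " tolower($(i + 1))
                if (!(key in seen)) {
                    seen[key] = 1
                    count[$1]++
                }
            }
    }
    END {
        for (ip in count)
            if (count[ip] > 1) print ip
    }'
}

# Example: take several samples while the symptoms are occurring, then:
#   cat neigh-sample-*.txt | ips_with_multiple_macs
```

An address reported here is a hint only, not proof of split brain, since other causes (such as a deliberate failover between samples) produce the same pattern.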
If split brain syndrome occurs, disconnect all network cards on the failover unit apart from the heartbeat interface. The heartbeat connection is needed to access the failover unit, and all other network cards must be disconnected to avoid network confusion caused by duplicate IP addresses.
Once the failover has been disconnected, access the failover unit administration UI on port 440 and reboot the failover unit. Monitor the heartbeat logs on the master when the failover system comes back up and check the log messages look OK.
Once the failover unit is back up, log in to the command line via SSH from the master and issue the command:
# ip addr
Again, check if the failover system is in active or passive mode. Only the heartbeat interface should be up if the failover is in passive mode. If the failover is still in active mode after a reboot, try issuing the following command on the command line:
# smoothcom enterstandby
and check the log messages on the master to see if the failover and master unit are communicating correctly.
If the failover unit is still in active mode after these two attempts, contact our Support department.