One of the units used in a hardware failover pair has failed and needs to be replaced.
When a hardware failure occurs on a hardware failover pair, the failed unit often needs to be replaced. Depending on the age of the hardware and the type of configuration on the failed unit, multiple methods can be used to re-establish the hardware failover setup.
If the failure occurred on an appliance that is still within warranty, please contact support to order a replacement. Once the replacement has arrived, contact support again and they will assist with establishing and restoring the setup.
Depending on what has failed, there are multiple ways forward. The methods given here are the same for appliances and for custom hardware/virtual setups.
Master failed, failover unit active
If the master system has failed, one option is to convert the failover system to the master and, when the replacement system arrives, configure the replacement as the new failover node. Depending on the age of the hardware involved, this may be the easiest way to get the hardware failover back up and running. If the remaining appliance is more than three years old, this may not be the best option.
Converting the failover system to a master
Note: This involves command line access and editing files using the WinSCP tool or directly on the command line. If you are in any way unsure of this, please contact support for assistance.
- Access the GUI and go to the "System - Hardware - Failover" menu. Disable the hardware failover setting there and save.
- Use WinSCP to connect to the Smoothwall SSH interface. Navigate to the "/var/heartbeat" folder and delete the file "settings.tar.gz" located there.
- Then find the file "/etc/ha.d/nodeinfo" and edit it. Change "slave" to "master" and save.
- Re-enable the hardware failover setting in the "System - Hardware - Failover" menu and restart the Smoothwall.
This changes the unit from the failover node to the master node. Once the system has rebooted, go to the "System - Hardware - Failover" menu again and generate a failover archive. Use this archive to set up the new failover unit.
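For those who prefer to script the nodeinfo edit on a local copy (for example a copy downloaded with WinSCP and uploaded again afterwards), the following is a minimal sketch. It assumes the file stores the role as the literal word "slave", which is all this article states about its format; the function names `nodeinfo_to_master` and `convert_file` are hypothetical and not part of any Smoothwall tooling. Deleting "settings.tar.gz" and toggling the failover setting in the GUI still have to be done as described above.

```python
import re


def nodeinfo_to_master(text: str) -> str:
    """Return the nodeinfo content with the whole word "slave" replaced by "master".

    Assumption: the role appears as the literal word "slave" somewhere in the file.
    """
    updated, count = re.subn(r"\bslave\b", "master", text)
    if count == 0:
        # Refuse to guess if the expected word is missing.
        raise ValueError('no "slave" entry found; file left unchanged')
    return updated


def convert_file(path: str) -> None:
    """Rewrite a local copy of nodeinfo in place, keeping a ".bak" copy next to it."""
    # newline="" preserves the original line endings, which matters when the
    # copy is edited on Windows and uploaded back to a Linux system.
    with open(path, "r", encoding="utf-8", newline="") as fh:
        original = fh.read()
    updated = nodeinfo_to_master(original)
    with open(path + ".bak", "w", encoding="utf-8", newline="") as fh:
        fh.write(original)
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write(updated)
```

Calling `convert_file("nodeinfo")` on the downloaded copy rewrites it and leaves the original content in "nodeinfo.bak". If the word "slave" is not found, the function raises an error instead of writing anything, which is a sign that the file does not look the way this sketch assumes and should be edited by hand or with help from support.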
Using the new system as master and retaining existing as failover
Using the new system as a master requires restoring settings to the new unit from a settings archive. If no master archive is available, an archive from the failover unit can be used, but in that case the steps above must be followed to convert the restored configuration from failover to master. The failover unit should be disconnected from everything but the heartbeat interface during this process.
- Power on the new master and configure IP addressing and DNS. Update to the latest update level; the system will reboot.
- Once it is up, import and restore the settings archive. Do NOT restore the item "Ethernet settings". If the archive was from a failover system, please perform the steps for converting a failover to a master before rebooting.
- Confirm settings look normal and all configurations have been imported. Then go to "System - Hardware - Failover" and generate a new failover archive.
- Put the archive on a USB drive with a single FAT or FAT32 partition and insert the drive into the failover unit.
- Log on to the command line of the failover unit and restore the failover archive using the "setup" command.
- Reboot the failover and check that it can now be accessed at https://smoothwall.lan.ip:440. If the failover admin interface shows up on this link, the pair should be back in operation. Also go to the "Reports - Realtime - System" menu and select the heartbeat in the section drop-down to confirm that there are log entries. There should be messages on both the master and the failover system showing that they can communicate.
- Perform failover tests.
NOTE: The reason for reinstalling the failover settings on the failover is to resync the original failover with the new master. Version numbers for configurations may be different on the new master, which may cause issues.
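The reachability check on port 440 can also be done from a workstation with a few lines of Python. This is only a sketch under stated assumptions: the function name `admin_port_reachable` is hypothetical, "smoothwall.lan.ip" is the placeholder address used in this article, and a successful TCP connection shows only that something is listening on the port. It does not replace opening the admin interface in a browser or checking the heartbeat log entries.

```python
import socket


def admin_port_reachable(host: str, port: int = 440, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host name from the article; replace with the failover's address.
    if admin_port_reachable("smoothwall.lan.ip"):
        print("Port 440 is accepting connections")
    else:
        print("Port 440 is not reachable")
```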
Performing failover tests
A failover test is essential to ensure that the failover is working as expected. To perform one, go through these steps:
- Access the master GUI and navigate to the "System » Hardware » Failover" section and use the "Enter Standby" button to enter passive mode on the master. This will trigger the failover system to enter active mode. This is the start of the failover test.
- Test that the failover system has entered active mode correctly and that all services are functioning as expected. The failover system should have everything working after 2-3 minutes. It is important to verify that all services are running and that the failover system is working as expected, as this is the final step before applying the updates on the master.
- If the failover system does not pass testing, navigate to "System » Hardware » Failover" on the failover system and use the "Enter Standby" button to return the master to active mode. Call support so we can help you troubleshoot any issues the failover system may have.
- Once the failover has passed testing, enter the failover system GUI and use the "Enter Standby" button in the "System - Hardware - Failover" section to push the failover into standby mode and the master into active mode.
- Test that everything is working as expected after the master is in active mode and has been updated.
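The "2-3 minutes" waiting period in the test steps can be turned into a simple polling loop if the service checks are scripted. The sketch below is hypothetical throughout: `wait_for_services` is not a Smoothwall tool, the checks are whatever callables you supply (for example a DNS lookup or a web request through the active unit), and the 180-second default is simply the upper end of the 2-3 minute estimate given above.

```python
import time
from typing import Callable, Dict


def wait_for_services(
    checks: Dict[str, Callable[[], bool]],
    deadline_s: float = 180.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, bool]:
    """Poll each named check until all pass or the deadline expires.

    Returns a mapping of check name to whether it passed at least once.
    A check that has passed is not run again.
    """
    start = clock()
    results = {name: False for name in checks}
    while True:
        for name, check in checks.items():
            if not results[name]:
                results[name] = bool(check())
        if all(results.values()) or clock() - start >= deadline_s:
            return results
        sleep(interval_s)
```

Any entry still marked `False` in the returned mapping after the deadline is a service that did not come up on the active unit in time, which is the situation in which the steps above recommend reverting to the master and calling support.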
Additionally, please read this knowledge base to learn how to apply updates to both systems in a hardware failover pair: