Chapter 3. Configuring Database Failover
The failover monitor daemon must run on all non-database CloudForms appliances to check for database failures. If the database fails, the monitor updates the appliance's database configuration to point to the new primary database host.
This configuration is crucial for high availability to work in your environment. If the database failover monitor is not configured, non-database CloudForms appliances will not react to the failover event and will not be reconfigured against the new primary database host.
3.1. Configuring the Failover Monitor
Configure the failover monitor only on the non-database CloudForms appliances with the following steps:
- In the appliance console menu, select Configure Application Database Failover Monitor.
- Select Start Database Failover Monitor.
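After starting the monitor from the appliance console, you may want to confirm from a shell that it is running. The following is a minimal sketch, not a documented procedure. The unit name evm-failover-monitor is an assumption (this chapter does not name the service), and both function names are hypothetical.

```shell
# Sketch: report whether the failover monitor unit is active.
# ASSUMPTION: "evm-failover-monitor" is an illustrative default unit name;
# pass the real unit name as the first argument if yours differs.

# Map the one-word output of `systemctl is-active` to a short message.
describe_monitor_state() {
    case "$1" in
        active)
            echo "failover monitor is running" ;;
        inactive|failed|unknown)
            echo "failover monitor is NOT running ($1)" ;;
        *)
            echo "unexpected state: $1" ;;
    esac
}

check_failover_monitor() {
    unit="${1:-evm-failover-monitor}"
    # `systemctl is-active` prints the unit state; if the command is
    # unavailable the captured output is empty, so fall back to "unknown".
    state="$(systemctl is-active "$unit" 2>/dev/null)"
    describe_monitor_state "${state:-unknown}"
}
```

Run check_failover_monitor on each non-database appliance. Any result other than "failover monitor is running" suggests repeating the steps above on that appliance.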
3.2. Testing Database Failover
Test that failover is working correctly between your databases with the following steps:
Simulate a failure by stopping the database on the primary server:
# systemctl stop postgresql.service
To check the status of the database, run:
# systemctl status postgresql.service
Note
You can check the status of the simulated failure by viewing the most recent evm.log log on the engine appliances.
- Check the appliance console summary screen for the primary database. If configured correctly, the CFME Database value in the appliance console summary should have switched from the hostname of the old primary database to the hostname of the new primary on all CloudForms appliances.
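Reviewing the log during the test can be scripted. The sketch below prints failover-related entries from the end of an evm.log file. The default log path and the search word "failover" are assumptions, since this chapter does not state where the log lives or what the entries look like. The function name is hypothetical.

```shell
# Sketch: show failover-related entries near the end of an evm.log file.
# ASSUMPTIONS: the default path and the search word are illustrative only.
recent_failover_lines() {
    log_file="${1:-/var/www/miq/vmdb/log/evm.log}"
    max_lines="${2:-200}"

    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi

    # Look only at the tail so that older events are not mistaken
    # for the failure you have just simulated.
    tail -n "$max_lines" "$log_file" | grep -i "failover"
}
```

For example, `recent_failover_lines /path/to/evm.log 500` on an engine appliance searches the last 500 lines. The function returns a non-zero status when no matching line is found.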
Upon database server failover, the standby server becomes the primary. However, the failed node cannot switch to standby automatically and must be manually configured. Data replication from the new primary to the failed and recovered node does not happen by default, so the failed node must be reintroduced into the configuration.
3.3. Reintroducing the Failed Node
Manual steps are required to reintroduce the failed primary database node into the cluster as a standby. Keeping this step manual gives you greater control over the configuration and an opportunity to diagnose the failure.
To reintroduce the failed node, reinitialize the standby database. On the standby database-only appliance, configure the following:
- In the appliance console menu, select Configure Database Replication.
- Select Configure Server as Standby.
- Select y to remove all previous data from the server and configure it as a new standby database.
Set a unique identifier number for the standby server and enter the database name and credentials:
- Select a number to uniquely identify the node in the replication cluster. This number can be the same as the node’s original identification number.
- Enter the cluster database name.
- Enter the cluster database username.
- Enter the cluster database password.
- Enter the new primary database-only appliance hostname or IP address.
- Enter the new standby database-only appliance hostname or IP address.
Note
The hostname must be visible to all appliances that communicate with this database, including the engine appliances and any global region databases.
Select y to configure the replication manager for automatic failover.
Note
If re-using the node’s identification number, select y to overwrite the existing node ID (this cannot be undone). Additionally, select y to overwrite and reconfigure the replication settings in
Confirm that the replication standby server configuration details are correct, and select y to apply the configuration.
The standby server will then run an initial synchronization with the primary database, and start locally in standby mode.
Verify the configuration on the appliance console details screen for the standby server. When configured successfully, Local Database Server shows as
Your CloudForms environment is now re-configured for high availability.
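As a supplementary check alongside the appliance console, you can ask PostgreSQL on the reintroduced node whether it is running in recovery (standby) mode. This is a sketch under stated assumptions: the function names are hypothetical, and the database name and user are placeholders for the cluster database name and username entered during the configuration. It relies on the standard PostgreSQL function pg_is_in_recovery().

```shell
# Sketch: interpret the result of PostgreSQL's pg_is_in_recovery().
# "t" means the server is in recovery (standby); "f" means it is a primary.
role_from_recovery_flag() {
    case "$1" in
        t) echo "standby" ;;
        f) echo "primary" ;;
        *) echo "unknown" ;;
    esac
}

# ASSUMPTION: the caller supplies the cluster database name and username;
# neither value is defined in this chapter.
local_db_role() {
    db_name="$1"
    db_user="$2"
    # -A (unaligned) and -t (tuples only) make psql print just "t" or "f".
    flag="$(psql -At -U "$db_user" -d "$db_name" \
        -c 'SELECT pg_is_in_recovery();' 2>/dev/null)"
    role_from_recovery_flag "$flag"
}
```

On the reintroduced node, `local_db_role <database> <user>` should report "standby" once the initial synchronization has finished. A result of "unknown" usually means the query could not be run, for example because the database is not yet started.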