Appendix C. Troubleshooting
The following sections cover database and cluster troubleshooting steps.
C.1. Cluster Configuration
To check the attributes of a cluster resource, execute:
# pcs resource show RESOURCE
Attribute values can be changed using:
# pcs resource update RESOURCE_NAME ATTR_NAME=ATTR_VALUE
If a resource fails to start and pcs status shows an error for the resource, run the following command to start it on the local node and get more details about the error:
# pcs resource debug-start RESOURCE
Use caution when executing debug-start. It is advised to disable the resource first to prevent conflicts, possible corruption, or resource failures. Consult Red Hat Support as needed.
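For example, disable the resource first, run the manual debug start, and re-enable it once the problem is understood:
# pcs resource disable RESOURCE
# pcs resource debug-start RESOURCE
# pcs resource enable RESOURCE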
To stop and start a cluster resource, execute:
# pcs resource disable RESOURCE
# pcs resource enable RESOURCE
While working on the configuration, the resource may fail to start so often that the cluster stops trying to start it on the node. This can be checked with:
# pcs resource failcount show postgresql
If the failcounts are shown as INFINITY, you can reset them with:
# pcs resource cleanup postgresql
C.2. Replication in a Cluster Environment
The cluster resource agent script automatically determines which of the two nodes should be the primary and which should be the standby node. The current status can be viewed with:
# crm_mon -Afr -1
If the primary and standby are both active, the output should appear as:
Node Attributes:
* Node cf-db1.example.com:
    + master-postgresql          : 1000
    + postgresql-data-status     : LATEST
    + postgresql-master-baseline : 0000000010000080
    + postgresql-status          : PRI
    + postgresql-xlog-loc        : 0000000010000080
* Node cf-db2.example.com:
    + master-postgresql          : 100
    + postgresql-data-status     : STREAMING|ASYNC
    + postgresql-status          : HS:async
In this case, cf-db1 is the primary, and cf-db2 is the standby server, with streaming asynchronous replication.
If the standby has lost its connection to the primary for too long and its database must be restored from a backup taken on the primary, the output will appear as:
Node Attributes:
* Node cf-db1.example.com:
    + master-postgresql          : -INFINITY
    + postgresql-data-status     : DISCONNECT
    + postgresql-status          : HS:alone
* Node cf-db2.example.com:
    + master-postgresql          : 1000
    + postgresql-data-status     : LATEST
    + postgresql-master-baseline : 0000000011000080
    + postgresql-status          : PRI
    + postgresql-xlog-loc        : 0000000011000080
Here, cf-db2 is the primary, and cf-db1 is unable to start as the standby because its database is out of date.
This can be caused by connection problems. Check the firewalls for both database systems, and check that pg_hba.conf has the same content on both systems.
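For example, assuming firewalld is in use and the data directory shown in Section C.3, the following commands (run on each database node, adjusting paths to your deployment) make it easy to compare the two systems:
# firewall-cmd --list-all
# md5sum /var/opt/rh/rh-postgresql94/lib/pgsql/data/pg_hba.conf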
If a problem is found and fixed, disable and re-enable the postgresql resource, and watch /var/log/messages with tail -f. Some time after the resource is enabled, one database system becomes the primary and the other the standby.
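For example, using the resource name from Section C.1:
# pcs resource disable postgresql
# pcs resource enable postgresql
# tail -f /var/log/messages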
C.3. Restoring the Standby Database from a Backup
If the standby is still unable to start after checking the firewall, the PostgreSQL access permissions, and the NFS mount for archived Write Ahead Logs, take a backup of the primary and restore it on the standby database.
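For example, to confirm that the NFS share for archived Write Ahead Logs is present (the mount point and export are deployment-specific), list the current NFS mounts:
# mount | grep nfs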
To do this, run the following commands on the standby cluster node:
# pcs cluster standby $HOSTNAME
# su - postgres
$ rm -rf /tmp/pgbackup
$ mkdir /tmp/pgbackup
$ scl enable rh-postgresql94 -- pg_basebackup -h REPLICATION_VIP -U \
  replicator -D /tmp/pgbackup -x
$ rm -rf /var/opt/rh/rh-postgresql94/lib/pgsql/data/*
$ mv /tmp/pgbackup/* /var/opt/rh/rh-postgresql94/lib/pgsql/data
$ chown -R postgres:postgres \
  /var/opt/rh/rh-postgresql94/lib/pgsql/data
# pcs cluster unstandby $HOSTNAME
C.4. Simulating a Node Failure
To test fencing and automatic failover, trigger a kernel panic by running the command below. Before doing this, ensure access to the system console and power control.
# echo c > /proc/sysrq-trigger
While watching /var/log/messages on the surviving node, you can observe the crashed node being fenced and the surviving node becoming the primary database (if it was not already).
The crashed node should boot after the power-off/power-on cycle, automatically rejoin the cluster, and start the database as the standby. If it was the primary before the crash, PGSQL.lock needs to be removed as described above.
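A minimal example, assuming the pgsql resource agent keeps its lock file in a tmp directory under the SCL PostgreSQL path used in this guide; verify the actual location (the agent's tmpdir parameter) before removing anything:
# rm /var/opt/rh/rh-postgresql94/lib/pgsql/tmp/PGSQL.lock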
C.5. Red Hat CloudForms UI Failover
To simulate a UI failure, stop the Web server on one of the UI appliances by running the following command:
# service httpd stop
When done testing, start the Web server again with:
# service httpd start
To verify which CFME appliance serves requests, check /var/www/miq/vmdb/log/apache/ssl_access.log.
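For example, watch the access log on each UI appliance while requests are made:
# tail -f /var/www/miq/vmdb/log/apache/ssl_access.log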
