10.4. Cluster Daemon crashes

RGManager has a watchdog process that reboots the host if the main rgmanager process fails unexpectedly. When the watchdog detects that the main rgmanager process has crashed, it reboots the cluster node. The remaining active cluster nodes detect that the node has left, evict it from the cluster, and fence it, and rgmanager recovers the service on another host.
rgmanager runs as two processes. The process with the lower process ID (PID) is the watchdog, which takes action if its child (the process with the higher PID) crashes. Capturing the core of the process with the higher PID using gcore can aid in troubleshooting a crashed daemon.
Install the packages that are required to capture and view the core, and ensure that rgmanager and rgmanager-debuginfo are the same version; otherwise the captured application core might be unusable.
$ yum -y --enablerepo=rhel-debuginfo install gdb rgmanager-debuginfo
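The version requirement above can be checked before capturing a core. The following is a minimal sketch, assuming an RPM-based host; the function names pkg_vr, same_version, and check_rgmanager_debuginfo are hypothetical helpers introduced for illustration and are not part of rgmanager.

```shell
# Print the installed VERSION-RELEASE of a package; fail if it is not installed.
pkg_vr() {
    vr=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' "$1" 2>/dev/null) || return 1
    printf '%s\n' "$vr"
}

# Succeed only if both strings are non-empty and identical.
same_version() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Hypothetical wrapper: compare rgmanager with its debuginfo package.
check_rgmanager_debuginfo() {
    if same_version "$(pkg_vr rgmanager)" "$(pkg_vr rgmanager-debuginfo)"; then
        echo "rgmanager and rgmanager-debuginfo match"
    else
        echo "version mismatch or package missing" >&2
        return 1
    fi
}
```

After defining these functions in a shell session, run check_rgmanager_debuginfo; a non-zero exit status suggests that the debuginfo package should be updated or reinstalled before using gcore.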

10.4.1. Capturing the rgmanager Core at Runtime

Two rgmanager processes run once rgmanager is started. You must capture the core for the rgmanager process with the higher PID.
The following is example output from the ps command showing the two rgmanager processes.

$ ps aux | grep rgmanager | grep -v grep 

root    22482  0.0  0.5  23544  5136 ?        S<Ls Dec01   0:00 rgmanager 
root    22483  0.0  0.2  78372  2060 ?        S<l  Dec01   0:47 rgmanager 
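Because the watchdog is the parent of the main rgmanager process, the process to capture can also be identified by its parent PID rather than by PID order. The following is a rough sketch under that assumption; child_pids is a hypothetical helper that reads lines of "PID PPID" and prints each PID whose parent also appears in the list.

```shell
# Read "PID PPID" pairs on stdin and print every PID whose parent
# is also one of the listed PIDs (that is, the child process).
child_pids() {
    awk '{ pid[$1] = 1; parent[$1] = $2 }
         END { for (p in parent) if (parent[p] in pid) print p }'
}
```

One possible use is to feed it the PID and parent PID of every process named rgmanager, for example: ps -C rgmanager -o pid=,ppid= | child_pids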
In the following example, the pidof program is used to automatically determine the higher-numbered PID, which is the appropriate PID for creating the core. The full command captures the application core for process 22483, which has the higher PID.
$ gcore -o /tmp/rgmanager-$(date '+%F_%s').core $(pidof -s rgmanager)
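If you prefer not to rely on the order in which pidof reports PIDs, the higher PID can be selected explicitly. The following is a small sketch; highest_pid is a hypothetical helper, not a standard utility.

```shell
# Print the numerically highest PID among the arguments.
highest_pid() {
    printf '%s\n' "$@" | sort -n | tail -n 1
}
```

For example, assuming pidof rgmanager lists both processes, the capture command could be written as: gcore -o /tmp/rgmanager-$(date '+%F_%s').core "$(highest_pid $(pidof rgmanager))"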