The monitor and stop operations of an ethmonitor resource timed out in a Pacemaker cluster
Issue
- Why did the ethmonitor resource in my cluster time out, causing the node to be rebooted?
- The ethmonitor resource timed out with no messages indicating "link down" or other network issues.
Dec 18 02:49:54 node-2 lrmd[71867]: warning: child_timeout_callback: bond0-monitor_monitor_60000 process (PID 8970) timed out
Dec 18 02:49:54 node-2 lrmd[71867]: warning: operation_finished: bond0-monitor_monitor_60000:8970 - timed out after 60000ms
...
Dec 18 02:49:54 node-2 crmd[71870]: notice: te_rsc_command: Initiating action 5: stop bond0-monitor_stop_0 on node-2 (local)
Dec 18 02:50:14 node-2 lrmd[71867]: warning: child_timeout_callback: bond0-monitor_stop_0 process (PID 10133) timed out
Dec 18 02:50:14 node-2 lrmd[71867]: warning: operation_finished: bond0-monitor_stop_0:10133 - timed out after 20000ms
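When reviewing a longer log, it can help to pull out only the operations that timed out. The following is a minimal sketch, assuming the lrmd messages follow the same "operation_finished ... timed out after Nms" layout as the excerpt above. The function name find_timeouts and the regular expression are illustrative and are not part of Pacemaker.

```python
import re

# Assumed layout, taken from the log excerpt in this article:
#   ... lrmd[PID]: warning: operation_finished: <operation>:<pid> - timed out after <N>ms
TIMEOUT_RE = re.compile(
    r"operation_finished:\s+(?P<op>\S+):(?P<pid>\d+)\s+-\s+timed out after\s+(?P<ms>\d+)ms"
)


def find_timeouts(lines):
    """Return a list of (operation, pid, timeout_ms) for each timed-out operation."""
    results = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            results.append(
                (match.group("op"), int(match.group("pid")), int(match.group("ms")))
            )
    return results


# Sample input copied from the log excerpt above.
sample = [
    "Dec 18 02:49:54 node-2 lrmd[71867]: warning: operation_finished: "
    "bond0-monitor_monitor_60000:8970 - timed out after 60000ms",
    "Dec 18 02:49:54 node-2 crmd[71870]: notice: te_rsc_command: "
    "Initiating action 5: stop bond0-monitor_stop_0 on node-2 (local)",
    "Dec 18 02:50:14 node-2 lrmd[71867]: warning: operation_finished: "
    "bond0-monitor_stop_0:10133 - timed out after 20000ms",
]

for op, pid, ms in find_timeouts(sample):
    print(f"{op} (PID {pid}) timed out after {ms} ms")
```

In the excerpt, this separates the recurring monitor operation (60000 ms timeout) from the subsequent stop operation (20000 ms timeout). The stop failure is the one that led to the node being rebooted.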
Environment
- Red Hat Enterprise Linux 6, 7, or 8 (with the High Availability Add-on)
- Pacemaker
- An ocf:heartbeat:ethmonitor resource