Node reboots after reporting "crmd[xxxx]: error: process_lrm_event: Operation yyyyyy_stop_0: Timed out" in a RHEL 6 or 7 High Availability cluster
Issue
- I rebooted one node manually, and when that node came back, another node rebooted on its own
- I see a node getting rebooted frequently after reporting a stop operation timed out
- A node gets fenced after a stop operation fails from a timeout
Apr 23 01:21:38 node1 crmd[41660]: error: process_lrm_event: Operation rabbitmq-server_stop_0: Timed Out (node=node1.example.com, call=841, timeout=90000ms)
- I see a node getting rebooted after another node goes missing from the cluster, and it looks like it was scheduled for STONITH because its rabbitmq stop operation failed or timed out
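To check whether a node's logs show the symptom described above, the stop-timeout message can be searched for programmatically. The following is a minimal sketch, not an official tool: the regular expression is modelled only on the sample log line in this article, so other pacemaker versions may word the message differently, and the function name `find_stop_timeouts` is hypothetical.

```python
import re

# Assumption: the pattern follows the sample message shown in this article.
# Matching is case-insensitive because the title and the log sample differ
# in capitalisation ("Timed out" vs "Timed Out").
STOP_TIMEOUT = re.compile(
    r"crmd\[\d+\]:\s+error: process_lrm_event: "
    r"Operation (?P<resource>\S+)_stop_0: Timed Out "
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), timeout=(?P<timeout_ms>\d+)ms\)",
    re.IGNORECASE,
)


def find_stop_timeouts(lines):
    """Return (resource, node, timeout_ms) for each stop-timeout message found."""
    results = []
    for line in lines:
        match = STOP_TIMEOUT.search(line)
        if match:
            results.append(
                (match["resource"], match["node"], int(match["timeout_ms"]))
            )
    return results


# The sample line from this article, used here as test input.
sample = (
    "Apr 23 01:21:38 node1 crmd[41660]: error: process_lrm_event: "
    "Operation rabbitmq-server_stop_0: Timed Out "
    "(node=node1.example.com, call=841, timeout=90000ms)"
)

for resource, node, timeout_ms in find_stop_timeouts([sample]):
    print(f"{resource} stop timed out on {node} after {timeout_ms} ms")
```

In practice the input would be the lines of the system log on each cluster node, for example by passing an open file object for the log instead of the one-element list used here.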
Environment
- Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add-On
- pacemaker
