Node reboots after reporting "crmd[xxxx]: error: process_lrm_event: Operation yyyyyy_stop_0: Timed out" in a RHEL 6 or 7 High Availability cluster

Solution Unverified - Updated -

Issue

  • I rebooted one node manually, and when that node came back, another node was rebooted
  • I see a node getting rebooted frequently after it reports that a stop operation timed out
  • A node gets fenced after a stop operation fails from a timeout:
    Apr 23 01:21:38 node1 crmd[41660]: error: process_lrm_event: Operation rabbitmq-server_stop_0: Timed Out (node=node1.example.com, call=841, timeout=90000ms)
  • I see a node getting rebooted after another node goes missing from the cluster; the rebooted node appears to have been scheduled for STONITH because its rabbitmq stop operation failed or timed out
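
To check whether a node's logs show this symptom, the stop-timeout message can be picked out and broken into its fields. The following is a minimal Python sketch, not a supported tool: the pattern is modeled only on the single sample message quoted above, the function name find_stop_timeouts is hypothetical, and other pacemaker versions may word the message differently.

```python
import re

# Pattern modeled on the one sample message quoted in this article (assumption:
# other pacemaker versions may format the line differently).
STOP_TIMEOUT = re.compile(
    r"crmd\[\d+\]:\s+error: process_lrm_event: Operation "
    r"(?P<resource>\S+)_stop_0: Timed Out "
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), timeout=(?P<timeout_ms>\d+)ms\)",
    re.IGNORECASE,  # the message is seen as both "Timed out" and "Timed Out"
)


def find_stop_timeouts(lines):
    """Return one dict per log line that reports a timed-out stop operation."""
    hits = []
    for line in lines:
        match = STOP_TIMEOUT.search(line)
        if match:
            hits.append({
                "resource": match.group("resource"),
                "node": match.group("node"),
                "call": int(match.group("call")),
                "timeout_ms": int(match.group("timeout_ms")),
            })
    return hits


# The sample line from this article, used as input.
sample = [
    "Apr 23 01:21:38 node1 crmd[41660]: error: process_lrm_event: "
    "Operation rabbitmq-server_stop_0: Timed Out "
    "(node=node1.example.com, call=841, timeout=90000ms)",
]

for hit in find_stop_timeouts(sample):
    print(hit)
```

The same function accepts any iterable of lines, such as an open log file. Which file holds these messages depends on the local syslog and cluster logging configuration, so the path is left to the reader.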

Environment

  • Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add-On
  • pacemaker
