RHEL 5 Cluster node was evicted by qdiskd but did not log any messages before eviction indicating that its I/O was hung or failing

Solution In Progress - Updated -

Issue

  • A node was evicted by qdiskd running on another node, but the evicted node logged no indication that its I/O was failing or hanging before the eviction. The node was evidently still responsive (i.e., it had not panicked or hung completely), because cman/openais logged that it was killed by the eviction:
Jul 30 02:24:54 node2 openais[22959]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application

Environment

  • Red Hat Enterprise Linux (RHEL) 5 Update 4 through RHEL 5 Update 8 with the High Availability Add-On
  • Cluster configured to use a quorum device (<quorumd> in /etc/cluster/cluster.conf)
  • A node was evicted, but qdiskd on the evicted node did not report any of the following:

    qdiskd[XXX]: <warning> qdiskd: read (system call) has hung for YY seconds
    
    qdiskd[XXX]: <warning> qdiskd: write (system call) has hung for YY seconds
    
    qdiskd[XXX]: <error> Error writing to quorum disk
    
  • Evidence that the node was still alive and responsive up to the point of eviction, for example a log message showing that it recognized it had been evicted:
openais[22959]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
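
The conditions above can be checked against a node's syslog with a short script. The following is a minimal sketch, not a Red Hat tool: the function name classify_log and the regular expressions are assumptions derived only from the message formats quoted in this article, and the exact wording may differ between package versions.

```python
import re

# Patterns built from the qdiskd messages quoted in this article.
# The PID and the number of seconds vary, so they are matched as digits.
QDISKD_IO_PATTERNS = [
    re.compile(r"qdiskd\[\d+\]: <warning> qdiskd: (read|write) "
               r"\(system call\) has hung for \d+ seconds"),
    re.compile(r"qdiskd\[\d+\]: <error> Error writing to quorum disk"),
]

# The message cman/openais logs when the node learns it was evicted.
CMAN_KILLED = re.compile(r"openais\[\d+\]: \[CMAN\s*\] cman killed by node \d+")


def classify_log(lines):
    """Hypothetical helper: report whether a log matches this article's symptoms.

    The log matches when the node logged its own eviction but qdiskd
    logged none of the I/O hang or write-error messages.
    """
    io_warnings = [l for l in lines
                   if any(p.search(l) for p in QDISKD_IO_PATTERNS)]
    evictions = [l for l in lines if CMAN_KILLED.search(l)]
    return {
        "io_warnings": io_warnings,
        "evictions": evictions,
        "matches_this_issue": bool(evictions) and not io_warnings,
    }
```

To use it, pass the lines of the evicted node's syslog (typically /var/log/messages on RHEL 5), for example classify_log(open("/var/log/messages").readlines()), and inspect the "matches_this_issue" entry of the result.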
