RHEL 5 Cluster node was evicted by qdiskd but did not log any messages before eviction indicating that its I/O was hung or failing

Solution In Progress - Updated -

Issue

  • My node was evicted by qdiskd on another node; however, the evicted node gave no indication that its I/O was failing or hanging before the eviction. We know the node was still responsive (i.e., it had not panicked or completely hung) because cman/openais reported being killed by the eviction:
Jul 30 02:24:54 node2 openais[22959]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application

Environment

  • Red Hat Enterprise Linux (RHEL) 5 Update 4 through RHEL 5 Update 8 with the High Availability Add-On
  • Cluster configured to use a quorum device (<quorumd> in /etc/cluster/cluster.conf)
  • A node is evicted, but qdiskd on the evicted node does not log any of the following messages:

    qdiskd[XXX]: <warning> qdiskd: read (system call) has hung for YY seconds
    
    qdiskd[XXX]: <warning> qdiskd: write (system call) has hung for YY seconds
    
    qdiskd[XXX]: <error> Error writing to quorum disk
    
  • There is evidence that the node was still alive and responsive up until the point it was evicted; for example, it logs a message indicating that it recognized its eviction:
openais[22959]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
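To check whether a node's logs fit the environment described above, the messages can be searched for programmatically. The following Python sketch assumes the messages appear in syslog with the same text as quoted in this article (the timestamp, hostname, and PID prefix are not matched). The pattern names and the functions `scan` and `matches_this_issue` are hypothetical helpers for illustration, not part of any Red Hat tool.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical patterns, assumed from the qdiskd/cman message text quoted above.
HANG_RE = re.compile(r"qdiskd: (read|write) \(system call\) has hung for (\d+) seconds")
WRITE_ERR_RE = re.compile(r"qdiskd.*Error writing to quorum disk")
CMAN_KILLED_RE = re.compile(r"\[CMAN \] cman killed by node (\d+)")


def scan(lines: Iterable[str]) -> Tuple[List[str], Optional[int]]:
    """Collect qdiskd I/O warnings logged before the first cman kill message.

    Returns (warnings, killer_node_id); killer_node_id is None if no
    kill message was found.
    """
    warnings: List[str] = []
    for line in lines:
        killed = CMAN_KILLED_RE.search(line)
        if killed:
            # Stop at the eviction: messages after it do not matter here.
            return warnings, int(killed.group(1))
        if HANG_RE.search(line) or WRITE_ERR_RE.search(line):
            warnings.append(line)
    return warnings, None


def matches_this_issue(lines: Iterable[str]) -> bool:
    """True if the node logged its eviction with no earlier qdiskd I/O warning."""
    warnings, killer = scan(lines)
    return killer is not None and not warnings
```

Because `scan` accepts any iterable of strings, an open file object for the evicted node's /var/log/messages can be passed directly to `matches_this_issue`. Stopping at the first kill message keeps warnings logged after the eviction from masking a match.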
