RHEL 5 Cluster node was evicted by qdiskd but did not log any messages before eviction indicating that its I/O was hung or failing
Issue
- My node was evicted by qdiskd on another node, but the evicted node gave no indication that its I/O was failing or hanging before the eviction. We know the node was still responsive (i.e., it had not panicked or completely hung) because cman/openais reported being killed by the eviction:
Jul 30 02:24:54 node2 openais[22959]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
- qdiskd is supposed to warn us when it detects hung or failed I/O, but neither warning shows up in the logs here.
Environment
- Red Hat Enterprise Linux (RHEL) 5 Update 4 through RHEL 5 Update 8 with the High Availability Add-On
- cman releases starting with 2.0.115-1.el5 up to (but not including) 2.0.115-109.el5
  - Releases earlier than cman-2.0.115-1.el5 did not report any sort of warning for hung I/O in qdiskd, so on those releases it is expected that nothing in the logs explains an eviction caused by stalled I/O
  - Releases later than cman-2.0.115-109.el5 have a feature to avoid evictions when I/O is hanging (https://access.redhat.com/knowledge/solutions/153223)
- Cluster configured to use a quorum device (<quorumd> in /etc/cluster/cluster.conf)
- Node evicted but qdiskd on the evicted node does not report any of:
  - qdiskd[XXX]: <warning> qdiskd: read (system call) has hung for YY seconds
  - qdiskd[XXX]: <warning> qdiskd: write (system call) has hung for YY seconds
  - qdiskd[XXX]: <error> Error writing to quorum disk
- Evidence suggesting that the node was still alive and responsive up until the point it was evicted. For example, if it logs messages indicating it recognized it was evicted:
openais[22959]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
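To check whether a node's log matches the symptom described above, a scan along the following lines may help. This is a minimal sketch, not a Red Hat tool: the function names (scan_log, matches_symptom) are hypothetical, and the regular expressions are assumptions derived only from the message text quoted in this article, with the PID and seconds fields matched loosely.

```python
import re

# Assumption: patterns are built from the qdiskd messages quoted in this
# article; XXX (PID) and YY (seconds) are treated as arbitrary digits.
QDISKD_IO_PATTERNS = [
    re.compile(r"qdiskd\[\d+\]: <warning> qdiskd: read \(system call\) has hung for \d+ seconds"),
    re.compile(r"qdiskd\[\d+\]: <warning> qdiskd: write \(system call\) has hung for \d+ seconds"),
    re.compile(r"qdiskd\[\d+\]: <error> Error writing to quorum disk"),
]

# Assumption: the eviction notice always starts like the openais line above.
EVICTION_PATTERN = re.compile(r"openais\[\d+\]: \[CMAN\s*\] cman killed by node \d+")


def scan_log(lines):
    """Return (eviction_lines, qdiskd_io_lines_seen_before_first_eviction)."""
    evictions = []
    io_before = []
    for line in lines:
        if EVICTION_PATTERN.search(line):
            evictions.append(line.rstrip("\n"))
        elif not evictions and any(p.search(line) for p in QDISKD_IO_PATTERNS):
            # Only I/O messages logged before the first eviction count.
            io_before.append(line.rstrip("\n"))
    return evictions, io_before


def matches_symptom(lines):
    """True if the node logged its own eviction with no prior qdiskd I/O message."""
    evictions, io_before = scan_log(lines)
    return bool(evictions) and not io_before
```

Any iterable of log lines can be passed to matches_symptom, for example an open file handle on the evicted node's syslog (typically /var/log/messages on RHEL 5, though the path depends on the syslog configuration). The scan does not limit itself to a time window before the eviction, so an unrelated qdiskd message from much earlier in the same file would also count as a prior warning.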
