rgmanager, clvmd, cmirrord, or other DLM-using daemons in a RHEL 5 or 6 High Availability cluster become blocked after "dlm: connecting to X" or "dlm: got connection from X" is seen in the logs
Issue
- rgmanager is not responding to any requests to move a service, and doesn't seem to be running status checks any longer. The last thing we see in the logs is dlm reporting connections to other nodes:
Nov 3 12:20:35 node1 kernel: dlm: got connection from 6
Nov 3 12:20:35 node2 kernel: dlm: got connection from 4
Nov 3 12:20:35 node3 kernel: dlm: got connection from 5
Nov 3 12:20:35 node4 kernel: dlm: got connection from 3
- clustat is reporting "Service states unavailable" and not showing any services, and clusvcadm can't manage any services. The logs show "dlm: connecting to X" right before this happened.
- LVM commands in my cluster are not finishing and just hang forever. The logs report dlm connections between nodes that were already connected.
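To check whether a node logged these messages, the syslog output can be scanned for the two kernel message forms quoted above. The following is a minimal sketch, not a Red Hat tool: the function name find_dlm_connection_events and the regular expression are illustrative assumptions, and it assumes syslog lines in the same layout as the excerpt above (on many systems these are written to /var/log/messages, but that location is also an assumption).

```python
import re

# Hypothetical pattern for the kernel DLM connection messages quoted in the
# Issue section. It assumes the classic syslog layout "Mon D HH:MM:SS host
# kernel: ..." and tolerates an optional "[seconds.micros]" kernel timestamp.
DLM_CONN_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?:\[\s*[\d.]+\]\s+)?"
    r"dlm: (?P<event>connecting to|got connection from) (?P<peer>\d+)\s*$"
)


def find_dlm_connection_events(lines):
    """Return one dict per DLM connection message found in the given lines."""
    events = []
    for line in lines:
        match = DLM_CONN_RE.match(line)
        if match:
            events.append(match.groupdict())
    return events


if __name__ == "__main__":
    # Sample lines copied from the log excerpt in this article.
    sample = [
        "Nov 3 12:20:35 node1 kernel: dlm: got connection from 6",
        "Nov 3 12:20:35 node2 kernel: dlm: got connection from 4",
    ]
    for event in find_dlm_connection_events(sample):
        print("%(timestamp)s %(host)s: %(event)s node %(peer)s" % event)
```

Passing an open log file object instead of the sample list works the same way, since the function only iterates over lines. Finding a match only confirms that the message was logged; it does not by itself show that a daemon is blocked.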
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add-On
- One or more daemons in use that utilize DLM for locking
  - rgmanager, clvmd, cmirrord, and GFS2 file systems all utilize DLM and may be affected
