rgmanager, clvmd, cmirrord, or other DLM-using daemons in a RHEL 5 or 6 High Availability cluster become blocked after "dlm: connecting to X" or "dlm: got connection from X" is seen in the logs
Issue
- rgmanager is not responding to any requests to move a service, and doesn't seem to be running status checks any longer. The last thing we see in the logs is dlm reporting connections to other nodes:
Nov 3 12:20:35 node1 kernel: dlm: got connection from 6
Nov 3 12:20:35 node2 kernel: dlm: got connection from 4
Nov 3 12:20:35 node3 kernel: dlm: got connection from 5
Nov 3 12:20:35 node4 kernel: dlm: got connection from 3
- clustat is reporting "Service states unavailable" and not showing any services, and clusvcadm can't manage any services. The logs show "dlm: connecting to X" right before this happened.
- LVM commands in my cluster are not finishing and just hang forever. The logs report dlm connections between nodes that were already connected.
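To check whether a node logged the messages described above, the syslog output can be scanned for the two kernel strings. The following is a minimal sketch rather than a Red Hat tool: the function name find_dlm_connections is hypothetical, and the pattern assumes the syslog line format shown in the excerpt above.

```python
import re

# Matches the two kernel messages quoted in this article:
#   "dlm: connecting to X" and "dlm: got connection from X"
# Assumption: lines follow the syslog format shown in the excerpt above.
DLM_CONN = re.compile(r"kernel: dlm: (connecting to|got connection from) (\d+)")


def find_dlm_connections(lines):
    """Return (event, node_id, line) for each DLM connection message.

    `lines` can be any iterable of strings, such as an open log file.
    This helper is hypothetical and only illustrates the log search.
    """
    hits = []
    for line in lines:
        match = DLM_CONN.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2)), line.rstrip("\n")))
    return hits
```

Because the function only iterates over lines, an open file handle for the node's system log (commonly /var/log/messages, which is an assumption about the local syslog configuration) can be passed in directly. A match only shows that the message was logged; it does not by itself confirm that the daemons are blocked.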
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add-On
- One or more daemons in use that utilize DLM for locking
  - rgmanager, clvmd, cmirrord, and GFS2 file systems all utilize DLM and may be affected