`clvmd` or other services hang while a node rejoins the cluster; `cman_tool services` shows services in FAIL_ALL_STOPPED, FAIL_START_WAIT, or JOIN_START_WAIT states throughout the cluster
Issue
- After a node was removed from the cluster, attempted to rejoin, and was removed again shortly afterward, `clvmd` won't start or GFS/GFS2 file systems can't be mounted. `cman_tool services` shows one or more services in bad states:
# cman_tool services
type   level name       id       state
fence  0     default    0001000a FAIL_ALL_STOPPED
[1 2 3 4 5 6 7 8 9 10 11 12 13 14]
dlm    1     clvmd      00010003 none
[1 2 3 4 5 6 7 8 9 10 11 12 13]
dlm    1     gfsdata    00020005 none
[1 2 3 4 5 6 7 8 9 10 11 12 13]
dlm    1     rgmanager  00010002 none
[1 2 3 4 5 6 7 8 9 10 11 12 13]
gfs    2     gfsdata    00010005 FAIL_START_WAIT
[1 2 3 4 5 6 7 8 9 10 11 12 13]
- After a node is removed from the cluster, that node is unable to mount GFS file systems, and `clvmd` hangs when it is started.
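When many groups are listed, it can help to pick out the stuck ones automatically. The following is a minimal sketch, not a Red Hat tool: `find_stuck_services` is a hypothetical helper, and it assumes the column layout shown in the sample output above (a header line, one `type level name id state` row per group, then a bracketed member list). It treats any state other than `none`, the value reported by the healthy groups in the sample, as a group stuck in a transition.

```python
from typing import List, Tuple

# Hypothetical helper, not part of cman: flags groups whose state is not "none".
# Assumes the `cman_tool services` layout shown in the sample output: a header
# line, one "type level name id state" row per group, then a bracketed member list.

Group = Tuple[str, str, str, str]  # (type, name, id, state)


def find_stuck_services(output: str) -> List[Group]:
    """Return every group whose state column is something other than 'none'."""
    stuck: List[Group] = []
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blank lines, the shell prompt and the bracketed member lists.
        if not line or line.startswith(("#", "[")):
            continue
        fields = line.split()
        # Skip the column header and anything that is not a five-column group row.
        if fields[0] == "type" or len(fields) < 5:
            continue
        gtype, _level, name, gid, state = fields[:5]
        if state != "none":
            stuck.append((gtype, name, gid, state))
    return stuck


# The sample output from this article, used here as test input.
SAMPLE = """\
# cman_tool services
type   level name       id       state
fence  0     default    0001000a FAIL_ALL_STOPPED
[1 2 3 4 5 6 7 8 9 10 11 12 13 14]
dlm    1     clvmd      00010003 none
[1 2 3 4 5 6 7 8 9 10 11 12 13]
dlm    1     gfsdata    00020005 none
[1 2 3 4 5 6 7 8 9 10 11 12 13]
dlm    1     rgmanager  00010002 none
[1 2 3 4 5 6 7 8 9 10 11 12 13]
gfs    2     gfsdata    00010005 FAIL_START_WAIT
[1 2 3 4 5 6 7 8 9 10 11 12 13]
"""

if __name__ == "__main__":
    # Prints the fence "default" group and the gfs "gfsdata" group.
    for gtype, name, gid, state in find_stuck_services(SAMPLE):
        print(f"{gtype} {name} ({gid}): {state}")
```

On a live node, the `output` argument could instead come from `subprocess.run(["cman_tool", "services"], capture_output=True, text=True).stdout` (Python 3.7 or later). Parsing by whitespace rather than fixed column offsets keeps the sketch tolerant of differing column widths.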
Environment
- Red Hat Enterprise Linux Server 5 (with the High Availability Add-On)
- A node has been removed from the cluster, rejoined, and then removed again shortly after