Service recovery after crash of rgmanager leads to lvm resources failing to recover with "[lvm] <node> owns <vg>/<lv> unable to start" in RHEL 6
Issue
- When `rgmanager` crashes or dies unexpectedly on one node, recovery of its services begins immediately rather than waiting for fencing. This leads to any `lvm` resources failing to start. When using the tagging variant of HA-LVM, it looks like this:
Aug 7 08:34:46 node2 rgmanager[8451]: State change: node1 DOWN
Aug 7 08:34:46 node2 rgmanager[8451]: Taking over service service:myService from down member node1
Aug 7 08:34:47 node2 rgmanager[10196]: [lvm] node1 owns vg/lv unable to start
Aug 7 08:34:47 node2 rgmanager[8451]: start on lvm "vg-lv" returned 1 (generic error)
Or with the `clvmd` variant of HA-LVM:
Aug 13 09:55:32 node2 rgmanager[15875]: State change: node1 DOWN
Aug 13 09:55:32 node2 rgmanager[15875]: Taking over service service:myService from down member node1
Aug 13 09:55:33 node2 rgmanager[17323]: [lvm] Starting volume group, vg
Aug 13 09:55:34 node2 rgmanager[17355]: [lvm] Failed to activate vg
Aug 13 09:55:34 node2 rgmanager[15875]: start on lvm "lvm-vg" returned 1 (generic error)
- `rgmanager` on a node was killed, so the other node took the service over. However, when `rgmanager` tried to start the LVM resource, it failed.
- Why does `rgmanager` not try to fence the failed node before recovering a service?
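In the tagging variant, the `lvm` resource agent refuses to activate a volume group whose tag still names another cluster node, which is exactly what produces the "node1 owns vg/lv" message above when the dead node's tag has not been cleared by fencing. The following is a minimal, simplified sketch of that ownership check; the real logic lives in the `rgmanager` resource agent scripts, and the function name `may_activate` is purely illustrative:

```shell
# Simplified sketch of the tag-based ownership check performed by the
# HA-LVM (tagging) resource agent. "may_activate" is a hypothetical
# stand-in for the agent's internal logic, not a real command.
#
# On a live system, the VG tag would come from:
#   vgs --noheadings -o vg_tags vg

may_activate() {
    tags="$1"    # current tag on the volume group (empty if untagged)
    node="$2"    # name of the local cluster node

    # Activation is allowed only if the VG is untagged or tagged
    # with the local node's own name.
    if [ -z "$tags" ] || [ "$tags" = "$node" ]; then
        echo yes
    else
        echo no
    fi
}

may_activate "node1" "node2"   # stale tag from the down node: refused
may_activate "" "node2"        # untagged VG: activation allowed
```

Until the failed node is fenced (or its tag is otherwise removed), the surviving node keeps failing this check, which is why the service cannot be recovered.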
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add On
- `rgmanager` releases prior to `3.0.12.1-19.el6`
- HA-LVM using either variant (tagging or `clvmd`)