Service recovery after crash of rgmanager leads to lvm resources failing to recover with "[lvm] <node> owns <vg>/<lv> unable to start" in RHEL 6

Solution Unverified - Updated

Issue

  • When rgmanager crashes or dies unexpectedly on one node, recovery of its services begins immediately rather than waiting for fencing to complete. As a result, any lvm resources in those services fail to start. When the tagging variant of HA-LVM is in use, the failure looks like this:
Aug  7 08:34:46 node2 rgmanager[8451]: State change: node1 DOWN
Aug  7 08:34:46 node2 rgmanager[8451]: Taking over service service:myService from down member node1
Aug  7 08:34:47 node2 rgmanager[10196]: [lvm]   node1 owns vg/lv unable to start
Aug  7 08:34:47 node2 rgmanager[8451]: start on lvm "vg-lv" returned 1 (generic error)

Or with the clvmd variant of HA-LVM:

Aug 13 09:55:32 node2 rgmanager[15875]: State change: node1 DOWN
Aug 13 09:55:32 node2 rgmanager[15875]: Taking over service service:myService from down member node1
Aug 13 09:55:33 node2 rgmanager[17323]: [lvm] Starting volume group, vg
Aug 13 09:55:34 node2 rgmanager[17355]: [lvm] Failed to activate vg
Aug 13 09:55:34 node2 rgmanager[15875]: start on lvm "lvm-vg" returned 1 (generic error)
  • rgmanager on a node was killed, so the other node took over its service. However, when rgmanager tried to start the LVM resource, the start failed.
  • Why does rgmanager not try to fence the failed node before recovering a service?
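The tagging failure above can be illustrated with a small sketch. In the tagging variant of HA-LVM, the volume group carries a tag naming the node that owns it, and the resource agent refuses to activate a VG still tagged with another node's name until that node has been fenced and the tag cleared. The function below is a hypothetical simplification of that check, not the actual lvm.sh agent code; the node names match the logs above:

```shell
#!/bin/sh
# Hypothetical sketch of the ownership check performed by the HA-LVM
# tagging agent: a VG tagged with another node's name must not be
# activated until that node is fenced and its tag removed.

may_activate() {
    vg_tag="$1"    # tag currently on the VG (a node name, or empty)
    my_node="$2"   # name of the node attempting the activation

    if [ -z "$vg_tag" ] || [ "$vg_tag" = "$my_node" ]; then
        echo "activate"
    else
        # Mirrors the logged failure: "<node> owns <vg>/<lv> unable to start"
        echo "$vg_tag owns the VG - unable to start"
    fi
}

may_activate "node1" "node2"   # node1 still tagged: takeover is refused
may_activate "" "node2"        # tag cleared after fencing: safe to activate
```

On a live cluster, the tag itself can be inspected with `vgs -o vg_name,vg_tags` and, where appropriate, removed with `vgchange --deltag <node> <vg>`; both are standard LVM commands, but clearing a tag by hand should only be done once the owning node is known to be down.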

Environment

  • Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add On
  • rgmanager releases prior to 3.0.12.1-19.el6
  • HA-LVM using either variant (tagging or clvmd)
