Cluster service containing lvm resource fails instead of relocating when volume cannot be deactivated in RHEL
Issue
- Service with lvm resource fails to stop or recover when the file system on it cannot be unmounted:
Apr 22 17:04:37 node1 rgmanager[26011]: [fs] unmounting /dataFS
Apr 22 17:04:37 node1 rgmanager[26058]: [fs] Sending SIGTERM to processes on /dataFS
Apr 22 17:04:43 node1 rgmanager[26218]: [fs] unmounting /dataFS
Apr 22 17:04:43 node1 rgmanager[26263]: [fs] Sending SIGKILL to processes on /dataFS
Apr 22 17:04:48 node1 rgmanager[26420]: [fs] unmounting /dataFS
Apr 22 17:04:48 node1 rgmanager[26468]: [fs] Sending SIGKILL to processes on /dataFS
Apr 22 17:04:49 node1 rgmanager[13132]: stop on fs "backupdata" returned 1 (generic error)
Apr 22 17:04:51 node1 rgmanager[26525]: [lvm] Logical volume VolGroupBackup/dataFS failed to shutdown
Apr 22 17:04:51 node1 rgmanager[13132]: stop on lvm "BACKUPLVM" returned 1 (generic error)
- Red Hat cluster does not fail over when the active node loses access to SAN storage
- lvm resource failed to shut down, causing the service to fail:
rgmanager[3010]: [lvm] Logical volume VolGroup/LogVol failed to shutdown
- My node did not reboot itself when it could not shut down the lvm resource, even though self_fence="on" is set
- I tried to move a cluster service from node 2 to node 1 with the command clusvcadm -r <service> -m <node>. The relocation failed and the service went down on node 2. To get it working again, I had to disable it a couple of times on node 2 and then re-enable it on node 2.
Jan 28 14:21:54 node2 clurgmgrd[13471]: <notice> Stopping service service:myService
Jan 28 14:22:02 node2 clurgmgrd: [13471]: <err> Logical volume vg1/lv1 failed to shutdown
Jan 28 14:22:02 node2 clurgmgrd[13471]: <notice> stop on lvm "vg1" returned 1 (generic error)
Jan 28 14:22:02 node2 clurgmgrd: [13471]: <err> Logical volume vg2/lv1 failed to shutdown
Jan 28 14:22:02 node2 clurgmgrd[13471]: <notice> stop on lvm "vg2" returned 1 (generic error)
Jan 28 14:22:03 node2 clurgmgrd: [13471]: <err> Logical volume vg3/lv1 failed to shutdown
Jan 28 14:22:03 node2 clurgmgrd[13471]: <notice> stop on lvm "vg3" returned 1 (generic error)
[...]
Jan 28 14:22:14 node2 clurgmgrd[13471]: <crit> #12: RG service:myService failed to stop; intervention required
Jan 28 14:22:14 node2 clurgmgrd[13471]: <notice> Service service:myService is failed
- Multiple clustered lvm volumes fail to stop when the service is stopped
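
To check whether a node is showing this symptom, the log messages quoted above can be searched for the "failed to shutdown" line. The following is only a minimal sketch: the function name failed_volumes and the regular expression are illustrative assumptions derived from the sample log lines in this article, not part of rgmanager or any Red Hat tool.

```python
import re

# Assumed pattern, derived from the sample log lines quoted in this article.
# It covers both message formats shown above:
#   rgmanager[26525]: [lvm] Logical volume VolGroupBackup/dataFS failed to shutdown
#   clurgmgrd: [13471]: <err> Logical volume vg1/lv1 failed to shutdown
LV_STOP_FAILURE = re.compile(r"Logical volume ([^/\s]+)/(\S+) failed to shutdown")


def failed_volumes(log_lines):
    """Return (volume_group, logical_volume) pairs that failed to deactivate.

    Pairs are returned in order of first appearance, without duplicates.
    """
    seen = []
    for line in log_lines:
        match = LV_STOP_FAILURE.search(line)
        if match and match.groups() not in seen:
            seen.append(match.groups())
    return seen
```

For example, passing the lines of a syslog file (assumed here to be /var/log/messages) to failed_volumes() lists each volume group and logical volume that the lvm resource could not deactivate.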
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add-On
- Using HA-LVM