Cluster fails to start HA-LVM service on node 2: "[lvm] Someone else owns this logical volume"

Hello,
I'm testing in my lab to see how a service relocates and how fencing works.

My environment is as follows:
RHEL 6.9 (VM on KVM)
CMAN, RGMANAGER, RICCI
service: HA-LVM (iSCSI multipath)
fencing agent: fence_virt

Test 1: shutting down node 1
The service successfully relocates to node 2.

Test 2: shutting down a network interface
The service successfully relocates to node 2.

Test 3: shutting down the network interfaces of all multipathed iSCSI sessions
The service fails to relocate to node 2; its status changes to 'recoverable' and then 'failed'.

Question 1.
Then I ran:

clusvcadm -d my_service

clusvcadm -e my_service

and the log below was recorded on node 2. I googled the error '[lvm] Someone else owns this logical volume' and the closest explanation I found is https://access.redhat.com/solutions/1169483. Basically it says that when all multipathed connections fail, the storage is still tagged with node 1's name, so the LVM volume will not relocate to node 2. Is this true? If so, does Red Hat Cluster always require human intervention to manually re-enable the LVM service when the storage fails?
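For reference, this is roughly how I understand the tag ownership can be inspected (only a sketch; vg_vol001 is the volume group from the log below, and the actual tag will be whatever node name HA-LVM wrote):

vgs -o vg_name,vg_tags vg_vol001    # show which node-name tag currently owns the VG
lvs -o lv_name,lv_tags vg_vol001    # same check on the LV, in case the tag was applied there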

Question 2.
The service cannot be started on node 2 until node 1 comes back up. In a real case, say node 1 fails because of a bad storage card, the storage would still be tagged with node 1 and the service could not be started on node 2. How can I force the LVM service to start on node 2? I tried clusvcadm -u, -d, -e, and -r; none of them works.
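If manual intervention really is the only answer, I assume it would look something like the following (just a sketch based on the solution article; node1.example.com stands in for whatever tag is actually on the volume group):

vgchange --deltag node1.example.com vg_vol001                  # remove the stale ownership tag left by node 1
lvchange --deltag node1.example.com /dev/vg_vol001/lv_vol001   # or on the LV, if the tag sits there instead
clusvcadm -e my_service -m node2                               # then try to enable the service on node 2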

"[root@noded ~]# tail -f /var/log/messages
Jun 28 19:27:05 noded rgmanager[2133]: Stopping service service:my_service
Jun 28 19:27:05 noded rgmanager[19234]: [fs] stop: Could not match /dev/vg_vol001/lv_vol001 with a real device
Jun 28 19:27:05 noded ntpd[2001]: Listen normally on 17 eth1 192.168.254.110 UDP 123
Jun 28 19:27:05 noded rgmanager[19301]: [lvm] Someone else owns this logical volume
Jun 28 19:27:05 noded rgmanager[2133]: stop on lvm "ha_lvm_vol001" returned 1 (generic error)
Jun 28 19:27:05 noded rgmanager[19356]: [ip] Removing IPv4 address 192.168.254.110/24 from eth1
Jun 28 19:27:07 noded ntpd[2001]: Deleting interface #17 eth1, 192.168.254.110#123, interface stats: received=0, sent=0, dropped=0, active_time=2 secs
Jun 28 19:27:15 noded rgmanager[2133]: #12: RG service:my_service failed to stop; intervention required
Jun 28 19:27:15 noded rgmanager[2133]: Service service:my_service is failed
Jun 28 19:27:15 noded rgmanager[2133]: #13: Service service:my_service failed to stop cleanly

"

Responses

Hello Sungpill Han,

Please take a look at this Red Hat solution, https://access.redhat.com/solutions/1120433, and see if it fits the issue you are experiencing. If it does not, please post back here.

Regards,
RJ

The lvmetad daemon is off in chkconfig and was not running at the time. I also set use_lvmetad = 0 in /etc/lvm/lvm.conf, as the Red Hat HA-LVM configuration guide recommends.
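For reference, confirming both settings takes nothing more than these two commands (assuming the standard RHEL 6 service and config file names):

chkconfig --list lvm2-lvmetad         # should show the daemon off in every runlevel
grep use_lvmetad /etc/lvm/lvm.conf    # should show use_lvmetad = 0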

Still, I feel this is somehow related to lvmetad, but I just can't find a way to confirm it.

Sungpill Han,

Double-check the output of ps aux | grep -i lvm on each node.

  • I also found this solution: https://access.redhat.com/solutions/1169483. Is the file system read-only at the moment? (The solution in that link asks that; a quick way to check is shown after this list.)
  • I've seen cases where you have to reboot the system hosting the file system to make it read-write again (that issue may be unique to our environment and may not apply to you).
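A quick way to check for a read-only mount is something along these lines (only a sketch; the device name is taken from your log):

awk '$4 ~ /^ro(,|$)/ {print $1, $2, $4}' /proc/mounts    # list anything currently mounted read-only
mount | grep vg_vol001                                   # check the mount options on the HA LV itself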

Please let us know how this goes,
RJ

p.s. - I will be unavailable for a while. If needed, you can open a case with Red Hat as well, or perhaps another person will chime in here.

Sungpill Han,

Did the last link help? If needed, open a case with Red Hat.

Regards,
RJ