A single fence attempt never completes, and cluster operations block while it runs, in a RHEL 6 Resilient Storage cluster with GFS2

Solution Unverified - Updated -

Issue

  • After a node left the cluster for unknown reasons, there was a GFS2 deadlock
  • A node required fencing, but that fencing never completed and everything in the cluster blocked
  • fence_ipmilan hung when trying to fence a node
  • fence_ipmilan never returned when it was called to fence another node:

      Apr 13 10:48:27 node1 corosync[1984]: [TOTEM ] A processor failed, forming new configuration.
      Apr 13 10:49:28 node1 corosync[1984]: [QUORUM] Members[8]: 1 2 3 5 6 7 8 9
      Apr 13 10:49:28 node1 corosync[1984]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
      Apr 13 10:49:28 node1 kernel: dlm: closing connection to node 4
      Apr 13 10:49:28 node1 corosync[1984]: [CPG ] chosen downlist: sender r(0) ip(10.0.0.9) ; members(old:9 left:1)
      Apr 13 10:49:28 node1 corosync[1984]: [MAIN ] Completed service synchronization, ready to provide service.
      Apr 13 10:49:28 node1 fenced[2061]: fencing node node2
      Apr 13 10:49:28 node1 kernel: GFS2: fsid=cluster:lv1.8: jid=7: Trying to acquire journal lock...
      Apr 13 10:49:28 node1 kernel: GFS2: fsid=cluster:lv2.8: jid=4: Trying to acquire journal lock...
      Apr 13 10:52:08 node1 kernel: INFO: task kswapd0:100 blocked for more than 120 seconds.
      [...]
      Apr 13 10:52:08 node1 kernel: INFO: task glock_workqueue:2403 blocked for more than 120 seconds.
      [...]
      Apr 13 10:52:08 node1 kernel: INFO: task glock_workqueue:2404 blocked for more than 120 seconds.
      [...]
      Apr 13 10:52:08 node1 kernel: INFO: task glock_workqueue:2405 blocked for more than 120 seconds.

  • fence_vmware_soap began fencing but never finished. GFS2 file systems throughout the cluster blocked for that entire time and did not recover on their own.
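
The symptom pattern in the log excerpt above (a fenced "fencing node" message followed minutes later by kernel "blocked for more than 120 seconds" warnings) can be spotted mechanically. The following is a minimal Python sketch, not a Red Hat tool: the regular expressions assume the RHEL 6 syslog line format shown in the excerpt, and the names correlate and parse_ts are hypothetical. Because the excerpt contains no fence-completion message, the sketch does not try to detect completion; it only lists the hung tasks reported after each fence attempt began.

```python
import re
from datetime import datetime

# Assumed message shapes, copied from the log excerpt (RHEL 6 syslog format).
FENCE_RE = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) \S+ fenced\[\d+\]: fencing node (\S+)")
HUNG_RE = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) \S+ kernel: INFO: task (\S+):(\d+) "
    r"blocked for more than \d+ seconds")


def parse_ts(ts, year):
    # Syslog timestamps carry no year, so a fixed one is supplied;
    # only differences between timestamps are used.
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def correlate(lines, year=2000):
    """Map each fenced node to the hung tasks reported after fencing began.

    Returns {node: [(task, pid, seconds_since_fence_start), ...]}.
    Fence completion is NOT detected: every hung-task warning is
    attributed to every fence attempt that started earlier.
    """
    fence_starts = {}  # node -> time of its most recent "fencing node" line
    report = {}
    for line in lines:
        m = FENCE_RE.match(line)
        if m:
            node = m.group(2)
            fence_starts[node] = parse_ts(m.group(1), year)
            report.setdefault(node, [])
            continue
        m = HUNG_RE.match(line)
        if m:
            seen = parse_ts(m.group(1), year)
            for node, started in fence_starts.items():
                if seen >= started:
                    delay = int((seen - started).total_seconds())
                    report[node].append((m.group(2), int(m.group(3)), delay))
    return report
```

Applied to the excerpt, the fence of node2 starts at 10:49:28 and the kswapd0 and glock_workqueue warnings arrive 160 seconds later. To scan a node's log, pass an open file, for example correlate(open("/var/log/messages")); that path is an assumption about the node's syslog configuration.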

Environment

  • Red Hat Enterprise Linux (RHEL) 6 with the Resilient Storage Add-On
  • Red Hat Enterprise Linux (RHEL) 7 with the Resilient Storage Add-On
  • One or more GFS2 file systems mounted within the cluster
