Fencing fails with 'No such device' or a stonith device is listed as "Stopped" in a RHEL 6 or 7 High Availability cluster with symmetric-cluster=false set

Solution Unverified

Environment

  • Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add-On
  • pacemaker
  • Cluster property symmetric-cluster is set to false in the CIB

Issue

  • Pacemaker stonith resources show as "Stopped" and cannot run on any node:
pengine:     info: native_print:     node1_fence   (stonith:fence_ilo4):   Stopped
pengine:     info: native_print:     node2_fence   (stonith:fence_ilo4):   Stopped
pengine:     info: native_print:     node3_fence   (stonith:fence_ilo4):   Stopped
pengine:     info: native_print:     node4_fence   (stonith:fence_ilo4):   Stopped
pengine:     info: native_color:     Resource node1_fence cannot run anywhere
pengine:     info: native_color:     Resource node2_fence cannot run anywhere
pengine:     info: native_color:     Resource node3_fence cannot run anywhere
pengine:     info: native_color:     Resource node4_fence cannot run anywhere
  • The pcs stonith fence command fails:
# pcs stonith fence node4
Error: unable to fence 'node4'
Command failed: No such device
  • Fence actions fail with this message in the logs:
crmd:   notice: tengine_stonith_notify:   Peer node4 was not terminated (reboot) by node4 for node4: No such device (ref=50574ae6-54fa-4201-aac7-d9b20fa55377) by client stonith_admin.33126
  • My stonith device won't start, and just stays listed as "Stopped"
  • Why isn't the cluster starting my stonith device?

Resolution

  • Run pcs property set symmetric-cluster=true to allow resources to run on any node by default. Constraints can then limit where each resource is able to run and set the priority of each node.
  • Alternatively, if symmetric-cluster=false is required, create location constraints that allow each fence device to run on at least one node.
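
The second option can be sketched as a small script. This is a minimal sketch under stated assumptions, not a verified procedure: the device and node names are taken from the log excerpt in the Issue section, the score of 100 is an arbitrary positive value, and the helpers run and allow_fence_device are hypothetical names introduced only for illustration. Keeping each device off the node it fences is also an assumption about the desired layout, not a requirement stated in this solution. With DRY_RUN left at its default of 1, the script only prints the pcs commands it would run.

```shell
#!/bin/sh
# Sketch: allow fence devices to run in an opt-in cluster
# (symmetric-cluster=false). Helper names, node names, and the
# score of 100 are illustrative assumptions.

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# allow_fence_device DEVICE NODE...
# Adds one location constraint per listed node with a positive score.
allow_fence_device() {
    device=$1
    shift
    for node in "$@"; do
        run pcs constraint location "$device" prefers "$node=100"
    done
}

# Assumed layout: each device may run on every node except the one it fences.
allow_fence_device node1_fence node2 node3 node4
allow_fence_device node2_fence node1 node3 node4
allow_fence_device node3_fence node1 node2 node4
allow_fence_device node4_fence node1 node2 node3
```

Review the printed commands first, then rerun with DRY_RUN=0 on a cluster node to apply them.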

Root Cause

If a cluster has symmetric-cluster=false set, then each stonith device must have a location constraint that allows it to run on a specific node or nodes. Without such a constraint, the cluster reports that it has no fence device with which to fence a node, and it does not start or monitor those devices.

This is expected behavior, given that symmetric-cluster=false is intended to prevent resources from starting on arbitrary nodes, and instead requires them to be explicitly allowed to run on certain nodes. Stonith devices are just a special class of resource, so they are treated the same way.
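
To check whether a cluster is in this opt-in mode, a sketch along the following lines may help. It assumes the crm_attribute tool shipped with pacemaker is available on the node. The function name is_opt_in_cluster is hypothetical, and the list of spellings treated as false is an assumption about how the property may have been written.

```shell
#!/bin/sh
# Sketch: report whether the cluster is opt-in (symmetric-cluster=false).
# The function name is hypothetical; crm_attribute ships with pacemaker.

# Returns 0 if symmetric-cluster is explicitly set to a false value,
# 1 otherwise (including when the property is unset or the query fails).
is_opt_in_cluster() {
    value=$(crm_attribute --type crm_config --name symmetric-cluster \
            --query --quiet 2>/dev/null | tr '[:upper:]' '[:lower:]')
    case "$value" in
        false|off|no|n|0) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_opt_in_cluster && pcs constraint` lists the configured constraints only when the cluster is opt-in. A stonith device that has no location constraint in that list has no node on which it is allowed to run.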

