How do I configure a stonith device using agent fence_vmware_rest in a RHEL 7 or 8 High Availability cluster with pacemaker?


Environment

  • Red Hat Enterprise Linux (RHEL) 7 Update 5
  • Red Hat Enterprise Linux (RHEL) 8
  • Pacemaker High Availability or Resilient Storage Add On
  • VMware vSphere version 6.5 and above, including 7.0

Issue

How do I configure a stonith device using agent fence_vmware_rest in a RHEL 7 or 8 High Availability cluster with pacemaker?

Resolution

  • Assume the following cluster architecture:

    • cluster node hostnames are node1 and node2
    • cluster node names as seen by the vmware hypervisor (vCenter) are node1-vm and node2-vm
    • <vCenter IP address> is the IP address of the vmware hypervisor (vCenter) that manages the cluster node VMs
  • First, check whether the cluster node can reach the hypervisor and list the VMs on it. The following command connects to the hypervisor with the provided credentials and lists all virtual machines.

     # fence_vmware_rest -a <vCenter IP address> -l <vcenter_username> -p <vcenter_password> --ssl-insecure -z -o list | egrep "(node1-vm|node2-vm)"
     node1-vm,
     node2-vm,
     # fence_vmware_rest -a <vCenter IP address> -l <vcenter_username> -p <vcenter_password> --ssl-insecure -z -o status -n node1-vm
     Status: ON
    
  • If the above listing fails, make sure the node can communicate with vCenter on port 443/tcp (when using SSL) or on port 80/tcp (without SSL), and that the provided credentials are correct.
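
    For example, a quick connectivity check toward the vCenter endpoint (a minimal sketch assuming SSL on the default port 443; nc and curl are used here only for illustration and are not required by the agent):

     # nc -zv <vCenter IP address> 443
     # curl -k -s -o /dev/null -w "%{http_code}\n" https://<vCenter IP address>/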

  • If the command succeeded, the node is able to communicate with the hypervisor. The stonith device should be configured with the same configuration options that were tested in the listing above. Note that some arguments of the fence_vmware_rest command and of the fence_vmware_rest fencing agent in pacemaker have slightly different names.
    For this reason, check the help pages of both the fence_vmware_rest command and the fence_vmware_rest fencing agent (the Diagnostics section contains a shortened listing of the options used by this solution).
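
    For example, to compare the option names side by side (both help commands are available once the agent package and pcs are installed):

     # fence_vmware_rest --help
     # pcs stonith describe fence_vmware_rest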

  • Create the stonith device using the command below. The pcmk_host_map attribute maps the node hostname as seen by the cluster to the name of the virtual machine as seen by the vmware hypervisor.

  • Each entry in pcmk_host_map takes the form <cluster node name>:<VM name>. The part before the colon is the cluster node name as seen in the /etc/corosync/corosync.conf file, and the part after the colon is the virtual machine name as seen by the vmware hypervisor; multiple entries are separated by semicolons.

    # cat /etc/corosync/corosync.conf
    [...]
    nodelist {
        node {
            ring0_addr: node1  <<<=== Cluster node name
            nodeid: 1
        }
    
        node {
            ring0_addr: node2
            nodeid: 2
        }
    }
    
    # pcs stonith create vmfence fence_vmware_rest pcmk_host_map="node1:node1-vm;node2:node2-vm" ipaddr=<vCenter IP address> ssl=1 login=<vcenter_username> passwd=<vcenter_password> ssl_insecure=1
    
  • To check the status of the stonith device and its configuration, use the commands below.

    # pcs stonith show
    Full list of resources:
    vmfence (stonith:fence_vmware_rest):    Started node1
    
    # pcs stonith show vmfence --full
     Resource: vmfence (class=stonith type=fence_vmware_rest)
      Attributes: pcmk_host_map=node1:node1-vm;node2:node2-vm ipaddr=<vCenter IP address> ssl=1 login=<vcenter_username> passwd=<vcenter_password> ssl_insecure=1
    
  • When the stonith device is started, proceed with proper testing of fencing in the cluster.
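
    A minimal sketch of such a test, assuming node2 can safely be rebooted, is to fence it from another node and then check its power state through the agent:

     # pcs stonith fence node2
     # fence_vmware_rest -a <vCenter IP address> -l <vcenter_username> -p <vcenter_password> --ssl-insecure -z -o status -n node2-vm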

Additional notes and recommendations:

  • Make sure the package fence-agents-4.0.11-86.el7 or later is installed, which provides the new fence_vmware_rest agent (see the package check after this list).
  • fence_vmware_rest works with VMware vSphere version 6.5 or higher, including 7.0
  • Please refer to the following link for the support policies of fence_vmware_rest.
  • Once configured, it is highly recommended to test the fence functionality.
  • The older fence agent fence_vmware_soap is known to cause CPU usage to spike.
  • There is a known limitation imposed by the VMware Rest API of 1000 VMs: fence_vmware_rest monitor fails with error: "Exception: 400: Too many virtual machines. Add more filter criteria to reduce the number."
  • The fencing agent fence_vmware_rest currently does not support using UUIDs for VMs (it supports VM names only). If you need to use UUIDs, then use fence_vmware_soap.
  • The fencing agent fence_vmware_rest works only with the vCenter IP, as an ESXi host does not provide the REST API. If you need to use the ESXi host IP, then use fence_vmware_soap.
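
    For example, to verify the installed agent package and its version (the package name fence-agents-vmware-rest is an assumption; on some releases the agent may ship in a differently named fence-agents subpackage):

     # rpm -qf /usr/sbin/fence_vmware_rest
     # rpm -q fence-agents-vmware-rest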

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

8 Comments

fence_vmware_rest works only when the VMware vCenter IP address is provided.
It should be: # fence_vmware_rest -a <vCenter IP address> -l <vcenter_username> -p <vcenter_password> --ssl-insecure -z -o list

Thank you for highlighting this point. The article has been updated to use the vCenter IP only.

Hello guys, I'm searching for a clear statement about Red Hat Enterprise Linux High Availability cluster support on vSphere 7. The above KB mentions it, but so far I have only found this older RH KB about vSphere 5/6 support: https://access.redhat.com/articles/3131271

Can anyone point me to the right direction?

Hi, Beniamino. The article that you linked contains a complete list of supported VMware products for Red Hat High Availability. We update the support policies articles any time a support policy changes.

Our QE team is working on testing vSphere 7 to determine supportability (or to recommend any changes that may be necessary to achieve supportability).

Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer
CEE - Platform Support Delivery - ClusterHA
Red Hat

Hello, regarding CVE-2021-21985 and CVE-2021-21986 [VMware vSAN plugin vulnerabilities: port 443]: as part of the patch workaround, the VSPHERE-UI service must be stopped and started. Will there be an impact to HA clusters managed by Pacemaker? Will this impact the stonith devices/resources?

"Node is able to communicate with vCenter on port 443/tcp (when using SSL) or on port 80/tcp (without SSL)."

As long as you don't do anything that causes the VMs to restart, migrate, or lose communication with each other, there should be no impact to the cluster.

The stonith device may fail and stop. If so, that's not a big deal. Just run pcs stonith cleanup when you're finished. A stonith device still works when it's in Stopped state, as long as it's not administratively disabled.
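
For example, using the vmfence stonith resource name from this article:

# pcs stonith cleanup vmfence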

See also: A stonith resource attempts to fence a cluster node while it is in stopped state on a pacemaker cluster

By the way, since I know that you have a Red Hat support subscription, I strongly encourage you to open a case when you have questions. We do our best to respond to comments on KB articles, but we don't always see them come in, and they take a lower priority than support cases.

I want to make sure you get your questions and concerns addressed in a timely fashion.

Thanks Reid.