Remote node resource times out or fails under high load in a RHEL 6 or 7 High Availability cluster with pacemaker and pacemaker-remote
Issue
- My remote nodes get restarted any time something hogs CPU or system resources
- Guest remote nodes fail and recover constantly whenever we run an intensive job in that VM
- In the case of high I/O inside a VM, that VM is systematically fenced due to remote-node timeout
- After a VM has started, the remote-node has connected successfully, and all resources on the remote-node have started successfully, I/O load inside the VM causes pacemaker to fence and restart the VM due to a remote-node timeout
Environment
- Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add-On
- pacemaker and pacemaker-remote
- One or more remote nodes defined in the CIB as an ocf:heartbeat:VirtualDomain resource with a remote-node meta attribute, or as an ocf:pacemaker:remote resource
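To check whether a cluster contains either kind of remote node definition described above, the CIB can be inspected directly. The following is a minimal sketch, assuming Python 3 and a CIB dump such as the output of cibadmin --query. The sample CIB, the resource IDs, the file path, the host name, and the helper name find_remote_nodes are all hypothetical and only illustrate the two definition styles.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB excerpt: resource IDs, paths, and host names are made up
# for illustration and do not come from a real cluster.
SAMPLE_CIB = """
<cib>
  <configuration>
    <resources>
      <primitive id="vm-guest1" class="ocf" provider="heartbeat" type="VirtualDomain">
        <instance_attributes id="vm-guest1-ia">
          <nvpair id="vm-guest1-ia-config" name="config" value="/etc/libvirt/qemu/guest1.xml"/>
        </instance_attributes>
        <meta_attributes id="vm-guest1-ma">
          <nvpair id="vm-guest1-ma-remote-node" name="remote-node" value="guest1"/>
        </meta_attributes>
      </primitive>
      <primitive id="remote1" class="ocf" provider="pacemaker" type="remote">
        <instance_attributes id="remote1-ia">
          <nvpair id="remote1-ia-server" name="server" value="remote1.example.com"/>
        </instance_attributes>
      </primitive>
      <primitive id="vip" class="ocf" provider="heartbeat" type="IPaddr2"/>
    </resources>
  </configuration>
</cib>
"""


def find_remote_nodes(cib_xml):
    """Return (resource_id, kind, node_name) tuples for remote node definitions."""
    root = ET.fromstring(cib_xml)
    found = []
    for prim in root.iter("primitive"):
        # Guest node: a resource that carries a remote-node meta attribute.
        for nv in prim.findall("./meta_attributes/nvpair[@name='remote-node']"):
            found.append((prim.get("id"), "guest", nv.get("value")))
        # Remote node: an ocf:pacemaker:remote resource; the resource ID
        # doubles as the node name.
        agent = (prim.get("class"), prim.get("provider"), prim.get("type"))
        if agent == ("ocf", "pacemaker", "remote"):
            found.append((prim.get("id"), "remote", prim.get("id")))
    return found


if __name__ == "__main__":
    for res_id, kind, node in find_remote_nodes(SAMPLE_CIB):
        print(f"{res_id}: {kind} node '{node}'")
```

If the function returns an empty list for a real CIB dump, the cluster has no remote nodes defined in either style and this article likely does not apply.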