remote node resource times out or fails when under high load in a RHEL 6 or 7 High Availability cluster with pacemaker and pacemaker-remote
Issue
- My remote nodes get restarted any time something hogs CPU or system resources
- Guest remote nodes fail and recover constantly whenever we run an intensive job in that VM
- In the case of high I/O inside a VM, that VM is systematically fenced due to remote-node timeout
- After a VM has started, its remote node has connected, and all resources on the remote node have started successfully, I/O load inside the VM causes pacemaker to fence and restart the VM due to a remote-node timeout
Environment
- Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add-On
- pacemaker and pacemaker-remote
- One or more remote nodes defined in the CIB as an ocf:heartbeat:VirtualDomain resource with a remote-node meta attribute, or as an ocf:pacemaker:remote resource
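
For orientation, the two kinds of remote-node definition named above might look roughly like the following in the CIB. This is only an illustrative sketch: the resource IDs, the guest name guest1, the libvirt config path, and the host name remote1.example.com are hypothetical placeholders, not values taken from any affected cluster.

```xml
<!-- Guest node: a VirtualDomain resource carrying a remote-node meta attribute.
     All ids, the guest name and the config path are hypothetical. -->
<primitive id="vm-guest1" class="ocf" provider="heartbeat" type="VirtualDomain">
  <instance_attributes id="vm-guest1-instance_attributes">
    <nvpair id="vm-guest1-hypervisor" name="hypervisor" value="qemu:///system"/>
    <nvpair id="vm-guest1-config" name="config" value="/etc/libvirt/qemu/guest1.xml"/>
  </instance_attributes>
  <meta_attributes id="vm-guest1-meta_attributes">
    <nvpair id="vm-guest1-remote-node" name="remote-node" value="guest1"/>
  </meta_attributes>
</primitive>

<!-- Remote node: an ocf:pacemaker:remote resource pointing at the remote host.
     The id and server value are hypothetical. -->
<primitive id="remote1" class="ocf" provider="pacemaker" type="remote">
  <instance_attributes id="remote1-instance_attributes">
    <nvpair id="remote1-server" name="server" value="remote1.example.com"/>
  </instance_attributes>
</primitive>
```

A cluster matches this environment if its CIB contains either form.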
