Remote node resource times out or fails under high load in a RHEL 6 or 7 High Availability cluster with pacemaker and pacemaker-remote

Solution Unverified - Updated -

Issue

  • My remote nodes get restarted any time something hogs CPU or system resources
  • Guest remote nodes fail and recover constantly whenever we run an intensive job in that VM
  • Under high I/O inside a VM, that VM is systematically fenced due to a remote-node timeout
  • A VM starts, its remote node connects successfully, and all resources on the remote node start successfully; then, under I/O load inside the VM, pacemaker fences and restarts the VM because of a remote-node timeout

Environment

  • Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add-On
  • pacemaker and pacemaker-remote
  • One or more remote nodes defined in the CIB as an ocf:heartbeat:VirtualDomain resource with a remote-node meta attribute, or as an ocf:pacemaker:remote resource
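For orientation, the two kinds of remote-node definition named above might be created with pcs roughly as follows. This is a minimal sketch, not taken from this solution: the resource names (vm-guest1, remote1), the node name guest1, the libvirt XML path, and the server address are all hypothetical placeholders, and the available options can differ between RHEL 6 and RHEL 7 releases.

```shell
# Guest node: a VirtualDomain resource whose remote-node meta attribute
# tells pacemaker to treat the VM as a cluster node named "guest1".
# (Resource name, node name, and config path are hypothetical.)
pcs resource create vm-guest1 ocf:heartbeat:VirtualDomain \
    hypervisor="qemu:///system" \
    config="/etc/libvirt/qemu/guest1.xml" \
    meta remote-node=guest1

# Remote node: an ocf:pacemaker:remote resource pointing at a host that
# runs the pacemaker_remote daemon. (Name and address are hypothetical.)
pcs resource create remote1 ocf:pacemaker:remote server=192.168.122.10
```

A cluster matches this article's environment if its configuration contains a resource of either form.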
