Remote node resource times out or fails under high load in a RHEL 6 or 7 High Availability cluster with pacemaker and pacemaker-remote

Solution Unverified - Updated -

Issue

  • My remote nodes get restarted any time something hogs CPU or system resources
  • Guest remote nodes fail and recover constantly whenever we run an intensive job in that VM
  • Under high I/O inside a VM, that VM is systematically fenced due to a remote-node timeout
  • A VM starts, its remote node connects successfully, and all resources on the remote node start successfully; when I/O load then occurs inside the VM, pacemaker fences and restarts the VM due to a remote-node timeout

Environment

  • Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add-On
  • pacemaker and pacemaker-remote
  • One or more remote nodes defined in the CIB as an ocf:heartbeat:VirtualDomain resource with a remote-node meta attribute, or as an ocf:pacemaker:remote resource
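
For orientation, the two kinds of remote-node definition named in the last item above look roughly like the following in the CIB. This is an illustrative sketch only: the resource IDs, the guest name guest1, the libvirt config path, and the server address are hypothetical placeholders, not values from an affected cluster.

```xml
<!-- Guest remote node: a VirtualDomain resource carrying a remote-node meta attribute.
     All ids, names, and paths here are hypothetical examples. -->
<primitive id="vm-guest1" class="ocf" provider="heartbeat" type="VirtualDomain">
  <instance_attributes id="vm-guest1-instance_attributes">
    <nvpair id="vm-guest1-hypervisor" name="hypervisor" value="qemu:///system"/>
    <nvpair id="vm-guest1-config" name="config" value="/etc/libvirt/qemu/guest1.xml"/>
  </instance_attributes>
  <meta_attributes id="vm-guest1-meta_attributes">
    <!-- The value is the node name the guest takes in the cluster -->
    <nvpair id="vm-guest1-remote-node" name="remote-node" value="guest1"/>
  </meta_attributes>
</primitive>

<!-- Remote node defined directly as an ocf:pacemaker:remote resource.
     The server address is a hypothetical example. -->
<primitive id="remote1" class="ocf" provider="pacemaker" type="remote">
  <instance_attributes id="remote1-instance_attributes">
    <nvpair id="remote1-server" name="server" value="192.0.2.10"/>
  </instance_attributes>
</primitive>
```

If either form appears in the output of `pcs cluster cib`, the cluster matches this environment.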
