Attempting to stop or relocate a cluster service fails with a warning "clurgmgrd[11619]: <warning> BUG! Attempt to forward to myself!" in a RHEL 5 or 6 High Availability cluster with rgmanager

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add-On
  • rgmanager

Issue

  • Relocating or stopping a service fails with "Attempt to forward to myself!"
  • When disabling the cluster service serviceclu, the following was logged:
Oct  8 13:23:32 server clurgmgrd[11619]: <warning> BUG! Attempt to forward to myself!

Resolution

  • Workaround: Manually disable the service, then enable it on the preferred node.

  • Workaround: Restart rgmanager on the affected nodes, or on all nodes, to clear the condition that triggers the warning.

  • If the error appears after running clusvcadm -e, check whether the service is already enabled and running. If it is, relocate it with clusvcadm -r instead of clusvcadm -e.
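The two workarounds can be sketched as a pair of shell functions. This is a minimal sketch, assuming clusvcadm is in PATH on the node where you run it and that you have root ssh access to the cluster nodes; the function names and the service, member, and node arguments are hypothetical placeholders, not part of rgmanager.

```shell
#!/bin/sh
# Hypothetical helpers for the workarounds above (names are illustrative only).

# Workaround 1: disable the service, then enable it on the preferred member.
disable_then_enable() {
    service_name=$1
    preferred_member=$2
    # Stop the service and mark it disabled; give up if that fails.
    clusvcadm -d "$service_name" || return 1
    # Enable it again, asking rgmanager to start it on the chosen member.
    clusvcadm -e "$service_name" -m "$preferred_member"
}

# Workaround 2: restart rgmanager on each listed node.
# Assumes root ssh access and the RHEL 5/6 init script invoked via "service".
restart_rgmanager_on() {
    for node in "$@"; do
        ssh "root@$node" 'service rgmanager restart' \
            || echo "restart failed on $node" >&2
    done
}
```

For example, disable_then_enable serviceclu node2 runs clusvcadm -d serviceclu followed by clusvcadm -e serviceclu -m node2. Keep in mind that restarting rgmanager on a node can interrupt the services it is running there.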

Root Cause

It is not known exactly what triggers this. The code that emits this warning runs when a node that is attempting to stop a service finds that another node owns it, tries to forward the request to the owning node, and then finds that the owning node is in fact itself. This is likely a bug, but the conditions that produce it have not been identified.

The only known way to reproduce this reliably is to enable a service that is already enabled, for example:

# clusvcadm -e <service> -m <other node>

Some users run this hoping to relocate the service to another node, but enabling is invalid for a service that is already enabled; the correct command is clusvcadm -r. rgmanager handles this improper usage incorrectly, which produces the error seen in the logs.
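The rule above (relocate a running service, enable only one that is not running) can be captured in a small hypothetical helper. This sketch assumes you read the service state yourself, for example from clustat output, and pass it in as the first argument; the function name and the set of handled states are assumptions for illustration. It only prints the command to use and runs nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of rgmanager): print the clusvcadm command
# suited to moving a service, given its current state as shown by clustat.
# It only prints the command; it does not run it.
move_command() {
    state=$1
    service_name=$2
    member=$3
    case "$state" in
        started)
            # Already running: relocate instead of re-enabling.
            echo "clusvcadm -r $service_name -m $member"
            ;;
        disabled|stopped)
            # Not running: enabling it on the chosen member is valid.
            echo "clusvcadm -e $service_name -m $member"
            ;;
        *)
            # Other states (for example failed) need manual review first.
            echo "unhandled service state: $state" >&2
            return 1
            ;;
    esac
}
```

For example, move_command started serviceclu node2 prints clusvcadm -r serviceclu -m node2, which avoids the invalid clusvcadm -e call shown above.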

Diagnostic Steps

  • When this occurs, capture an application core of clurgmgrd on all nodes immediately after the problem:
# gcore $(pidof -s clurgmgrd)
  • Capture a dump of rgmanager's internal state immediately after capturing the core

  • Check the core and the state dump for signs that rgmanager holds incorrect owner information for the service
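Because the cores should be taken from every node as soon as possible, the gcore step can be scripted in advance. This is a minimal sketch, assuming root ssh access to each node and that gcore (shipped with gdb) is installed on them; the function name, the /tmp output prefix, and the node names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: capture a clurgmgrd core on every listed node, one
# after another, right after the problem is seen.
# Assumes root ssh access and that gcore (from gdb) is installed on each node.
capture_clurgmgrd_cores() {
    for node in "$@"; do
        # Single quotes make $(hostname -s) and $(pidof ...) expand on the
        # remote node. gcore -o PREFIX writes the core to PREFIX.<pid>.
        ssh "root@$node" 'gcore -o /tmp/clurgmgrd-$(hostname -s) $(pidof -s clurgmgrd)' \
            || echo "core capture failed on $node" >&2
    done
}
```

For example, capture_clurgmgrd_cores node1 node2 leaves a core named /tmp/clurgmgrd-<short hostname>.<pid> on each node, ready to be collected alongside the state dump.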

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.