Rgmanager services fail to start with "caught signal 13" when LDAP is unavailable on Red Hat Enterprise Linux 4

Solution Unverified

Issue

  • At 13:39 CET today, all of the cluster resources on one node of a two-node cluster stopped running. The logs suggest that the cluster shut them down, but not why. The status scripts logged errors about "signal 13" (SIGPIPE), and the cause of those errors is also unclear.
  • Red Hat cluster resources had problems during an outage of our LDAP systems and are now in a recoverable state. We have been unable to disable, stop, enable, or clear the resources.
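
For readers unfamiliar with the "signal 13" wording in these logs, the following is a minimal sketch of how a process ends up killed by SIGPIPE: it writes to a pipe whose reading end has already been closed. It is not the cluster's status script and does not reproduce the LDAP outage; the child program and the function name `run_writer_with_closed_reader` are hypothetical and exist only for illustration. It assumes a Linux host, where SIGPIPE is signal number 13.

```python
import signal
import subprocess
import sys

# Hypothetical child program: it restores the default SIGPIPE action
# (the Python interpreter ignores SIGPIPE at startup) and then writes
# to its standard output forever.
CHILD = (
    "import os, signal\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "while True:\n"
    "    os.write(1, b'x' * 4096)\n"
)


def run_writer_with_closed_reader():
    """Start the writer, close the reading end of its pipe, return its exit status."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    proc.stdout.close()  # the reader goes away, so the next write raises SIGPIPE
    return proc.wait()


returncode = run_writer_with_closed_reader()

# subprocess reports death by signal N as a return code of -N.
if returncode < 0:
    print("child killed by", signal.Signals(-returncode).name, "(signal", -returncode, ")")
```

A negative return code from `subprocess` means the child was terminated by that signal number, which is the same condition a supervising process would log as "caught signal 13".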

Environment

  • Red Hat Enterprise Linux 4 (RHEL4), including
    • Red Hat Cluster Suite 4+
  • rgmanager to manage services
    • A non-Red Hat, script-based service that runs sshd; the service receives a SIGPIPE shortly after starting and dies.
  • Unavailable LDAP-based network authentication can trigger this issue
    • It is unknown whether other network-based authentication methods can also cause this issue.
