Why does rgmanager fail service migration with error '#75: Failed changing service status' during cman membership transition?
Issue
- When cluster services (rgmanager, cman) are stopped on a node that is running an rgmanager service, a timing issue can cause that service to fail to start on the other node with the message:
#75: Failed changing service status
- The surviving node is unable to take over the services because it cannot receive the acknowledgement from cman/ais on the failed node. This leads to timeouts in view formation, and the service terminates with an error.
Dec 15 14:49:48 lsphc1e02 clurgmgrd[8702]: <notice> Member 1 shutting down
Dec 15 14:49:54 lsphc1e02 clurgmgrd[8702]: <notice> Starting stopped service service:lsphc1e-srv
...
Dec 15 14:50:32 lsphc1e02 clurgmgrd[8702]: <err> #75: Failed changing service status
Dec 15 14:50:32 lsphc1e02 clurgmgrd[8702]: <notice> Stopping service service:lsphc1e-srv
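To check whether a node's syslog shows the same pattern, a short script can look for a "#75" error that follows a "Member N shutting down" notice. The following is a minimal sketch only: the regular expressions are derived from the log excerpt above, and the function name find_failed_takeovers is hypothetical, not part of rgmanager.

```python
import re

# Patterns are assumptions derived from the log excerpt in this article.
SHUTDOWN = re.compile(r"clurgmgrd\[\d+\]: <notice> Member (\d+) shutting down")
ERR75 = re.compile(r"clurgmgrd\[\d+\]: <err> #75: Failed changing service status")


def find_failed_takeovers(lines):
    """Return (shutdown_line, error_line) pairs where a #75 error
    follows a member shutdown notice."""
    pairs = []
    last_shutdown = None
    for line in lines:
        if SHUTDOWN.search(line):
            last_shutdown = line
        elif ERR75.search(line) and last_shutdown is not None:
            pairs.append((last_shutdown, line))
            last_shutdown = None  # report each shutdown at most once
    return pairs


# Sample input taken from the log excerpt in this article.
sample = [
    "Dec 15 14:49:48 lsphc1e02 clurgmgrd[8702]: <notice> Member 1 shutting down",
    "Dec 15 14:49:54 lsphc1e02 clurgmgrd[8702]: <notice> Starting stopped service service:lsphc1e-srv",
    "Dec 15 14:50:32 lsphc1e02 clurgmgrd[8702]: <err> #75: Failed changing service status",
    "Dec 15 14:50:32 lsphc1e02 clurgmgrd[8702]: <notice> Stopping service service:lsphc1e-srv",
]

pairs = find_failed_takeovers(sample)
for shutdown_line, error_line in pairs:
    print(shutdown_line)
    print(error_line)
```

In practice the lines would be read from the node's syslog file rather than from an in-memory list. A match only indicates that the log sequence resembles this issue; it does not confirm the root cause.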
Environment
- Red Hat Enterprise Linux 5 (RHEL5) with the High Availability Add-On.
- Red Hat High Availability cluster with 2 or more nodes:
- rgmanager version prior to rgmanager-2.0.52-6.el5
- NOTE: If a similar issue is encountered on rgmanager-2.0.52-6.el5 or later, please refer to Cluster service fails to start or relocate with
#75: Failed changing service status
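To decide which case applies, the installed package name (for example, as reported by rpm -q rgmanager) can be compared against rgmanager-2.0.52-6.el5. The sketch below is a simplified comparison that looks only at the dotted version and the leading integer of the release; it is not the full RPM version-comparison algorithm, and the helper names parse_nvr and is_affected are hypothetical.

```python
import re

# First version/release NOT covered by this article: rgmanager-2.0.52-6.el5
FIXED = (2, 0, 52, 6)


def parse_nvr(nvr):
    """Parse 'rgmanager-<version>-<release>...' into a tuple of integers.

    Simplification (assumption): only the dotted version and the leading
    integer of the release are considered.
    """
    m = re.match(r"rgmanager-(\d+(?:\.\d+)*)-(\d+)", nvr)
    if m is None:
        raise ValueError("unrecognized package name: %r" % nvr)
    version = tuple(int(part) for part in m.group(1).split("."))
    return version + (int(m.group(2)),)


def is_affected(nvr):
    """True if the package is older than rgmanager-2.0.52-6.el5."""
    return parse_nvr(nvr) < FIXED


print(is_affected("rgmanager-2.0.52-1.el5"))  # older release
print(is_affected("rgmanager-2.0.52-6.el5"))  # the boundary version itself
```

A result of True suggests this article applies; False suggests referring to the article named in the note above.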