Systemd service fails to start in Pacemaker and reports "error (inactive)"


Issue

When many systemd services are started in Pacemaker at the same time, some of the resources report a start failure with the exit reason "inactive". The failure occurs within a few seconds of the start operation being initiated:

$ pcs status
------------------------------------>8----------------------------------------
Migration Summary:
  * Node: rhel8-node2 (2):
    * test1-service: migration-threshold=1000000 fail-count=1000000 last-failure='Mon May 15 08:59:39 2023'
    * test2-service: migration-threshold=1000000 fail-count=1000000 last-failure='Mon May 15 08:59:39 2023'
    * test3-service: migration-threshold=1000000 fail-count=1000000 last-failure='Mon May 15 08:59:39 2023'

Failed Resource Actions:
  * test1-service_start_0 on rhel8-node2 'error' (1): call=362, status='complete', exitreason='inactive', last-rc-change='Mon May 15 08:59:40 2023', queued=0ms, exec=4064ms
  * test2-service_start_0 on rhel8-node2 'error' (1): call=369, status='complete', exitreason='inactive', last-rc-change='Mon May 15 08:59:41 2023', queued=0ms, exec=3820ms
  * test3-service_start_0 on rhel8-node2 'error' (1): call=364, status='complete', exitreason='inactive', last-rc-change='Mon May 15 08:59:40 2023', queued=0ms, exec=4027ms
$ cat ./plwmscups02-May11/sos_commands/logs/journalctl_--no-pager
----------------------------------->8-----------------------------------------
May 10 12:21:38 plwmscups02 pacemaker-controld[2825486]:  notice: Initiating start operation test1-service_start_0 locally on rhel8-node2
------------------------------------>8----------------------------------------
May 10 12:21:42 rhel8-node2 pacemaker-controld[2825486]:  notice: Result of start operation for test1-service on rhel8-node2: error (inactive) <---
May 10 12:21:42 rhel8-node2 pacemaker-controld[2825486]:  notice: Result of start operation for test2-service on rhel8-node2: error (inactive) <---
May 10 12:21:42 rhel8-node2 pacemaker-controld[2825486]:  notice: Result of start operation for test3-service on rhel8-node2: error (inactive) <---
May 10 12:21:42 rhel8-node2 pacemaker-controld[2825486]:  notice: Transition 55 aborted by operation test1-service_start_0 'modify' on rhel8-node2: Event failed
May 10 12:21:42 rhel8-node2 pacemaker-controld[2825486]:  notice: Transition 55 action 227 (test1-service_start_0 on rhel8-node2): expected 'ok' but got 'error'
May 10 12:21:42 rhel8-node2 pacemaker-attrd[2825484]:  notice: Setting fail-count-test1-service#start_0[rhel8-node2]: (unset) -> INFINITY
May 10 12:21:42 rhel8-node2 pacemaker-attrd[2825484]:  notice: Setting last-failure-test1-service#start_0[rhel8-node2]: (unset) -> 1683739302
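
To see at a glance which resources hit this failure, the "Result of start operation" lines shown above can be filtered out of a saved journal or messages file. The following is only a minimal sketch: the function name failed_inactive_starts is a hypothetical helper, not part of pcs or Pacemaker, and it assumes the log lines have exactly the format shown in the excerpt above.

```shell
# Hypothetical helper (not a pcs or Pacemaker command): print
# "<resource> <node>" for every start operation that ended in
# "error (inactive)". Assumes the pacemaker-controld log format
# shown in the excerpt above.
failed_inactive_starts() {
    # $1: path to a saved journal or messages file
    grep -E 'Result of start operation for .* on .*: error \(inactive\)' "$1" |
        sed -E 's/.*Result of start operation for ([^ ]+) on ([^:]+): error \(inactive\).*/\1 \2/' |
        sort -u
}

# Example invocation (the path is an assumption; use any file that
# contains the pacemaker-controld messages):
#   failed_inactive_starts /var/log/messages
```

Each output line names one affected resource and the node it failed on, which can then be compared with the Failed Resource Actions section of pcs status.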

Despite the failure reported by the cluster, the systemd service itself may report that it started shortly afterwards (about 48 seconds later in this example):

$ cat /var/log/messages
----------------------------------->8-----------------------------------------
May 10 12:21:42 rhel8-node2 systemd[1]: Starting Cluster Controlled test1...  
----------------------------------->8-----------------------------------------
May 10 12:22:30 rhel8-node2 podman[3255228]: test1
May 10 12:22:30 rhel8-node2 systemd[1]: Started Cluster Controlled test1. <--- service started

Environment

  • Red Hat Enterprise Linux 7, 8 and 9
  • High Availability w/ Pacemaker
  • Systemd Pacemaker Resources
