Double increment of failcount upon a single failure of the start operation

Solution Verified - Updated -

Issue

When a resource fails to start, its failcount sometimes increases twice instead of once. In the example below, a single start failure raises the failcount from 2 to 3 and then from 3 to 4:

Feb 27 12:33:04 [13688] vm2 stonith-ng: (       xml.c:2089  )   debug: xml_patch_version_check: Can apply patch 0.594.9 to 0.594.8
Feb 27 12:33:04 [13690] vm2      attrd: (  commands.c:757   )    info: attrd_peer_update:       Setting fail-count-myapache2[vm2]: 2 -> 3 from vm1
Feb 27 12:33:04 [13690] vm2      attrd: (  commands.c:757   )    info: attrd_peer_update:       Setting last-failure-myapache2[vm2]: 1519702379 -> 1519702383 from vm1
Feb 27 12:33:04 [13690] vm2      attrd: (  commands.c:757   )    info: attrd_peer_update:       Setting fail-count-myapache2[vm2]: 3 -> 4 from vm1
Feb 27 12:33:04 [13687] vm2        cib: ( cib_utils.c:285   )   debug: cib_acl_enabled: CIB ACL is disabled
Feb 27 12:33:04 [13687] vm2        cib: (   cib_ops.c:378   )   debug: cib_process_modify:      Destroying /cib/status/node_state[2]/transient_attributes/instance_attributes/nvpair[2]

The problem is intermittent. It has been observed only when the fail count and the last-failure time are updated in separate events; when both are updated in the same event, the failcount is incremented by 1 as expected. The problem does not appear to be specific to any resource type, as it has been observed with various resources.
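To check whether a node's logs show this pattern, the attrd messages can be scanned for a resource whose fail-count rises more than once within the same second. The following Python sketch is illustrative only: it assumes the log line layout shown in the example above, the function name find_double_increments is hypothetical, and lines whose old or new value is not a plain integer (for example INFINITY) are ignored.

```python
import re
from collections import defaultdict

# Assumed line layout, taken from the example log in this article:
#   Feb 27 12:33:04 [13690] vm2 attrd: ( commands.c:757 ) info: attrd_peer_update: Setting fail-count-myapache2[vm2]: 2 -> 3 from vm1
FAILCOUNT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}).*attrd_peer_update:\s+"
    r"Setting fail-count-(?P<rsc>[^\[]+)\[(?P<node>[^\]]+)\]:\s+"
    r"(?P<old>\d+)\s+->\s+(?P<new>\d+)"
)


def find_double_increments(lines):
    """Return (timestamp, resource, node, first_old, last_new) for every
    resource whose fail-count increased more than once in the same second."""
    updates = defaultdict(list)
    for line in lines:
        m = FAILCOUNT_RE.search(line)
        # Only count increases; resets and non-numeric values are skipped.
        if m and int(m["new"]) > int(m["old"]):
            key = (m["ts"], m["rsc"], m["node"])
            updates[key].append((int(m["old"]), int(m["new"])))

    suspects = []
    for (ts, rsc, node), changes in updates.items():
        if len(changes) > 1:
            suspects.append((ts, rsc, node, changes[0][0], changes[-1][1]))
    return suspects


if __name__ == "__main__":
    import sys

    # Usage: python3 script.py < /path/to/pacemaker/log
    for ts, rsc, node, old, new in find_double_increments(sys.stdin):
        print(f"{ts}: fail-count of {rsc} on {node} went {old} -> {new}")
```

Grouping by timestamp is a heuristic: two genuine start failures within the same second would also be reported, so any match should be compared against the actual number of failed operations in the log.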

Environment

  • Red Hat Enterprise Linux Server 7, 8 (with the High Availability Add-On)
  • pacemaker-1.1.15-11.el7_3.4.x86_64 or later
