A resource configured with 'op start on-fail=ignore' still moves to another node after a start failure in a RHEL 6 High Availability cluster with pacemaker

Solution Unverified - Updated -

Issue

  • I have a resource that may fail on start. I have op start on-fail=ignore set on the resource, and as I understand it, if the resource fails on start, the failure should be ignored and the resource should not be moved to or started on other nodes. The log confirms that the resource script returns '1' on 'start', but instead of leaving the resource as failed, Pacemaker tries to start it on other nodes.
Feb 25 15:55:23 rhel6-node1-pcmk crmd[19078]:  warning: status_from_rc: Action 15 (hascript_start_0) on rhel6-node2-pcmk.example.com failed (target: 0 vs. rc: 1): Error
Feb 25 15:55:23 rhel6-node1-pcmk crmd[19078]:  warning: update_failcount: Updating failcount for hascript on rhel6-node2-pcmk.example.com after failed start: rc=1 (update=INFINITY, time=1424897723)
Feb 25 15:55:23 rhel6-node1-pcmk crmd[19078]:  warning: update_failcount: Updating failcount for hascript on rhel6-node2-pcmk.example.com after failed start: rc=1 (update=INFINITY, time=1424897723)
Feb 25 15:55:23 rhel6-node1-pcmk crmd[19078]:   notice: run_graph: Transition 2285 (Complete=1, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-68.bz2): Stopped
Feb 25 15:55:23 rhel6-node1-pcmk pengine[19077]:  warning: unpack_rsc_op: Remapping hascript_last_failure_0 (rc=1) on rhel6-node2-pcmk.example.com to DONE: ignore
Feb 25 15:55:23 rhel6-node1-pcmk crmd[19078]:   notice: te_rsc_command: Initiating action 17: monitor hascript_monitor_60000 on rhel6-node2-pcmk.example.com
Feb 25 15:55:23 rhel6-node1-pcmk pengine[19077]:   notice: process_pe_message: Calculated Transition 2286: /var/lib/pacemaker/pengine/pe-input-69.bz2
Feb 25 15:55:23 rhel6-node1-pcmk crmd[19078]:   notice: run_graph: Transition 2286 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-69.bz2): Complete
Feb 25 15:55:23 rhel6-node1-pcmk pengine[19077]:  warning: unpack_rsc_op: Remapping hascript_last_failure_0 (rc=1) on rhel6-node2-pcmk.example.com to DONE: ignore
Feb 25 15:55:23 rhel6-node1-pcmk pengine[19077]:  warning: common_apply_stickiness: Forcing hascript away from rhel6-node2-pcmk.example.com after 1000000 failures (max=1000000)
Feb 25 15:55:23 rhel6-node1-pcmk pengine[19077]:   notice: LogActions: Move    hascript#011(Started rhel6-node2-pcmk.example.com -> rhel6-node1-pcmk.example.com)
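
The log above shows three things in sequence: the fail count for hascript is set to INFINITY after the failed start, the failure itself is remapped to "ignore", and the resource is still forced away from the node. The following is a minimal sketch for picking that pattern out of a log. It is not part of Pacemaker or any Red Hat tooling; the names classify and ignored_but_moved and the regular expressions are illustrative assumptions, written only against the message formats visible in this excerpt.

```python
import re

# Three lines copied from the log excerpt above.
LOG_LINES = [
    "Feb 25 15:55:23 rhel6-node1-pcmk crmd[19078]:  warning: update_failcount: "
    "Updating failcount for hascript on rhel6-node2-pcmk.example.com "
    "after failed start: rc=1 (update=INFINITY, time=1424897723)",
    "Feb 25 15:55:23 rhel6-node1-pcmk pengine[19077]:  warning: unpack_rsc_op: "
    "Remapping hascript_last_failure_0 (rc=1) on rhel6-node2-pcmk.example.com "
    "to DONE: ignore",
    "Feb 25 15:55:23 rhel6-node1-pcmk pengine[19077]:  warning: "
    "common_apply_stickiness: Forcing hascript away from "
    "rhel6-node2-pcmk.example.com after 1000000 failures (max=1000000)",
]

# Hypothetical patterns, assumed from the message text in this excerpt only.
PATTERNS = {
    "failcount_set": re.compile(
        r"update_failcount: Updating failcount for (?P<rsc>\S+) on (?P<node>\S+) "
        r"after failed (?P<op>\w+): rc=(?P<rc>\d+) \(update=(?P<value>[^,]+),"
    ),
    "failure_ignored": re.compile(
        r"unpack_rsc_op: Remapping (?P<rsc>\S+)_last_failure_0 "
        r"\(rc=(?P<rc>\d+)\) on (?P<node>\S+) to DONE: ignore"
    ),
    "forced_away": re.compile(
        r"common_apply_stickiness: Forcing (?P<rsc>\S+) away from (?P<node>\S+) "
        r"after (?P<count>\d+) failures \(max=(?P<max>\d+)\)"
    ),
}


def classify(lines):
    """Return (event_name, fields) for each line matching a known pattern."""
    events = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                events.append((name, match.groupdict()))
                break
    return events


def ignored_but_moved(events):
    """Return (resource, node) pairs whose failure was remapped to ignore
    but which were still forced away from that same node."""
    ignored = {(f["rsc"], f["node"]) for n, f in events if n == "failure_ignored"}
    forced = {(f["rsc"], f["node"]) for n, f in events if n == "forced_away"}
    return sorted(ignored & forced)


if __name__ == "__main__":
    for rsc, node in ignored_but_moved(classify(LOG_LINES)):
        print(f"{rsc}: failure ignored on {node}, but resource was forced away")
```

To check a real log, the same functions could be fed the lines of the cluster log file instead of LOG_LINES; message formats may differ between pacemaker releases, so the patterns would need to be verified against the actual log first.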

Environment

  • Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add-On
  • pacemaker
    • RHEL 7: pacemaker releases prior to 1.1.13-10.el7
    • RHEL 6: All releases
  • One or more resources configured with an op start on-fail=ignore
    • That resource fails on start
