A stonith device is failing to start and/or reporting "Timed Out" errors in a RHEL High Availability cluster with Pacemaker

Solution Verified

Issue

  • The pcs status command shows "Timed Out" errors for one or more stonith devices.
fence_node1_start_0 on node2.example.com 'unknown error' (1): call=48, status=Timed Out, last-rc-change='Fri Sep 5 15:50:46 2014', queued=21022ms, exec=0ms
  • Stonith device monitor or start operations are timing out and reporting errors similar to those shown below.
Jun 01 11:36:07 node1.example.com crmd[2807]: notice: process_lrm_event: Operation fence_node_5356_monitor_0: not running (node=node1.example.com, call=311, rc=7, cib-upda...nfirmed=true)
Jun 01 11:36:27 node1.example.com stonith-ng[2803]: notice: stonith_action_async_done: Child process 3114 performing action 'monitor' timed out with signal 15
Jun 01 11:36:27 node1.example.com stonith-ng[2803]: notice: log_operation: Operation 'monitor' [3114] for device 'fence_node2' returned: -62 (Timer expired)
Jun 01 11:36:28 node1.example.com crmd[2807]: error: process_lrm_event: Operation fence_node_node2_start_0: Timed Out (node=node1.example.com, call=312, timeout=20000ms)
  • Stonith devices are stuck in a Stopped state with "Timed Out" errors.
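
To check quickly whether a log shows these symptoms, the messages quoted above can be matched with a short script. This is a minimal sketch, not a Red Hat or Pacemaker tool: the helper name find_stonith_timeouts and its regular expressions are assumptions based only on the log formats shown in this article, and other Pacemaker versions may word these messages differently.

```python
import re

# Hypothetical helper, not part of pacemaker or pcs. The patterns below are
# assumptions derived only from the log lines quoted in this article.
TIMEOUT_PATTERNS = [
    # stonith-ng: Operation 'monitor' [3114] for device 'fence_node2' returned: -62 (Timer expired)
    re.compile(
        r"Operation '(?P<action>\w+)' \[\d+\] for device '(?P<name>[^']+)' "
        r"returned: -62 \(Timer expired\)"
    ),
    # crmd: Operation fence_node_node2_start_0: Timed Out (node=..., timeout=20000ms)
    re.compile(r"Operation (?P<name>\S+)_(?P<action>[a-z]+)_\d+: Timed Out"),
]


def find_stonith_timeouts(lines):
    """Return a (name, action) pair for every line that reports a timeout."""
    hits = []
    for line in lines:
        for pattern in TIMEOUT_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((match.group("name"), match.group("action")))
                break  # one hit per line is enough
    return hits


if __name__ == "__main__":
    import sys

    # Read log text from standard input and report each match.
    for name, action in find_stonith_timeouts(sys.stdin):
        print(f"{name}: {action} timed out")
```

The script reads log text on standard input, so saved log output can be piped into it. Lines that report other results, such as "not running", are ignored.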

Environment

  • Red Hat Enterprise Linux 6, 7, or 8 (with the High Availability Add-on)
  • Pacemaker
