A stonith device is failing to start and/or reporting "Timed Out" errors in a RHEL High Availability cluster with Pacemaker


Issue

  • The pcs status command shows "Timed Out" errors for one or more stonith devices, for example:
fence_node1_start_0 on node2.example.com 'unknown error' (1): call=48, status=Timed Out, last-rc-change='Fri Sep 5 15:50:46 2014', queued=21022ms, exec=0ms
  • Stonith device monitor or start operations time out and report errors similar to the following:
Jun 01 11:36:07 node1.example.com crmd[2807]: notice: process_lrm_event: Operation fence_node_5356_monitor_0: not running (node=node1.example.com, call=311, rc=7, cib-upda...nfirmed=true)
Jun 01 11:36:27 node1.example.com stonith-ng[2803]: notice: stonith_action_async_done: Child process 3114 performing action 'monitor' timed out with signal 15
Jun 01 11:36:27 node1.example.com stonith-ng[2803]: notice: log_operation: Operation 'monitor' [3114] for device 'fence_node2' returned: -62 (Timer expired)
Jun 01 11:36:28 node1.example.com crmd[2807]: error: process_lrm_event: Operation fence_node_node2_start_0: Timed Out (node=node1.example.com, call=312, timeout=20000ms)
  • Stonith devices are stuck in a Stopped state with "Timed Out" errors.
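On a busy cluster these symptoms can be buried in a long pcs status listing or journal dump. The Python sketch below is one way to pick them out. It is a hypothetical helper: find_timeouts and TimedOutOp are illustrative names, not part of pcs or Pacemaker. Its regular expressions were written only against the sample lines shown above and assume the same formatting. Other Pacemaker or pcs versions may word or quote these fields differently, in which case the patterns would need adjusting.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical patterns, written against the sample lines in this article only;
# other Pacemaker/pcs versions may format these lines differently.

# `pcs status` failed action, e.g.
#   fence_node1_start_0 on node2.example.com 'unknown error' (1): call=48, status=Timed Out, ...
PCS_FAILED_ACTION = re.compile(
    r"^\s*(?P<op>\S+) on (?P<node>\S+) .*status=(?P<status>[^,]+),"
    r".*queued=(?P<queued>\d+)ms, exec=(?P<exec>\d+)ms"
)

# crmd log line for a timed-out operation, e.g.
#   Operation fence_node_node2_start_0: Timed Out (node=node1.example.com, call=312, timeout=20000ms)
CRMD_TIMED_OUT = re.compile(
    r"Operation (?P<op>\S+): Timed Out \(node=(?P<node>[^,]+),.*timeout=(?P<timeout>\d+)ms\)"
)

# stonith-ng log line for a device action whose timer expired (-62), e.g.
#   Operation 'monitor' [3114] for device 'fence_node2' returned: -62 (Timer expired)
STONITH_TIMER_EXPIRED = re.compile(
    r"Operation '(?P<action>[^']+)' \[\d+\] for device '(?P<device>[^']+)'"
    r" returned: -62 \(Timer expired\)"
)


@dataclass
class TimedOutOp:
    source: str  # which pattern matched: "pcs", "crmd" or "stonith-ng"
    name: str    # operation name (pcs, crmd) or device name (stonith-ng)
    detail: str  # node, action or timing details pulled from the line


def find_timeouts(lines: Iterable[str]) -> List[TimedOutOp]:
    """Return one TimedOutOp for every line that reports a timed-out operation."""
    found: List[TimedOutOp] = []
    for line in lines:
        m = PCS_FAILED_ACTION.search(line)
        if m and m.group("status").strip() == "Timed Out":
            found.append(TimedOutOp(
                "pcs", m.group("op"),
                f"node={m.group('node')} queued={m.group('queued')}ms exec={m.group('exec')}ms",
            ))
            continue
        m = CRMD_TIMED_OUT.search(line)
        if m:
            found.append(TimedOutOp(
                "crmd", m.group("op"),
                f"node={m.group('node')} timeout={m.group('timeout')}ms",
            ))
            continue
        m = STONITH_TIMER_EXPIRED.search(line)
        if m:
            found.append(TimedOutOp(
                "stonith-ng", m.group("device"), f"action={m.group('action')}",
            ))
    return found


if __name__ == "__main__":
    # Sample lines copied from the symptoms above.
    sample = [
        "fence_node1_start_0 on node2.example.com 'unknown error' (1): call=48, "
        "status=Timed Out, last-rc-change='Fri Sep 5 15:50:46 2014', queued=21022ms, exec=0ms",
        "Jun 01 11:36:27 node1.example.com stonith-ng[2803]: notice: log_operation: "
        "Operation 'monitor' [3114] for device 'fence_node2' returned: -62 (Timer expired)",
        "Jun 01 11:36:28 node1.example.com crmd[2807]: error: process_lrm_event: "
        "Operation fence_node_node2_start_0: Timed Out (node=node1.example.com, "
        "call=312, timeout=20000ms)",
    ]
    for op in find_timeouts(sample):
        print(f"{op.source:10} {op.name:26} {op.detail}")
```

Lines that report a stonith operation as "not running" (rc=7), such as the first crmd line above, are deliberately not reported, because that result by itself is not a timeout.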

Environment

  • Red Hat Enterprise Linux 6, 7, or 8 (with the High Availability Add-on)
  • Pacemaker
