A stonith device is failing to start and/or reporting "Timed Out" errors in a RHEL High Availability cluster with Pacemaker
Issue
- The pcs status command shows "Timed Out" errors for one or more stonith devices.
fence_node1_start_0 on node2.example.com 'unknown error' (1): call=48, status=Timed Out, last-rc-change='Fri Sep 5 15:50:46 2014', queued=21022ms, exec=0ms
- Stonith device monitor or start operations are timing out and reporting errors similar to those shown below.
Jun 01 11:36:07 node1.example.com crmd[2807]: notice: process_lrm_event: Operation fence_node_5356_monitor_0: not running (node=node1.example.com, call=311, rc=7, cib-upda...nfirmed=true)
Jun 01 11:36:27 node1.example.com stonith-ng[2803]: notice: stonith_action_async_done: Child process 3114 performing action 'monitor' timed out with signal 15
Jun 01 11:36:27 node1.example.com stonith-ng[2803]: notice: log_operation: Operation 'monitor' [3114] for device 'fence_node2' returned: -62 (Timer expired)
Jun 01 11:36:28 node1.example.com crmd[2807]: error: process_lrm_event: Operation fence_node_node2_start_0: Timed Out (node=node1.example.com, call=312, timeout=20000ms)
- Stonith devices are stuck in a Stopped state with "Timed Out" errors.
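When checking whether a saved log contains the symptoms listed above, a small script can pull out the timed-out operations together with the timeout that was in effect. The following is a minimal Python sketch, not a supported tool. It assumes the crmd message format shown in the example above ("Operation <name>: Timed Out (node=..., call=..., timeout=...ms)"). The function name find_timed_out_operations and the sample list are illustrative only and are not part of Pacemaker or pcs.

```python
import re

# Assumed format, taken from the crmd example line in this article:
#   Operation <name>: Timed Out (node=<node>, call=<n>, timeout=<n>ms)
TIMED_OUT = re.compile(
    r"Operation (?P<op>\S+): Timed Out "
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), timeout=(?P<timeout_ms>\d+)ms\)"
)


def find_timed_out_operations(lines):
    """Return one dict per log line that reports a timed-out operation."""
    results = []
    for line in lines:
        match = TIMED_OUT.search(line)
        if match:
            results.append({
                "operation": match.group("op"),
                "node": match.group("node"),
                "call": int(match.group("call")),
                "timeout_ms": int(match.group("timeout_ms")),
            })
    return results


# Sample input: two of the log lines quoted in this article.
sample = [
    "Jun 01 11:36:27 node1.example.com stonith-ng[2803]: notice: log_operation: "
    "Operation 'monitor' [3114] for device 'fence_node2' returned: -62 (Timer expired)",
    "Jun 01 11:36:28 node1.example.com crmd[2807]: error: process_lrm_event: "
    "Operation fence_node_node2_start_0: Timed Out "
    "(node=node1.example.com, call=312, timeout=20000ms)",
]

for entry in find_timed_out_operations(sample):
    print(entry)
```

Only lines in the crmd "Timed Out" format are matched. The stonith-ng "Timer expired" line is skipped on purpose, since it uses a different layout and would need its own pattern.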
Environment
- Red Hat Enterprise Linux 6, 7, or 8 (with the High Availability Add-on)
- Pacemaker