fence_ipmilan reports "Connection timed out" in a RHEL 7 High Availability cluster
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add On
- One or more stonith devices configured to use fence_ipmilan as the agent
- fence-agents-ipmilan-4.0.11-11.el7 or later
Issue
- My fence_ipmilan device is reporting "Connection timed out"
Jun 01 11:44:45 [2824] node1.example.com stonith-ng: notice: log_operation: Operation 'monitor' [12742] for device 'fence_node1' returned: -110 (Connection timed out)
Jun 01 11:44:45 [2824] node1.example.com stonith-ng: warning: log_operation: fence_node1:12742 [ Connection timed out ]
Jun 01 11:44:45 [2824] node1.example.com stonith-ng: warning: log_operation: fence_node1:12742 [ ]
Jun 01 11:44:45 [2824] node1.example.com stonith-ng: warning: log_operation: fence_node1:12742 [ ]
Jun 01 11:44:45 [2825] node1.example.com lrmd: info: log_finished: finished - rsc:fence_node1 action:start call_id:312 exit-code:1 exec-time:41116ms queue-time:1ms
Jun 01 11:44:46 [2828] node1.example.com crmd: error: process_lrm_event: Operation fence_node1_start_0 (node=node1.example.com, call=312, status=4, cib-update=803, confirmed=true) Error
- My stonith devices are failing to start and failing monitor operations. I've reconfigured my stonith devices several times with different timeout options, such as pcmk_monitor_timeout=120s and stonith-timeout, but nothing helped.
Resolution
Set a power_timeout value in the device's attributes that is higher than the default of 20 seconds.
# # Example:
# # pcs stonith create <device> <agent> <attributes> power_timeout=<seconds>
# pcs stonith create node1_ipmi fence_ipmilan ipaddr=node1-ipmi.example.com lanplus=1 login=admin password='a2@7czD44#pQrs7UX.' power_timeout=60
# # Example:
# # pcs stonith update <device> power_timeout=<seconds>
# pcs stonith update node1_ipmi power_timeout=60
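When choosing a power_timeout value, it can help to measure how long the IPMI device actually takes to answer. The following is a minimal sketch, not part of the fence agent: time_cmd is a hypothetical helper name, and the ipmitool invocation in the usage comment assumes the host and login from the example above with a placeholder password.

```shell
#!/bin/sh
# time_cmd (hypothetical helper): run a command, discard its output,
# and print how many whole seconds it took to return.
time_cmd() {
    tc_start=$(date +%s)
    "$@" >/dev/null 2>&1
    tc_end=$(date +%s)
    echo $((tc_end - tc_start))
}

# Usage on a cluster node (assumed host and credentials, adjust to your device):
#   time_cmd ipmitool -I lanplus -H node1-ipmi.example.com -U admin -P '<password>' chassis power status
```

If the reported time is close to or above 20 seconds, a power_timeout comfortably larger than the measured value is a reasonable starting point.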
Root Cause
In RHEL 7 Update 1 (fence-agents-ipmilan-4.0.11-11.el7), the fence_ipmilan agent was updated to a new implementation that uses the shared fencing library common to many other fence agents. The "Connection timed out" message reported here is one of the standard errors that agents using this library can report. In the case of fence_ipmilan, it means that the ipmitool command the agent spawned did not return within the time allocated for it. That limit is controlled by the power_timeout attribute, which defaults to 20 seconds, so increasing it gives the device more time to complete the command and should avoid the error when the device is simply slow to respond.
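The behavior described above can be illustrated with a short shell sketch. This is an analogy only, not the agent's actual code: run_with_power_timeout is a hypothetical function that uses the coreutils timeout command to bound a child process, in the same way that power_timeout bounds the spawned ipmitool command.

```shell
#!/bin/sh
# run_with_power_timeout (hypothetical): run a command under a time limit
# in seconds. If the limit expires, report the same message the agent logs.
run_with_power_timeout() {
    rw_limit="$1"
    shift
    timeout "$rw_limit" "$@"
    rw_rc=$?
    # GNU timeout exits with status 124 when the time limit was reached.
    if [ "$rw_rc" -eq 124 ]; then
        echo "Connection timed out"
        return 1
    fi
    return "$rw_rc"
}

# A command that needs 3 seconds fails under a 1 second limit
# but succeeds under a 5 second limit:
#   run_with_power_timeout 1 sleep 3
#   run_with_power_timeout 5 sleep 3
```

The command itself is unchanged in both cases; only the time it is allowed differs, which is why raising power_timeout can clear the error without any change on the IPMI device.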
For information on resolving timeout errors more generally, see this related solution.