- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add On
- One or more stonith devices configured to use fence_ipmilan as the agent
A fence_ipmilan device is reporting "Connection timed out":
Jun 01 11:44:45  node1.example.com stonith-ng: notice: log_operation: Operation 'monitor'  for device 'fence_node1' returned: -110 (Connection timed out)
Jun 01 11:44:45  node1.example.com stonith-ng: warning: log_operation: fence_node1:12742 [ Connection timed out ]
Jun 01 11:44:45  node1.example.com stonith-ng: warning: log_operation: fence_node1:12742 [ ]
Jun 01 11:44:45  node1.example.com stonith-ng: warning: log_operation: fence_node1:12742 [ ]
Jun 01 11:44:45  node1.example.com lrmd: info: log_finished: finished - rsc:fence_node1 action:start call_id:312 exit-code:1 exec-time:41116ms queue-time:1ms
Jun 01 11:44:46  node1.example.com crmd: error: process_lrm_event: Operation fence_node1_start_0 (node=node1.example.com, call=312, status=4, cib-update=803, confirmed=true) Error
My stonith devices are failing to start and failing their monitor operations. I have reconfigured them several times with different timeout options, such as stonith-timeout, but nothing helped.
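To see which devices are affected, the log can be searched for operations that returned -110. The following is a minimal sketch based only on the message format shown in the excerpt above; the function name `timed_out_devices` is hypothetical, and the log location passed to it is an assumption that depends on how logging is configured on the node.

```shell
#!/bin/bash
# Print the unique names of stonith devices whose operations returned
# -110 (Connection timed out). Reads the files given as arguments, or
# standard input when no file is given.
# Assumption: log lines follow the stonith-ng "log_operation" format
# shown in the excerpt above.
timed_out_devices() {
    sed -n "s/.*for device '\([^']*\)' returned: -110.*/\1/p" "$@" | sort -u
}

# Hypothetical usage; the log path is an assumption:
# timed_out_devices /var/log/messages
```

Each name printed is a device whose power_timeout is a candidate for review.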
Set a power_timeout value in the device's attributes that is higher than the default of 20 seconds.
# Example:
# pcs stonith create <device> <agent> <attributes> power_timeout=<seconds>
# pcs stonith create node1_ipmi fence_ipmilan ipaddr=node1-ipmi.example.com lanplus=1 login=admin password='a2@7czD44#pQrs7UX.' power_timeout=60
For a device that already exists, update the attribute instead:
# Example:
# pcs stonith update <device> power_timeout=<seconds>
# pcs stonith update node1_ipmi power_timeout=60
In RHEL 7 Update 1, the fence_ipmilan agent was updated to a new implementation that uses the shared fencing library used by many other fence agents. The "Connection timed out" message reported here is one of the standard errors that agents built on this library can report. For fence_ipmilan, it means that the ipmitool command the agent spawned did not return within the timeout allocated for it. That timeout is controlled by the power_timeout attribute, which defaults to 20 seconds, so increasing it gives the device more time to complete the command and should avoid the error when the device is simply slow to respond.
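One way to judge how much headroom a device needs is to time an IPMI request by hand and compare the result with the timeout. The sketch below is only a rough aid: `check_against_timeout` is a hypothetical helper, the host and credentials in the usage comment are placeholders, and the ipmitool invocation shown is an assumption that may not match the exact command the agent runs.

```shell
#!/bin/bash
# Run a command, measure its duration in whole seconds, and compare it
# with a timeout value (20 is the fence_ipmilan power_timeout default).
# Returns 0 if the command finished inside the timeout, 1 otherwise.
check_against_timeout() {
    local timeout=$1; shift
    local start end elapsed
    start=$(date +%s)
    "$@" >/dev/null 2>&1          # the command's own exit status is ignored
    end=$(date +%s)
    elapsed=$((end - start))
    if [ "$elapsed" -ge "$timeout" ]; then
        echo "took ${elapsed}s: exceeds power_timeout=${timeout}"
        return 1
    fi
    echo "took ${elapsed}s: within power_timeout=${timeout}"
}

# Hypothetical usage; host and credentials are placeholders:
# check_against_timeout 20 ipmitool -I lanplus -H node1-ipmi.example.com \
#     -U admin -P '<password>' chassis power status
```

If the measured time is close to or above 20 seconds, a larger power_timeout is likely needed. A single measurement is only indicative, since response times of the management controller can vary.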
For information on resolving timeout errors more generally, see this related solution.