fence_compute in OpenStack HA scale-out environments times out during unfencing.

Issue

During cluster startup, many "unfencing" operations fail and report a timeout for Operation 'on'. The timeouts are all logged at the same moment:

  • All of the timed-out operations report exactly the same failure timestamp.
  • Some unfencing operations may succeed, but most report a failure.
  • The issue is particularly common in OpenStack scale-out configurations with many attached compute nodes.

$ grep -i "pacemaker-fenced.*timed out:" /var/log/messages
-----------------------------------------8<----------------------------------------- 
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computek1408 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computek2103 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: notice: Action 'on' targeting openstack-computek2304 using stonith-fence_compute-fence-nova on behalf of pacemaker-controld.785775@openstack-controler1: complete
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: notice: Operation 'on' targeting openstack-computek2304 by vszrm-soe2614ao1201 for pacemaker-controld.785775@openstack-controler1: OK (complete)
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computek2301 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computeo1409 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computek2206 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computek2207 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computeo1414 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computek1003 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computek2305 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computeo2302 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computek2208 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
-----------------------------------------8<----------------------------------------- 
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-compute01405 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computeo0712 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computek1215 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computeo1209 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computeo1212 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
Nov  5 14:15:30 openstack-controler1 pacemaker-fenced[4690]: error: Operation 'on' targeting openstack-computeo1208 by openstack-controler1 for pacemaker-controld.785775@openstack-controler1: Error occurred (Timed Out: Fencing did not complete within a total timeout based on the configured timeout and retries for an...
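
To check whether a log shows the same pattern (many 'on' timeouts sharing one timestamp), the matching lines can be counted per timestamp. The following is a minimal sketch, not part of the original solution: the function name summarize_unfence_timeouts is hypothetical, and it assumes the traditional syslog layout shown above, where the first three fields are month, day, and time.

```shell
#!/bin/sh
# Hypothetical helper (not a Pacemaker tool): count unfencing ('on') timeouts
# logged by pacemaker-fenced, grouped by syslog timestamp.
# Assumes the first three whitespace-separated fields are month, day, time.
# Usage: summarize_unfence_timeouts [logfile]   (default: /var/log/messages)
summarize_unfence_timeouts() {
    logfile="${1:-/var/log/messages}"
    # Keep only timed-out 'on' operations, then tally them per timestamp.
    grep -i "pacemaker-fenced.*Operation 'on'.*timed out:" "$logfile" |
        awk '{ count[$1 " " $2 " " $3]++ }
             END { for (ts in count) print count[ts], ts }' |
        sort -rn
}
```

Running summarize_unfence_timeouts with no argument reads /var/log/messages and prints one line per timestamp, highest count first. A single timestamp with a large count would be consistent with the symptom described in this issue.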

Environment

  • Red Hat Enterprise Linux (RHEL) 7, 8, 9 with the High Availability or Resilient Storage Add-On
  • OpenStack scale-out environments
