nova-compute resource times out on stop triggering a controller fence with RHOSP 8
Issue
- We issued a pcs resource restart on our openstack-nova-api resource, and all the compute nodes were fenced by Pacemaker and all the VM instances stopped.
- The nova-compute resource is regularly timing out on stop during a resource restart or cluster node shutdown, causing controller node fencing.
- nova-compute fails with OCF_TIMEOUT.
- Our nova-compute resource is failing to stop due to a timeout error:
Sep 03 15:25:03 [3480] node1 pengine: warning: unpack_rsc_op_failure: Processing failed op stop for nova-compute:0 on compute-l01: OCF_TIMEOUT (198)
Sep 03 15:25:03 [3480] node1 pengine: warning: pe_fence_node: Node compute1 will be fenced because of resource failure(s)
Sep 01 16:46:44 [3485] node1 crmd: info: process_graph_event: Detected action (296.568) nova-compute_stop_0.73608=OCF_TIMEOUT: failed
Sep 01 16:46:44 [3485] node1 crmd: warning: status_from_rc: Action 568 (nova-compute_stop_0) on compute-l02 failed (target: 0 vs. rc: 198): Error
Sep 01 16:46:44 [3485] node1 crmd: info: abort_transition_graph: Transition aborted by nova-compute_stop_0 'modify' on node2: Event failed (magic=2:198;568:296:0:3dd776e6-e59b-4d8b-b1b8-fbce832e096f, cib=0.316.2, source=match_graph_event:381, 0)
Sep 01 16:46:44 [3485] node1 crmd: info: match_graph_event: Action nova-compute_stop_0 (568) confirmed on compute-l02 (rc=198)
Sep 01 16:46:44 [3485] node1 crmd: info: update_failcount: Updating failcount for nova-compute on compute-l02 after failed stop: rc=198 (update=INFINITY, time=1472716004)
Sep 01 16:46:44 [3485] node1 crmd: info: process_graph_event: Detected action (296.568) nova-compute_stop_0.73608=OCF_TIMEOUT: failed
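To check whether a cluster log contains the same symptom, the pengine warning above can be matched mechanically. The following is a minimal sketch, not a supported tool: the function name find_stop_timeouts and the regular expression are illustrative assumptions modeled only on the log line shown in this article, and other Pacemaker versions may word the message differently.

```python
import re

# Pattern modeled on the pengine "Processing failed op" warning shown above.
# Assumption: the message wording matches this article's excerpt.
FAILED_OP = re.compile(
    r"Processing failed op (?P<op>\w+) for (?P<rsc>\S+) on (?P<node>\S+): "
    r"(?P<status>\w+) \((?P<rc>\d+)\)"
)


def find_stop_timeouts(lines):
    """Return (resource, node) pairs for stop operations that ended in OCF_TIMEOUT."""
    hits = []
    for line in lines:
        m = FAILED_OP.search(line)
        if m and m.group("op") == "stop" and m.group("status") == "OCF_TIMEOUT":
            hits.append((m.group("rsc"), m.group("node")))
    return hits


# Two lines copied from the excerpt above: one failed stop, one fencing notice.
sample = [
    "Sep 03 15:25:03 [3480] node1 pengine: warning: unpack_rsc_op_failure: "
    "Processing failed op stop for nova-compute:0 on compute-l01: OCF_TIMEOUT (198)",
    "Sep 03 15:25:03 [3480] node1 pengine: warning: pe_fence_node: "
    "Node compute1 will be fenced because of resource failure(s)",
]

if __name__ == "__main__":
    for rsc, node in find_stop_timeouts(sample):
        print(f"stop timed out: resource={rsc} node={node}")
```

In practice the list of lines would come from the cluster log on the node running pengine; the log file location varies by configuration and is not assumed here.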
Environment
- Red Hat OpenStack Platform (RHOSP) 8
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On for RHEL-OSP controller nodes
- nova-compute managed by the High Availability cluster