Stack update fails during WorkflowTasks_Step2_Execution

Solution In Progress - Updated -

Issue

  • A stack update, specifically a scale up of one node, is failing repeatedly at the same step:
2020-04-15 22:45:34Z [overcloud-mycloud-AllNodesDeploySteps-wkbflvledpui.WorkflowTasks_Step2_Execution]: CREATE_FAILED  resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow
  ceph_base_ansible_workflow [task_ex_id=e65de45a-8015-4095-9db9-c692ca170b30] -> Failure caused by error in tasks: enable_ssh_admin
  enable_ssh_admin
2020-04-15 22:45:34Z [overcloud-mycloud-AllNodesDeploySteps-wkbflvledpui]: UPDATE_FAILED  Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow
  ceph_base_ansible_workflow [task_ex_id=e65de45a-8015-4095-9db9-c692ca170b30] -> Failure caused by error in tasks: enable_ssh_a
2020-04-15 22:45:34Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow
  ceph_base_ansible_workflow [task_ex_id=e65de45a-8015-4095-9db9-c692ca170b30] -> Failure caused 
2020-04-15 22:45:34Z [overcloud-mycloud]: UPDATE_FAILED  Resource UPDATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow
  ceph_base_ansible_workflow [task_ex_id=e65de45a-8015-4095-9db9-c692ca17
 Stack overcloud-mycloud UPDATE_FAILED 
overcloud-mycloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::TripleO::WorkflowSteps
  physical_resource_id: aac65618-c794-46f7-9ff5-523cc4fa6c03
  status: CREATE_FAILED
  status_reason: |
    ...
      deploy_config [task_ex_id=fdc9c136-956e-4b26-b800-a2c9f80ec4bd] -> Timeout for heat deployment 'create_admin'
        [action_ex_id=4934d7d9-30f5-4d15-ab84-2802f9f4c39e, idx=0]: Timeout for heat deployment 'create_admin'
      send_message [task_ex_id=6a19362e-2e4d-474c-bec6-6cd3f8cc5125] -> Workflow failed due to message status
        [wf_ex_id=8e313e42-d279-4cf5-8435-7bf0e3b6d2a7, idx=0]: Workflow failed due to message status

Heat Stack update failed.
Heat Stack update failed.

real    33m31.827s
user    0m5.911s
sys     0m0.445s
  • A key detail is the time at which the failure actually occurs: 22:45:34Z.

  • If you take a look at the logs of every single overcloud node...

(undercloud) [stack@director ~]$ ansible -i hosts.yml  -m shell -a 'grep AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin /var/log/messages -A1 | egrep -v "ansible|vswitchd|kernel|--|Docker" | tail -n2' -b overcloud
[WARNING]: log file at /var/log/ansible.log is not writeable and we cannot create it, aborting

overcloud-mycloud-controller-1 | SUCCESS | rc=0 >>
Apr 15 22:40:23 overcloud-mycloud-controller-1 os-collect-config: [2020-04-15 22:40:23,407] (heat-config) [DEBUG] [2020-04-15 22:40:23,355] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-ff46e19c-3dde-4af0-9b48-611eb49511e8/b40fd58a-325e-40dc-8085-92693e71e489?temp_url_sig=05666cc56963471abe599ef1b18e2e51f0395fad&temp_url_expires=1587008412 via PUT
Apr 15 22:40:23 overcloud-mycloud-controller-1 os-collect-config: [2020-04-15 22:40:23,371] (heat-config-notify) [DEBUG] Response <Response [201]>

overcloud-mycloud-controller-0 | SUCCESS | rc=0 >>
Apr 15 22:40:34 overcloud-mycloud-controller-0 os-collect-config: [2020-04-15 22:40:34,533] (heat-config) [DEBUG] [2020-04-15 22:40:34,487] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-7716260a-1039-4102-bd34-984b111337c4/1d3aa19e-9b8d-4bd2-a6e4-1e1c74a86ff0?temp_url_sig=42e35d54c0547593e21899257dc15697ccd97113&temp_url_expires=1587008412 via PUT
Apr 15 22:40:34 overcloud-mycloud-controller-0 os-collect-config: [2020-04-15 22:40:34,502] (heat-config-notify) [DEBUG] Response <Response [201]>

overcloud-mycloud-controller-2 | SUCCESS | rc=0 >>
Apr 15 22:40:25 overcloud-mycloud-controller-2 os-collect-config: [2020-04-15 22:40:25,018] (heat-config) [DEBUG] [2020-04-15 22:40:24,975] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-67b7e4ab-2ca6-4665-aa90-9ebca5c88ab6/a88c290e-e666-4d1d-8027-df59ec469ac0?temp_url_sig=f5bfc6752bdba91ba57163da62b5a426d021925c&temp_url_expires=1587008412 via PUT
Apr 15 22:40:25 overcloud-mycloud-controller-2 os-collect-config: [2020-04-15 22:40:24,989] (heat-config-notify) [DEBUG] Response <Response [201]>

overcloud-mycloud-novacompute-dvr-15 | SUCCESS | rc=0 >>
Apr 15 22:40:35 overcloud-mycloud-novacompute-dvr-15 os-collect-config: [2020-04-15 22:40:35,127] (heat-config) [DEBUG] [2020-04-15 22:40:35,085] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-0a35da40-dc48-48ce-bcef-264281ecc586/e80a3680-7b57-4b43-a2a8-ecad9d1745d0?temp_url_sig=fbaf48aae530498c7f4f3a5e10926fed48bcd43a&temp_url_expires=1587008412 via PUT
Apr 15 22:40:35 overcloud-mycloud-novacompute-dvr-15 os-collect-config: [2020-04-15 22:40:35,101] (heat-config-notify) [DEBUG] Response <Response [201]>

overcloud-mycloud-novacompute-dvr-14 | SUCCESS | rc=0 >>
Apr 15 22:47:24 overcloud-mycloud-novacompute-dvr-14 os-collect-config: [2020-04-15 22:47:24,328] (heat-config) [DEBUG] [2020-04-15 22:47:24,289] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-b1007bf8-0b9c-40e7-83b2-109887b85c3e/5e8ac65d-6dc4-40ba-b4fd-c8db4c0897b6?temp_url_sig=9c99a515f2444b9d4493eee3d374291508b41eed&temp_url_expires=1587008412 via PUT
Apr 15 22:47:24 overcloud-mycloud-novacompute-dvr-14 os-collect-config: [2020-04-15 22:47:24,299] (heat-config-notify) [DEBUG] Response <Response [404]>

overcloud-mycloud-novacompute-dvr-11 | SUCCESS | rc=0 >>
Apr 15 22:45:12 overcloud-mycloud-novacompute-dvr-11 os-collect-config: [2020-04-15 22:45:12,548] (heat-config) [DEBUG] [2020-04-15 22:45:12,491] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-525c475d-3bb5-4dd0-9d7d-b3ac4a2002cb/430add10-c5dc-4854-95ee-b6aa1f3a35bb?temp_url_sig=ea15bbdc74cf349ce841b4049340b522271d5205&temp_url_expires=1587008412 via PUT
Apr 15 22:45:12 overcloud-mycloud-novacompute-dvr-11 os-collect-config: [2020-04-15 22:45:12,507] (heat-config-notify) [DEBUG] Response <Response [201]>

overcloud-mycloud-novacompute-dvr-10 | SUCCESS | rc=0 >>
Apr 15 22:45:05 overcloud-mycloud-novacompute-dvr-10 os-collect-config: [2020-04-15 22:45:05,997] (heat-config) [DEBUG] [2020-04-15 22:45:05,940] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-55d5d6d7-8ce3-454d-b0b1-f0546be5e5e7/0dd72120-275e-4696-a894-08d3ab40ada4?temp_url_sig=84709f6c4b5ce0f0a9e1e5261fb799ce1068fb2d&temp_url_expires=1587008412 via PUT
Apr 15 22:45:05 overcloud-mycloud-novacompute-dvr-10 os-collect-config: [2020-04-15 22:45:05,955] (heat-config-notify) [DEBUG] Response <Response [201]>

overcloud-mycloud-novacompute-dvr-12 | SUCCESS | rc=0 >>
Apr 15 22:47:01 overcloud-mycloud-novacompute-dvr-12 os-collect-config: [2020-04-15 22:47:01,216] (heat-config) [DEBUG] [2020-04-15 22:47:01,162] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-eb72f884-9dca-4b32-bc96-0f38c081a8a5/a344be8b-47f0-4f1c-bf3c-ba96a7cb8b5b?temp_url_sig=29a402a3e08f5a4c311a9166d8b7834c217f04e0&temp_url_expires=1587008412 via PUT
Apr 15 22:47:01 overcloud-mycloud-novacompute-dvr-12 os-collect-config: [2020-04-15 22:47:01,172] (heat-config-notify) [DEBUG] Response <Response [404]>

overcloud-mycloud-novacompute-dvr-0 | SUCCESS | rc=0 >>
Apr 15 22:46:55 overcloud-mycloud-novacompute-dvr-0 os-collect-config: [2020-04-15 22:46:55,309] (heat-config) [DEBUG] [2020-04-15 22:46:55,253] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-ca1f4528-f1f2-410b-bf41-e3e71d171788/17ca3ca6-8c8e-4883-8e5c-8a953a5eef9b?temp_url_sig=cde86d95b7a82b4b42c7856eb00073930398d523&temp_url_expires=1587008412 via PUT
Apr 15 22:46:55 overcloud-mycloud-novacompute-dvr-0 os-collect-config: [2020-04-15 22:46:55,265] (heat-config-notify) [DEBUG] Response <Response [404]>

overcloud-mycloud-novacompute-dvr-1 | SUCCESS | rc=0 >>
Apr 15 22:46:00 overcloud-mycloud-novacompute-dvr-1 os-collect-config: [2020-04-15 22:46:00,665] (heat-config) [DEBUG] [2020-04-15 22:46:00,618] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-854565f1-a32f-4504-a710-6252e349b726/d620f616-16eb-42e7-8c62-27dce9827fa2?temp_url_sig=0e73fec83f25fed1a0ad05056a3dd88c86847a7f&temp_url_expires=1587008412 via PUT
Apr 15 22:46:00 overcloud-mycloud-novacompute-dvr-1 os-collect-config: [2020-04-15 22:46:00,628] (heat-config-notify) [DEBUG] Response <Response [404]>

overcloud-mycloud-novacompute-dvr-3 | SUCCESS | rc=0 >>
Apr 15 22:45:27 overcloud-mycloud-novacompute-dvr-3 os-collect-config: [2020-04-15 22:45:27,835] (heat-config) [DEBUG] [2020-04-15 22:45:27,783] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-92b9976d-f710-4d95-8c31-2a28afd18cb9/7f73375f-e49a-4adc-824e-34bc73ae4195?temp_url_sig=71ac1d3d2a35d72b3b70d2258ae8b420f3a7d10d&temp_url_expires=1587008412 via PUT
Apr 15 22:45:27 overcloud-mycloud-novacompute-dvr-3 os-collect-config: [2020-04-15 22:45:27,793] (heat-config-notify) [DEBUG] Response <Response [404]>

overcloud-mycloud-novacompute-dvr-2 | SUCCESS | rc=0 >>
Apr 15 22:49:08 overcloud-mycloud-novacompute-dvr-2 os-collect-config: [2020-04-15 22:49:08,655] (heat-config) [DEBUG] [2020-04-15 22:49:08,605] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-5c2064fd-508f-49df-8ea3-3d60f5ba9148/df4b005a-0209-4d75-8add-4ff64016c022?temp_url_sig=a1c9ad836fc78e89a02efcb46af9ea745dc34a17&temp_url_expires=1587008412 via PUT
Apr 15 22:49:08 overcloud-mycloud-novacompute-dvr-2 os-collect-config: [2020-04-15 22:49:08,615] (heat-config-notify) [DEBUG] Response <Response [404]>

overcloud-mycloud-novacompute-dvr-5 | SUCCESS | rc=0 >>
Apr 15 22:46:29 overcloud-mycloud-novacompute-dvr-5 os-collect-config: [2020-04-15 22:46:29,319] (heat-config) [DEBUG] [2020-04-15 22:46:29,265] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-47e33ad8-de7d-461d-b5a9-641a86c0e17a/35da49a3-2f2c-4e49-bd26-7de7f13e759a?temp_url_sig=d337811921bed8979479bb0f5573b1a13a1d9786&temp_url_expires=1587008412 via PUT
Apr 15 22:46:29 overcloud-mycloud-novacompute-dvr-5 os-collect-config: [2020-04-15 22:46:29,275] (heat-config-notify) [DEBUG] Response <Response [404]>

overcloud-mycloud-novacompute-dvr-4 | SUCCESS | rc=0 >>
Apr 15 22:48:18 overcloud-mycloud-novacompute-dvr-4 os-collect-config: [2020-04-15 22:48:18,371] (heat-config) [DEBUG] [2020-04-15 22:48:18,330] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-0621936c-e3cb-4baa-8180-edecc33b89ab/ca810e26-6d9d-434e-a8bc-e604df576e18?temp_url_sig=a370dd314df381ddb9caaf98aad0db9f7e3edb88&temp_url_expires=1587008412 via PUT
Apr 15 22:48:18 overcloud-mycloud-novacompute-dvr-4 os-collect-config: [2020-04-15 22:48:18,340] (heat-config-notify) [DEBUG] Response <Response [404]>

overcloud-mycloud-novacompute-dvr-7 | SUCCESS | rc=0 >>
Apr 15 22:47:08 overcloud-mycloud-novacompute-dvr-7 os-collect-config: [2020-04-15 22:47:08,620] (heat-config) [DEBUG] [2020-04-15 22:47:08,576] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-e2eb4502-eecc-445d-b69c-21b416cef8e1/aeedcd54-b007-485c-8bc0-9f43909accd5?temp_url_sig=7cef0a5da6272930fc0b8421b35d2021b59492cc&temp_url_expires=1587008412 via PUT
Apr 15 22:47:08 overcloud-mycloud-novacompute-dvr-7 os-collect-config: [2020-04-15 22:47:08,587] (heat-config-notify) [DEBUG] Response <Response [404]>

overcloud-mycloud-novacompute-dvr-6 | SUCCESS | rc=0 >>
Apr 15 22:46:28 overcloud-mycloud-novacompute-dvr-6 os-collect-config: [2020-04-15 22:46:28,383] (heat-config) [DEBUG] [2020-04-15 22:46:28,332] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-57a4ceb6-1f02-4a27-acbe-a21c43e48dfa/36db3c01-ce15-45ef-92a0-eb06c8201593?temp_url_sig=59fc7ae6967ab3a5e8293ce91fe4d1ca8572d29e&temp_url_expires=1587008412 via PUT
Apr 15 22:46:28 overcloud-mycloud-novacompute-dvr-6 os-collect-config: [2020-04-15 22:46:28,342] (heat-config-notify) [DEBUG] Response <Response [404]>

overcloud-mycloud-novacompute-dvr-9 | SUCCESS | rc=0 >>
Apr 15 22:46:16 overcloud-mycloud-novacompute-dvr-9 os-collect-config: [2020-04-15 22:46:16,903] (heat-config) [DEBUG] [2020-04-15 22:46:16,860] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-96ba292d-1860-4fe3-9ef3-82a9256a8b06/90345b02-4c6d-4dde-b5a5-797a49ac14e3?temp_url_sig=006c8debaa23459890ca5f13103c938a2032830a&temp_url_expires=1587008412 via PUT
Apr 15 22:46:16 overcloud-mycloud-novacompute-dvr-9 os-collect-config: [2020-04-15 22:46:16,870] (heat-config-notify) [DEBUG] Response <Response [404]>

overcloud-mycloud-novacompute-dvr-8 | SUCCESS | rc=0 >>
Apr 15 22:45:28 overcloud-mycloud-novacompute-dvr-8 os-collect-config: [2020-04-15 22:45:28,712] (heat-config) [DEBUG] [2020-04-15 22:45:28,673] (heat-config-notify) [DEBUG] Signaling to http://10.10.10.10:8080/v1/AUTH_463490ef385049a5a03c36d5a67e85fc/create_admin-48ea0aff-3a81-40b3-84ca-0ccfa8d61d12/0d5e0e94-6ec3-4c98-a153-8580ed8c8792?temp_url_sig=c8a49bde78fd6b246a6341839a6a513115fc9bb3&temp_url_expires=1587008412 via PUT
Apr 15 22:45:28 overcloud-mycloud-novacompute-dvr-8 os-collect-config: [2020-04-15 22:45:28,683] (heat-config-notify) [DEBUG] Response <Response [404]>
  • 6 of the 18 overcloud nodes (computes and controllers) signal successfully and get a 201 response for this task.

  • The other 12 nodes reply later, between 22:45:27 and 22:49:08, i.e. around or after 22:45:34Z (when mistral considered the workflow dead), and get 404.

  • Interestingly, an existing Red Hat solution describes this exact problem; it is still in progress and was updated 6 days ago.

  • We haven't updated packages since the initial (successful) deploy.
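
The comparison made above, each node's signal response against the 22:45:34Z failure time, can be tabulated with a small script. This is only a sketch: the regular expression is written against the log lines shown in this article, the function name `classify` is hypothetical, and it assumes the node log timestamps are in UTC like the Heat event timestamps. The three sample lines are copied from the output above.

```python
import re
from datetime import datetime

# Time at which Heat reported the failure (taken from the stack events above).
# Assumption: node log timestamps are UTC, like the Heat event timestamps.
FAILURE_TIME = datetime(2020, 4, 15, 22, 45, 34)

# Matches only the heat-config-notify "Response" lines, not the "Signaling to" lines.
RESPONSE_RE = re.compile(
    r"\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+os-collect-config:\s+"
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\]\s+"
    r"\(heat-config-notify\).*Response <Response \[(?P<code>\d{3})\]>"
)


def classify(lines, failure_time=FAILURE_TIME):
    """Return (host, http_code, seconds_relative_to_failure) per response line."""
    results = []
    for line in lines:
        match = RESPONSE_RE.search(line)
        if not match:
            continue  # skip anything that is not a response line
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        delta = int((ts - failure_time).total_seconds())
        results.append((match.group("host"), int(match.group("code")), delta))
    return results


# Three response lines copied from the ansible output in this article.
SAMPLE = [
    "Apr 15 22:40:23 overcloud-mycloud-controller-1 os-collect-config: "
    "[2020-04-15 22:40:23,371] (heat-config-notify) [DEBUG] Response <Response [201]>",
    "Apr 15 22:45:27 overcloud-mycloud-novacompute-dvr-3 os-collect-config: "
    "[2020-04-15 22:45:27,793] (heat-config-notify) [DEBUG] Response <Response [404]>",
    "Apr 15 22:47:24 overcloud-mycloud-novacompute-dvr-14 os-collect-config: "
    "[2020-04-15 22:47:24,299] (heat-config-notify) [DEBUG] Response <Response [404]>",
]

if __name__ == "__main__":
    for host, code, delta in classify(SAMPLE):
        print(f"{host}: HTTP {code}, {delta:+d}s relative to failure")
```

A negative offset means the node signalled before the failure was recorded, a positive one after. Feeding the full ansible output through `classify` gives one row per node, which makes it easy to see how the 201 and 404 responses line up against the failure time.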

Environment

  • Red Hat OpenStack Platform 13.0 (RHOSP)
