OpenStack overcloud run fails due to facter running for more than 5 minutes in Red Hat OpenStack Platform

Solution In Progress - Updated -

Issue

The overcloud stack update may fail in WorkflowTasks_Step2_Execution, in the enable_ssh_admin task of ceph_base_ansible_workflow:

2020-02-19 04:19:10Z [overcloud-AllNodesDeploySteps-hdpp3w3ojlco.WorkflowTasks_Step2_Execution]: UPDATE_IN_PROGRESS  state changed
2020-02-19 04:24:32Z [overcloud-AllNodesDeploySteps-hdpp3w3ojlco.WorkflowTasks_Step2_Execution]: UPDATE_FAILED  resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow

  ceph_base_ansible_workflow [task_ex_id=a025f701-de16-481d-8e3e-8f8d6c04bb03] -> Failure caused by error in tasks: enable_ssh_admin

  enable_ssh_admin
2020-02-19 04:24:32Z [overcloud-AllNodesDeploySteps-hdpp3w3ojlco]: UPDATE_FAILED  Resource UPDATE failed: resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow

  ceph_base_ansible_workflow [task_ex_id=a025f701-de16-481d-8e3e-8f8d6c04bb03] -> Failure caused by error in tasks: enable_ssh_a
2020-02-19 04:24:33Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.WorkflowTasks_Step2_Execution: resources.AllNodesDeploySteps.Failure caused by error in tasks: ceph_base_ansible_workflow

  ceph_base_ansible_workflow [task_ex_id=a025f701-de16-481d-8e3e-8f8d6c04bb03] -> Failure caused by error in tasks: enable
2020-02-19 04:24:33Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: resources.WorkflowTasks_Step2_Execution: resources.AllNodesDeploySteps.Failure caused by error in tasks: ceph_base_ansible_workflow

  ceph_base_ansible_workflow [task_ex_id=a025f701-de16-481d-8e3e-8f8d6c04bb03] -> Failure caused b

 Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::TripleO::WorkflowSteps
  physical_resource_id: 4e2eda2d-2b99-4a38-b4ad-b076ec4da82a
  status: UPDATE_FAILED
  status_reason: |
    ...


        [wf_ex_id=14101f6e-760c-45c3-b72f-c6a2119d9c30, idx=0]: Failure caused by error in tasks: create_admin

      create_admin [task_ex_id=6042f936-0570-4158-9652-35acc22cd431] -> One or more actions had failed.
        [wf_ex_id=6bc7ef3b-5d87-4675-b475-240d3db4dbdb, idx=7]: None
        [wf_ex_id=9d5be15b-d0d4-41b6-a14b-67ce58a6a702, idx=10]: None
        [wf_ex_id=e0234903-e865-4c7f-be18-3e628721802b, idx=17]: None

These symptoms resemble a known issue that affects underclouds using SSL. However, in this specific case, the undercloud does not use SSL.

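Instead, the os-collect-config logs on the controller and the Swift object-server logs on the director show that the task does not complete before the signaling URL stops being valid. The object behind the URL is created at 01:19:20 and deleted at 01:24:26, so the controller's PUT at 01:26:20 receives a 404: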

Feb 19 01:19:50 controller-prd03 os-collect-config[460653]: [2020-02-19 01:19:50,108] (heat-config) [DEBUG] Running /usr/libexec/heat-config/hooks/ansible < /var/lib/heat-config/deployed/a5784c21-03c3-4731-be53-ee2797cf7c3d.json
Feb 19 01:19:51 controller-prd03 ansible-setup[48601]: Invoked with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Feb 19 01:26:12 controller-prd03 ansible-user[102143]: Invoked with comment=None ssh_key_bits=0 update_password=always non_unique=False force=False ssh_key_type=rsa create_home=True password_lock=None ssh_key_passphrase=NOT_LOGGING_PARAMETER uid=None home=None append=False skeleton=None ssh_key_comment=ansible-generated on controller-prd03 group=None system=False state=present hidden=None local=None shell=None expires=None ssh_key_file=None groups=None move_home=False password=NOT_LOGGING_PARAMETER name=tripleo-admin seuser=None remove=False login_class=None generate_ssh_key=None
Feb 19 01:26:20 controller-prd03 os-collect-config[460653]: [2020-02-19 01:26:20,849] (heat-config) [DEBUG] [2020-02-19 01:26:20,785] (heat-config-notify) [DEBUG] Signaling to http://10.51.110.10:8080/v1/AUTH_2123ca4387294ee19b88ab8deafe395c/create_admin-5cffb359-cb0d-4148-b03c-16787a959b31/8c20319e-d41a-4e1b-ba5d-5b671eafc6a4?temp_url_sig=c43c3c9c8e16c85e148d7c4763151983922a090d&temp_url_expires=1582103960 via PUT
Feb 19 01:26:20 controller-prd03 os-collect-config[460653]: [2020-02-19 01:26:20,800] (heat-config-notify) [DEBUG] Response <Response [404]>
Feb 19 01:19:20 director.example.com object-server[5408]: 10.51.110.10 - - [19/Feb/2020:04:19:20 +0000] "PUT /1/952/AUTH_2123ca4387294ee19b88ab8deafe395c/create_admin-5cffb359-cb0d-4148-b03c-16787a959b31/8c20319e-d41a-4e1b-ba5d-5b671eafc6a4" 201 - "PUT http://10.51.110.10:8080/v1/AUTH_2123ca4387294ee19b88ab8deafe395c/create_admin-5cffb359-cb0d-4148-b03c-16787a959b31/8c20319e-d41a-4e1b-ba5d-5b671eafc6a4" "tx061bd764d0a546a7a70a2-005e4cb748" "proxy-server 5876" 0.0057 "-" 5408 0
Feb 19 01:24:26 director.example.com object-server[5413]: 10.51.110.10 - - [19/Feb/2020:04:24:26 +0000] "DELETE /1/952/AUTH_2123ca4387294ee19b88ab8deafe395c/create_admin-5cffb359-cb0d-4148-b03c-16787a959b31/8c20319e-d41a-4e1b-ba5d-5b671eafc6a4" 204 - "DELETE http://10.51.110.10:8080/v1/AUTH_2123ca4387294ee19b88ab8deafe395c/create_admin-5cffb359-cb0d-4148-b03c-16787a959b31/8c20319e-d41a-4e1b-ba5d-5b671eafc6a4" "tx65c2273ebb9b4bef89b7a-005e4cb87a" "proxy-server 5876" 0.0106 "-" 5413 0
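
The timestamps in the excerpts above can be lined up to make the sequence explicit. The following is a minimal sketch, not part of the original analysis: it assumes the controller and director clocks are synchronized and log in the same local timezone, and the variable names are illustrative only. It also decodes the temp_url_expires value from the signaling URL for comparison.

```python
from datetime import datetime, timezone

FMT = "%Y %b %d %H:%M:%S"


def ts(stamp: str, year: int = 2020) -> datetime:
    """Parse a syslog-style stamp such as 'Feb 19 01:19:51'.

    Syslog lines carry no year, so it is supplied separately (assumed 2020,
    matching the bracketed dates in the log excerpts).
    """
    return datetime.strptime(f"{year} {stamp}", FMT)


# Local syslog timestamps copied from the log excerpts.
object_created = ts("Feb 19 01:19:20")  # director: PUT ... 201
setup_started = ts("Feb 19 01:19:51")   # controller: ansible-setup invoked
object_deleted = ts("Feb 19 01:24:26")  # director: DELETE ... 204
next_task = ts("Feb 19 01:26:12")       # controller: ansible-user invoked
signal_sent = ts("Feb 19 01:26:20")     # controller: PUT answered with 404

fact_gathering = next_task - setup_started       # time spent before the next task
object_lifetime = object_deleted - object_created
signal_lateness = signal_sent - object_deleted

print("fact gathering took     :", fact_gathering)   # 0:06:21
print("signal object existed   :", object_lifetime)  # 0:05:06
print("signal sent too late by :", signal_lateness)  # 0:01:54

# The temp_url_expires query parameter is a Unix timestamp.
temp_url_expires = datetime.fromtimestamp(1582103960, tz=timezone.utc)
print("temp_url_expires (UTC)  :", temp_url_expires.isoformat())
```

Under these assumptions, the object exists for just over five minutes, while the step that precedes the ansible-user task takes more than six. The decoded temp_url_expires value is 2020-02-19 09:19:20 UTC, several hours after the 404, which suggests the 404 comes from the object having been deleted rather than from the URL signature itself expiring.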

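Most of this time is taken up by fact gathering: ansible-setup[48601] is invoked at 01:19:51 with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10, and the next task (ansible-user) does not start until 01:26:12, more than six minutes later.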
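
To check whether facter is the slow part on an affected node, its runtime can be measured directly. The sketch below is an illustration under assumptions, not a procedure from this article: it assumes the facter binary is on the PATH of the node being checked, and time_command is a hypothetical helper name.

```python
import shutil
import subprocess
import time


def time_command(argv, timeout=900):
    """Run argv, discard its output, and return (exit_code, elapsed_seconds).

    The 900-second timeout is an arbitrary upper bound for this sketch.
    """
    start = time.monotonic()
    proc = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return proc.returncode, time.monotonic() - start


if __name__ == "__main__":
    facter = shutil.which("facter")
    if facter is None:
        print("facter not found on PATH")
    else:
        code, elapsed = time_command([facter])
        print(f"facter exited with {code} after {elapsed:.1f}s")
```

A runtime approaching or exceeding five minutes would be consistent with the gap seen between the ansible-setup and ansible-user log entries.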

Environment

Red Hat OpenStack Platform 13
