Deployments failing while updating nova ownership

Solution In Progress

Issue

  • During an overcloud deployment, the deployment frequently fails with the error "No such file or directory":
openstack-overcloud.AllNodesDeploySteps.Compute03Deployment_Step3.20:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: f8cb2e2c-54e5-4d9a-ad75-57cc0a9247f3
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[20]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "  File \"/docker-config-scripts/nova_statedir_ownership.py\", line 126, in _walk",
            "    for f in os.listdir(top):",
            "OSError: [Errno 2] No such file or directory: '/var/lib/nova/instances/b8d64704-d120-4f92-99b3-7b954056d8d2'"
        ]
    }
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/0dba793d-7677-4127-9205-8a8b92081039_playbook.retry

    PLAY RECAP *********************************************************************
    localhost                  : ok=5    changed=2    unreachable=0    failed=1
  • This appears to be a race condition between the point at which nova_statedir_ownership.py gathers the list of instances and the point at which it changes their ownership. We expect the deployments to complete. Is there a way to skip this script, or to fix it so that it does not error when a file is not found? These are very active fabrics, and this error occurs frequently during overcloud deployments.
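
The race described above can be illustrated with a minimal Python sketch of an ownership walk that tolerates entries disappearing between the directory listing and the per-entry operations. This is not the shipped nova_statedir_ownership.py code and not a confirmed fix; the helper name chown_tree_tolerant and the choice to record vanished paths rather than fail are assumptions made for illustration only.

```python
import errno
import os
import stat


def chown_tree_tolerant(top, uid, gid):
    """Recursively chown `top`, skipping entries that vanish mid-walk.

    Hypothetical helper for illustration; it is NOT the code shipped in
    nova_statedir_ownership.py. Returns the list of paths that disappeared
    before they could be processed.
    """
    vanished = []

    def _ignore_missing(func, path, *args):
        # Run func(path, *args); treat ENOENT as "entry was deleted
        # concurrently" and record it. Any other OSError still propagates.
        try:
            return func(path, *args)
        except OSError as exc:
            if exc.errno != errno.ENOENT:
                raise
            vanished.append(path)
            return None

    def _walk(path):
        names = _ignore_missing(os.listdir, path)
        if names is None:
            return  # the directory itself vanished after it was discovered
        for name in names:
            child = os.path.join(path, name)
            st = _ignore_missing(os.lstat, child)
            if st is None:
                continue  # removed between listdir() and lstat()
            _ignore_missing(os.lchown, child, uid, gid)
            if stat.S_ISDIR(st.st_mode):
                _walk(child)

    if _ignore_missing(os.lstat, top) is None:
        return vanished
    _ignore_missing(os.lchown, top, uid, gid)
    _walk(top)
    return vanished
```

In this sketch only ENOENT is swallowed, so permission problems and other I/O errors would still fail the run. Whether silently skipping a deleted instance directory is acceptable depends on the environment; the returned list is there so that a caller could log what was skipped.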

Environment

  • Red Hat OpenStack Platform 13.0 (RHOSP)
