Deployments failing while updating nova ownership
Issue
- An overcloud deployment frequently fails with the error "No such file or directory":
openstack-overcloud.AllNodesDeploySteps.Compute03Deployment_Step3.20:
resource_type: OS::Heat::StructuredDeployment
physical_resource_id: f8cb2e2c-54e5-4d9a-ad75-57cc0a9247f3
status: UPDATE_FAILED
status_reason: |
Error: resources[20]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
deploy_stdout: |
...
" File \"/docker-config-scripts/nova_statedir_ownership.py\", line 126, in _walk",
" for f in os.listdir(top):",
"OSError: [Errno 2] No such file or directory: '/var/lib/nova/instances/b8d64704-d120-4f92-99b3-7b954056d8d2'"
]
}
to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/0dba793d-7677-4127-9205-8a8b92081039_playbook.retry
PLAY RECAP *********************************************************************
localhost : ok=5 changed=2 unreachable=0 failed=1
- This appears to be a race condition: an instance directory disappears between the time nova_statedir_ownership.py gathers the list of instances and the time it changes their ownership. We expect the deployments to complete. Is there a way to skip this script, or to fix it so that it does not fail when a file is not found? These are very active fabrics, and the error occurs frequently during overcloud deployments.
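
The behavior requested above (continue when a path vanishes mid-walk) can be illustrated with a minimal Python sketch. This is not the shipped nova_statedir_ownership.py and not a Red Hat fix; the function name chown_tree_tolerant and its return value are hypothetical, chosen only for illustration. It assumes the failure is an ENOENT raised because an instance directory was deleted after its parent was listed, as the traceback suggests.

```python
import errno
import os


def chown_tree_tolerant(top, uid, gid):
    """Recursively chown entries under `top`, skipping paths that vanish.

    Hypothetical helper: returns the list of paths that disappeared
    while the walk was in progress instead of raising on them.
    """
    vanished = []
    try:
        names = os.listdir(top)
    except OSError as exc:
        # Only tolerate "No such file or directory"; re-raise anything else.
        if exc.errno != errno.ENOENT:
            raise
        return [top]

    for name in names:
        path = os.path.join(top, name)
        try:
            os.lchown(path, uid, gid)
            is_dir = os.path.isdir(path) and not os.path.islink(path)
        except OSError as exc:
            if exc.errno != errno.ENOENT:
                raise
            # The entry was removed between listdir() and lchown().
            vanished.append(path)
            continue
        if is_dir:
            vanished.extend(chown_tree_tolerant(path, uid, gid))
    return vanished
```

Only ENOENT is swallowed, so permission problems and other I/O errors still fail the run. Returning the vanished paths lets a caller log them rather than hide the race entirely.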
Environment
- Red Hat OpenStack Platform 13.0 (RHOSP)