We have been trying to isolate a recurring problem, and we seem to have come closer to identifying it. We were hoping you could also take a look at our logs to see if anything stands out. We performed a RabbitMQ upgrade on our controller cluster today, which went very well. Unfortunately, when we restarted openstack-nova-compute, one of the virtual machines lost network connectivity and required an admin disable/enable of the network. This is a constant problem: when we perform rolling updates of compute nodes, properly live-migrating instances, and similar operations, a VM loses network connectivity until its network port is set admin down and then up again.
Can you take a look specifically at nova-compute around 2020-04-15 16:55? That is where you will see the VIFs come up, and eed87775-9db6-4434-9132-5231ea89e5d6 specifically is the one that went down.
We also noticed a number of log messages about inconsistencies in the DB, such as:
2020-04-15 16:55:39.109 7646 INFO nova.compute.resource_tracker [req-0f65dfe5-2bb8-4452-aa00-d664c3d33709 - - - - -] Instance a1fbad6a-8f54-4364-807a-08c3568a84e1 has allocations against this compute host but is not found in the database.
2020-04-15 16:55:39.135 7646 INFO nova.compute.resource_tracker [req-0f65dfe5-2bb8-4452-aa00-d664c3d33709 - - - - -] Instance 20d2a3c9-163f-415b-b0d7-f1dbf3ee99b4 has allocations against this compute host but is not found in the database.
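To list every instance that the resource tracker reports this way, a short script can pull the UUIDs out of the nova-compute log. The following is a minimal sketch: the function name `orphaned_instances` and the `SAMPLE` text are illustrative assumptions, and the regular expression matches only the message wording quoted above.

```python
import re

# Matches the resource tracker message quoted above and captures the instance UUID.
ORPHAN_RE = re.compile(
    r"Instance (?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) "
    r"has allocations against this compute host but is not found in the database"
)


def orphaned_instances(log_text):
    """Return the unique instance UUIDs reported, in order of first appearance."""
    seen = []
    for match in ORPHAN_RE.finditer(log_text):
        uuid = match.group("uuid")
        if uuid not in seen:
            seen.append(uuid)
    return seen


# Sample input built from the two log entries quoted above (hypothetical variable).
SAMPLE = (
    "2020-04-15 16:55:39.109 7646 INFO nova.compute.resource_tracker "
    "[req-0f65dfe5-2bb8-4452-aa00-d664c3d33709 - - - - -] Instance "
    "a1fbad6a-8f54-4364-807a-08c3568a84e1 has allocations against this "
    "compute host but is not found in the database.\n"
    "2020-04-15 16:55:39.135 7646 INFO nova.compute.resource_tracker "
    "[req-0f65dfe5-2bb8-4452-aa00-d664c3d33709 - - - - -] Instance "
    "20d2a3c9-163f-415b-b0d7-f1dbf3ee99b4 has allocations against this "
    "compute host but is not found in the database.\n"
)

if __name__ == "__main__":
    for uuid in orphaned_instances(SAMPLE):
        print(uuid)
```

In practice the text would be read from the nova-compute log file instead of `SAMPLE`. Because the script scans the whole text rather than line by line, it also works when several entries are run together on one line.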
- Red Hat OpenStack Platform 13.0 (RHOSP)