Lost network port access when nova-compute restarted, along with DB errors in log

Solution In Progress - Updated -

Issue

  • We have been trying to isolate a recurring problem, and we seem to be closer to identifying its cause. We were hoping you could also take a look at our logs to see if anything stands out. Today we performed a RabbitMQ upgrade on our controller cluster, which went very well. Unfortunately, when we restarted openstack-nova-compute, one of the virtual machines lost network connectivity and required an administrative disable/enable of its network port. This happens consistently: during rolling updates of compute nodes, during live migrations that otherwise complete properly, and during similar operations, a VM loses network connectivity until its network port is set administratively down and back up.

  • Could you look specifically at the nova-compute log around 2020-04-15 16:55? That is where you will see the VIFs come up; VIF eed87775-9db6-4434-9132-5231ea89e5d6 is the one that went down.

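  • We also noticed a number of messages from nova.compute.resource_tracker (logged at INFO level) about inconsistencies between this compute host's allocations and the database, for example: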

2020-04-15 16:55:39.109 7646 INFO nova.compute.resource_tracker [req-0f65dfe5-2bb8-4452-aa00-d664c3d33709 - - - - -] Instance a1fbad6a-8f54-4364-807a-08c3568a84e1 has allocations against this compute host but is not found in the database.
2020-04-15 16:55:39.135 7646 INFO nova.compute.resource_tracker [req-0f65dfe5-2bb8-4452-aa00-d664c3d33709 - - - - -] Instance 20d2a3c9-163f-415b-b0d7-f1dbf3ee99b4 has allocations against this compute host but is not found in the database.
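To see how many instances are affected by the resource tracker messages quoted above, the instance UUIDs can be pulled out of the nova-compute log. The following is a minimal sketch, assuming the message text matches the two lines shown exactly; `ORPHAN_ALLOCATION_RE` and `orphaned_allocation_instances` are hypothetical names, not part of Nova. Each reported UUID could then be cross-checked, for example with `openstack server show <uuid>`.

```python
import re
from typing import Iterable, List

# Matches the resource tracker message quoted above and captures the instance UUID.
# Assumption: the wording is identical to the log excerpt in this article.
ORPHAN_ALLOCATION_RE = re.compile(
    r"Instance ([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) "
    r"has allocations against this compute host but is not found in the database"
)


def orphaned_allocation_instances(lines: Iterable[str]) -> List[str]:
    """Return the unique instance UUIDs reported by these messages, in first-seen order."""
    seen = {}  # dict preserves insertion order and drops repeats
    for line in lines:
        match = ORPHAN_ALLOCATION_RE.search(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)
```

Because the resource tracker repeats these messages on every periodic update, de-duplicating keeps the result to one entry per instance.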
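To narrow the nova-compute log to the window around 2020-04-15 16:55 and to the VIF mentioned in the second point above, records can be filtered by their leading timestamp. This is a sketch under stated assumptions: every log record starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp as in the excerpt, continuation lines such as traceback frames are skipped, and `lines_in_window` is a hypothetical helper name.

```python
from datetime import datetime
from typing import Iterable, Iterator

# Nova log records begin with e.g. "2020-04-15 16:55:39.109" (23 characters).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TIMESTAMP_LENGTH = 23


def lines_in_window(
    lines: Iterable[str], start: datetime, end: datetime, needle: str = ""
) -> Iterator[str]:
    """Yield log records timestamped within [start, end] that contain `needle`."""
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            # No leading timestamp (e.g. a traceback continuation line): skip it.
            continue
        if start <= stamp <= end and needle in line:
            yield line.rstrip("\n")
```

For example, iterating over an open nova-compute log file with `start=datetime(2020, 4, 15, 16, 50)`, `end=datetime(2020, 4, 15, 17, 0)` and `needle="eed87775-9db6-4434-9132-5231ea89e5d6"` yields only the records in that window that mention the VIF. The log file path depends on the deployment and is not assumed here.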
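The workaround described in the first point above, setting the affected port administratively down and back up, can be scripted so it is applied the same way each time. A minimal sketch, assuming python-openstackclient is installed and credentials are already loaded in the environment; `port_bounce_commands` and `bounce_port` are hypothetical helpers. Nova uses the Neutron port UUID as the VIF ID, so the VIF ID from the log can be passed directly. This only restores connectivity; it does not address the underlying cause.

```python
import subprocess
from typing import List


def port_bounce_commands(port_id: str) -> List[List[str]]:
    """Build the CLI calls that set a Neutron port admin-down, then admin-up."""
    return [
        ["openstack", "port", "set", "--disable", port_id],
        ["openstack", "port", "set", "--enable", port_id],
    ]


def bounce_port(port_id: str, dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry_run=True) or execute them in order."""
    commands = port_bounce_commands(port_id)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit. If the
            # enable step fails, the port stays disabled and must be fixed by hand.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run against the VIF reported in the log excerpt above.
    bounce_port("eed87775-9db6-4434-9132-5231ea89e5d6")
```

Defaulting to a dry run means the commands are shown for review before anything changes on a production port.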

Environment

  • Red Hat OpenStack Platform 13.0 (RHOSP)
