Instance failing to spawn with 'Failed to allocate the network' error due to OVN hash ring issues
Issue
- In a large OVN environment, if the OVN databases take too long to start up, Neutron spawns the API workers, then stalls while configuring the OVN database and fails to spawn the RPC workers and the maintenance worker.
- While the OVN databases are starting, the Neutron API workers are already up and need to connect to the Galera database ovs_neutron and read from the ovn_hash_ring table to execute the hash ring mechanism.
- If anything happens to the Neutron server maintenance worker, or access to the Galera database is blocked in any way, the hash ring table is not updated. The hash ring table is used to distribute event processing: it coordinates which Neutron API worker is responsible for processing a given event.
- Because the maintenance worker has failed to spawn, every node's updated_at record in the hash ring table is older than 60 seconds and is considered stale, which prevents the Neutron API workers from functioning.
- The following is seen in the logs:
03:04:24.812 36 ERROR networking_ovn.ovsdb.ovsdb_monitor [-] HashRing is empty, error: Hash Ring returned empty when hashing "b'aaaaaa-bbbb-cccc-dddd-eeeeeeeeeee'". This should never happen in a normal situation, please check the status of your cluster: networking_ovn.common.exceptions.HashRingIsEmpty: Hash Ring returned empty when hashing "b'aaaaaa-bbbb-cccc-dddd-eeeeeeeeeee'". This should never happen in a normal situation, please check the status of your cluster
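The staleness rule described above can be illustrated with a minimal Python sketch. It is not the networking_ovn implementation: the function names, the (node, updated_at) row shape, and the plain modulo hashing are assumptions made for illustration only. The 60-second threshold and the wording of the error come from the text and log above.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Threshold taken from the article: records older than 60 seconds are stale.
STALE_AFTER = timedelta(seconds=60)


class HashRingIsEmpty(Exception):
    """Illustrative stand-in for the exception named in the log above."""


def live_nodes(rows, now):
    """Return the node ids whose updated_at falls inside the staleness window.

    rows is a hypothetical list of (node_id, updated_at) tuples standing in
    for the contents of the ovn_hash_ring table.
    """
    return sorted(node for node, updated_at in rows
                  if now - updated_at <= STALE_AFTER)


def pick_owner(rows, key, now):
    """Map an event key to one live node, or fail if every record is stale."""
    nodes = live_nodes(rows, now)
    if not nodes:
        raise HashRingIsEmpty(
            f"Hash Ring returned empty when hashing {key!r}")
    # Plain modulo hashing, used here only as a simple stand-in for the
    # real hash ring algorithm.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]
```

With at least one recently updated record, pick_owner returns a node. Once nothing refreshes updated_at, as happens when the maintenance worker never starts, every record ages past the threshold and each lookup raises the error instead.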
Environment
- Red Hat OpenStack Platform 16.1.8, 16.2.x