Instance failing to spawn with 'Failed to allocate the network' error due to OVN hash ring issues

Solution In Progress - Updated -

Issue

  • In a large OVN environment, if the OVN databases take too long to start, Neutron spawns the API workers, stalls while configuring the OVN database, and fails to spawn the RPC workers and the maintenance worker.

  • While the OVN databases are starting, the Neutron API workers are already up. They need to connect to the Galera database ovs_neutron and read the ovn_hash_ring table to run the hash ring mechanism.

  • The hash ring table distributes event processing: it coordinates which Neutron API worker is responsible for processing a given event. If anything happens to the Neutron server maintenance worker, or if access to the Galera database is blocked in any way, the hash ring table is not updated.

  • Because the maintenance worker has failed to spawn, every node's updated_at record in the hash ring table becomes older than 60 seconds and is considered stale, which prevents the Neutron API workers from functioning.
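
The staleness rule described above can be illustrated with a minimal sketch. This is not Neutron's actual code: the function name active_nodes, the worker names, and the (name, updated_at) row layout are hypothetical, and only the updated_at field and the 60-second threshold come from this article.

```python
from datetime import datetime, timedelta, timezone

# Staleness threshold taken from the article: records older than 60 seconds
# are treated as stale.
HASH_RING_TIMEOUT = timedelta(seconds=60)


def active_nodes(rows, now):
    """Return the names of nodes whose updated_at is within the timeout.

    `rows` is a hypothetical list of (node_name, updated_at) tuples standing
    in for records read from the hash ring table.
    """
    cutoff = now - HASH_RING_TIMEOUT
    return [name for name, updated_at in rows if updated_at >= cutoff]


# Fixed reference time so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# One node refreshed recently, one node not refreshed for five minutes.
rows = [
    ("worker-a", now - timedelta(seconds=10)),
    ("worker-b", now - timedelta(seconds=300)),
]

# With a healthy maintenance worker, at least one record stays fresh.
healthy = active_nodes(rows, now)

# If nothing refreshes the records, every node eventually falls outside
# the window and no node remains to take responsibility for events.
stalled = active_nodes(rows, now + timedelta(seconds=120))
```

In this sketch, healthy contains only worker-a, while stalled is empty. The empty case corresponds to the failure described above: with no maintenance worker refreshing updated_at, no node is considered active.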

Environment

  • Red Hat OpenStack Platform 16.1.8, 16.2.x
