Safely reboot an OpenStack Neutron API node when using HA routers

Environment

  • Red Hat OpenStack Platform (OSP) 13, using the OVS backend and HA routers
  • Red Hat OpenStack Platform (OSP) 16, using the OVS backend and HA routers

Issue

Bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=1962987.

In the reported bug, the Neutron controllers also host the HA router instances, so the Neutron API runs on the same node as an HA router instance. When a server that holds the "active" instance of any HA router is rebooted, the remaining instances of that router elect a new "active" instance. During the reboot, the Neutron L3 agent on this node (along with the keepalived process) can be shut down before the Neutron API, which triggers that election. If the resulting Neutron L3 agent request is handled by the Neutron API on the server being rebooted, there is a small chance that this Neutron API instance is stopped before it finishes the request. In that case the router port binding does not complete correctly and the port status remains "DOWN".
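One way to check for the symptom described above is to list the ports of a router and look for any whose status is "DOWN". The following is a minimal sketch, assuming the `openstack` CLI is installed and authenticated, and that `openstack port list --router <router> -f json` returns objects with `ID` and `Status` keys. The function names are hypothetical and are not part of any Neutron tooling. A "DOWN" port can have other causes, so treat a match as a hint rather than a diagnosis.

```python
import json
import subprocess


def down_ports(port_list_json):
    """Return the IDs of ports whose Status is DOWN.

    Expects the JSON text printed by `openstack port list -f json`
    (assumed to contain "ID" and "Status" keys per port).
    """
    ports = json.loads(port_list_json)
    return [port["ID"] for port in ports if port.get("Status") == "DOWN"]


def router_down_ports(router):
    """List the DOWN ports of one router by calling the openstack CLI."""
    result = subprocess.run(
        ["openstack", "port", "list", "--router", router, "-f", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return down_ports(result.stdout)
```

Calling `router_down_ports("<router name or ID>")` after a controller reboot would return an empty list if every port of that router was bound correctly.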

Resolution

To avoid this issue, first manually shut down the Neutron API service on the server to be rebooted. Once this Neutron API instance is off, any request is processed by another active instance, and the router port binding completes correctly.
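The ordering above (stop the Neutron API first, then reboot) can be sketched as a small helper. This is an illustration under stated assumptions, not a procedure taken from the bug report: it assumes that on OSP 13 the Neutron API runs in a Docker container named `neutron_api`, and that on OSP 16 it is managed by the systemd unit `tripleo_neutron_api.service`. Verify both names on your deployment before running anything. The function names are hypothetical.

```python
import subprocess

# ASSUMPTION: commands that stop the Neutron API on a director-deployed
# controller. Confirm the container and unit names for your environment.
STOP_NEUTRON_API = {
    13: ["docker", "stop", "neutron_api"],
    16: ["systemctl", "stop", "tripleo_neutron_api.service"],
}


def reboot_plan(osp_version):
    """Return the ordered commands: stop the Neutron API, then reboot."""
    if osp_version not in STOP_NEUTRON_API:
        raise ValueError(f"unsupported OSP version: {osp_version}")
    return [STOP_NEUTRON_API[osp_version], ["systemctl", "reboot"]]


def execute(plan, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for command in plan:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Running `execute(reboot_plan(16))` on the node to be rebooted only prints the commands. Passing `dry_run=False` executes them in order, so the API is already stopped when the L3 agent and keepalived go down.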