Chapter 10. Configure Layer 3 High Availability

This chapter explains the role of Layer 3 High Availability in an OpenStack Networking deployment and includes implementation steps for protecting your network’s virtual routers.

10.1. OpenStack Networking without HA

An OpenStack Networking deployment without any high availability features is vulnerable to physical node failures.

In a typical deployment, tenants create virtual routers, which are scheduled to run on physical L3 agent nodes. If an L3 agent node fails, the virtual machines that depend on its routers lose connectivity to external networks, and any associated floating IP addresses become unavailable. Connectivity is also lost between the networks hosted by those routers.

10.2. Overview of Layer 3 High Availability

This active/passive high availability configuration uses the industry-standard VRRP (as defined in RFC 3768) to protect tenant routers and floating IP addresses. A virtual router is randomly scheduled across multiple OpenStack Networking nodes, with one designated as the active router and the remainder serving in a standby role.

Note

A successful deployment of Layer 3 High Availability requires that the redundant OpenStack Networking nodes maintain similar configurations, including floating IP ranges and access to external networks.

In the diagram below, the active Router1 and Router2 are running on separate physical L3 agent nodes. Layer 3 High Availability has scheduled backup virtual routers on the corresponding nodes, ready to resume service in the case of a physical node failure:

[Figure: VRRP scheduling before failover]

When the L3 agent node fails, Layer 3 High Availability reschedules the affected virtual router and floating IP addresses to a working node:

[Figure: VRRP scheduling after failover]

During a failover event, instance TCP sessions through floating IPs remain unaffected and migrate to the new L3 node without disruption. Only SNAT traffic is affected by failover events.

The L3 agent itself is further protected when it runs in an active/active HA mode.

10.2.1. Failover conditions

Layer 3 High Availability will automatically reschedule protected resources in the following events:

  • The L3 agent node shuts down or otherwise loses power due to hardware failure.
  • The L3 agent node becomes isolated from the physical network and loses connectivity.

Note

Manually stopping the L3 agent service does not induce a failover event.
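
To observe a failover, it helps to know which L3 agent node currently hosts the active instance of a router. The following is a minimal sketch rather than a supported tool. It assumes that the output of neutron l3-agent-list-hosting-router is a table that includes host and ha_state columns (the exact columns can differ between client releases), and the function name active_l3_host is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read the table printed by
#   neutron l3-agent-list-hosting-router <router>
# on stdin and print the host whose ha_state column reads "active".
# Assumption: the table has "host" and "ha_state" columns.
active_l3_host() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Header row: remember which columns hold "host" and "ha_state".
        !hdr && /ha_state/ {
            for (i = 1; i <= NF; i++) {
                if (trim($i) == "host") h = i
                if (trim($i) == "ha_state") s = i
            }
            hdr = 1; next
        }
        # Data rows: print the host of every agent reported as active.
        hdr && s && trim($s) == "active" { print trim($h) }'
}
```

For example, pipe the command output into the function (neutron l3-agent-list-hosting-router routerName | active_l3_host) before and after powering off a node. If failover occurred, the printed host changes.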

10.3. Tenant considerations

Layer 3 High Availability configuration occurs in the back end and is invisible to the tenant. Tenants can continue to create and manage their virtual routers as usual; however, there are some limitations to be aware of when designing your Layer 3 High Availability implementation:

  • Layer 3 High Availability supports up to 255 virtual routers per tenant.
  • Internal VRRP messages are transported within a separate internal network, created automatically for each tenant. This process is transparent to the user.

10.4. Background changes

The Neutron API has been updated to allow administrators to set the --ha=True/False flag when creating a router, which overrides the default l3_ha setting in neutron.conf. See Section 10.5 for the necessary configuration steps.

10.4.1. Changes to neutron-server

  • Layer 3 High Availability assigns the active role randomly, regardless of the scheduler used by OpenStack Networking (whether random or leastrouter).
  • The database schema has been modified to handle allocation of VIPs to virtual routers.
  • A transport network is created to direct Layer 3 High Availability traffic as described above.

10.4.2. Changes to L3 agent

  • A new keepalived manager has been added, providing load-balancing and HA capabilities.
  • IP addresses are converted to VIPs.
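
Because keepalived runs per router on each L3 agent node, you can inspect the role that a given node currently holds for a router. The sketch below is illustrative only. It assumes that the L3 agent keeps per-router keepalived files under /var/lib/neutron/ha_confs/<router-id>/ and records the current role in a file named state in that directory. Verify both assumptions on your release. The function name ha_router_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the role recorded for one HA router on this node.
#   $1 = router ID
#   $2 = base directory (optional; the default below is an ASSUMED location)
ha_router_state() {
    router_id="$1"
    confs="${2:-/var/lib/neutron/ha_confs}"
    state_file="$confs/$router_id/state"
    if [ -r "$state_file" ]; then
        # Print whatever role the agent last recorded for this router.
        cat "$state_file"
    else
        # No readable state file: the router is not hosted here,
        # or the assumed path does not apply to this release.
        echo "unknown"
        return 1
    fi
}
```

Running ha_router_state with the same router ID on each L3 agent node should show one node in the active role and the others in standby, provided the path assumption holds.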

10.5. Configuration steps

This procedure enables Layer 3 High Availability on the OpenStack Networking and L3 agent nodes.

10.5.1. Configure the OpenStack Networking node

1. Configure Layer 3 High Availability in the neutron.conf file by enabling L3 HA and defining the number of L3 agent nodes that should protect each virtual router:

l3_ha = True
max_l3_agents_per_router = 2
min_l3_agents_per_router = 2

These settings are explained below:

  • l3_ha - When set to True, all virtual routers created from this point onwards will default to HA (and not legacy) routers. Administrators can override the value for each router using:
# neutron router-create --ha=<True | False> routerName
  • max_l3_agents_per_router - Set this to a value between the minimum and total number of network nodes in your deployment. For example, if you deploy four OpenStack Networking nodes but set max to 2, only two L3 agents will protect each HA virtual router: one active and one standby. In addition, each time a new L3 agent node is deployed, additional standby versions of the virtual routers are scheduled until the max_l3_agents_per_router limit is reached. As a result, you can scale out the number of standby routers by adding new L3 agents.
  • min_l3_agents_per_router - The min setting ensures that the HA rules remain enforced. This setting is validated during the virtual router creation process to ensure that a sufficient number of L3 agent nodes are available to provide HA. For example, if you have two network nodes and one becomes unavailable, no new routers can be created during that time, because you need at least min active L3 agents when creating an HA router.
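
Before restarting the service, you can sanity-check the three settings described above. The following is a minimal sketch, assuming that all three options are set explicitly to the kind of values shown in the example (True and positive integers) and that each option appears as a plain key = value line. The helper names get_opt and check_l3_ha_conf are hypothetical and are not part of OpenStack.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "key = value" option.
#   $1 = configuration file, $2 = option name
# Commented-out lines are ignored; the last matching line wins.
get_opt() {
    awk -F= -v key="$2" '
        /^[ \t]*[#;]/ { next }
        {
            k = $1; gsub(/[ \t]/, "", k)
            if (k == key) { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); val = v }
        }
        END { print val }' "$1"
}

# Hypothetical helper: check that L3 HA is enabled and that min <= max.
check_l3_ha_conf() {
    conf="$1"
    ha=$(get_opt "$conf" l3_ha)
    max=$(get_opt "$conf" max_l3_agents_per_router)
    min=$(get_opt "$conf" min_l3_agents_per_router)

    case "$ha" in
        [Tt]rue) ;;
        *) echo "l3_ha is not set to True"; return 1 ;;
    esac
    if [ -z "$min" ] || [ -z "$max" ]; then
        echo "min_l3_agents_per_router or max_l3_agents_per_router is not set"
        return 1
    fi
    # Reject anything that is not a plain non-negative integer.
    case "$min$max" in
        *[!0-9]*) echo "min/max values must be integers"; return 1 ;;
    esac
    if [ "$min" -gt "$max" ]; then
        echo "min_l3_agents_per_router ($min) exceeds max_l3_agents_per_router ($max)"
        return 1
    fi
    echo "ok: each HA router is scheduled to between $min and $max L3 agents"
}
```

For example, check_l3_ha_conf /etc/neutron/neutron.conf prints a one-line summary and returns a non-zero status if the values are inconsistent. The file path is the usual location and may differ in your deployment.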

2. Restart the neutron-server service for the change to take effect:

# systemctl restart neutron-server.service

10.5.2. Review your configuration

Running the ip address command within the virtual router namespace now returns an HA device in the result, prefixed with ha-.

# ip netns exec qrouter-b30064f9-414e-4c98-ab42-646197c74020 ip address
<snip>
2794: ha-45249562-ec: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 12:34:56:78:2b:5d brd ff:ff:ff:ff:ff:ff
inet 169.254.0.2/24 brd 169.254.0.255 scope global ha-54b92d86-4f

With Layer 3 High Availability now enabled, virtual routers and floating IP addresses are protected against individual node failure.
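
To check for the HA device without reading the full ip address output, you can filter the interface header lines for names that start with ha-. This is a minimal sketch under the assumption that header lines have the usual "index: name: <flags>" layout. The function name ha_devices is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list HA device names found in "ip address" output
# read from stdin.
ha_devices() {
    # Interface header lines look like "2794: ha-45249562-ec: <FLAGS> ...".
    # Strip the trailing colon and any "@peer" suffix from the name.
    awk '$1 ~ /^[0-9]+:$/ && $2 ~ /^ha-/ {
        sub(/:$/, "", $2); sub(/@.*/, "", $2); print $2
    }'
}
```

For example, ip netns exec qrouter-b30064f9-414e-4c98-ab42-646197c74020 ip address | ha_devices prints one line per HA device in that router namespace. No output suggests that the router was not created as an HA router.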