Chapter 14. Configure Distributed Virtual Routing (DVR)

Customers deploying Red Hat OpenStack Platform can choose between a centralized routing model and Distributed Virtual Routing (DVR) for their deployment. While DVR is fully supported and available as a configuration option, Red Hat OpenStack Platform director still defaults to centralized routing.

Both centralized routing and DVR are valid routing models, and each has its own set of advantages and disadvantages. Use this document to carefully plan whether centralized routing or DVR better suits your needs.

14.1. Overview of Layer 3 Routing

OpenStack Networking (neutron) provides routing services for project networks. Without a router, instances in a project network can communicate with one another only over a shared L2 broadcast domain. Creating a router and attaching it to a project network allows the instances in that network to communicate with other project networks or upstream (if an external gateway is defined for the router).

14.1.1. Routing Flows

Routing services in OpenStack can be categorized into three main flows:

  • East-West routing - routing of traffic between different networks in the same project. This traffic does not leave the OpenStack deployment. This definition applies to both IPv4 and IPv6 subnets.
  • North-South routing with floating IPs - Floating IP addressing is best described as a one-to-one NAT that can be modified and floats between instances. While floating IPs are modeled as a one-to-one association between the floating IP and a neutron port, they are implemented by association with a neutron router that performs the NAT translation. The floating IPs themselves are taken from the uplink network that provides the router with its external connectivity. As a result, instances can communicate with external resources (such as endpoints on the internet), and external hosts can initiate communication with the instances. Floating IPs are an IPv4 concept and do not apply to IPv6. It is assumed that the IPv6 addressing used by projects uses Global Unicast Addresses (GUAs) with no overlap across the projects, and can therefore be routed without NAT.
  • North-South routing without floating IPs (also known as SNAT) - Neutron offers a default port address translation (PAT) service for instances that have not been allocated floating IPs. With this service, instances can communicate with external endpoints through the router, but not the other way around. For example, an instance can browse a website on the internet, but a web browser outside the deployment cannot reach a website hosted within the instance. SNAT is applied to IPv4 traffic only. In addition, neutron project networks that are assigned GUA prefixes do not require NAT on the neutron router external gateway port to access the outside world.
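
As an illustration of these flows, the following hedged example shows typical commands for attaching a project network to a router and allocating a floating IP. The names used (router1, private-subnet, public, my-instance) and the floating IP address are placeholders, and a pre-existing project subnet and external network are assumed:

# Create a router and attach a project subnet to it (enables east-west routing
# between project networks attached to the same router):
$ openstack router create router1
$ openstack router add subnet router1 private-subnet

# Set the external gateway (enables north-south SNAT for instances that have
# no floating IP):
$ openstack router set --external-gateway public router1

# Allocate a floating IP from the external network and associate it with an
# instance (enables north-south routing with one-to-one NAT):
$ openstack floating ip create public
$ openstack server add floating ip my-instance 203.0.113.10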

14.1.2. Centralized Routing

Originally, neutron was designed with a centralized routing model where a project’s virtual routers, managed by the neutron L3 agent, are all deployed in a dedicated node or cluster of nodes (referred to as the Network node, or Controller node). This means that each time a routing function is required (east/west, floating IPs, or SNAT), traffic traverses a dedicated node in the topology. This introduces multiple challenges and results in sub-optimal traffic flows. For example:

  • Traffic between instances flows through a Controller node - when two instances need to communicate with each other using L3, traffic has to pass through the Controller node. Even if the instances are scheduled on the same Compute node, traffic still has to leave the Compute node, flow through the Controller node, and route back to the Compute node. This negatively impacts performance.
  • Instances with floating IPs receive and send packets through the Controller node - the external network gateway interface is available only at the Controller node, so whether the traffic originates from an instance or is destined to an instance from the external network, it has to flow through the Controller node. Consequently, in large environments the Controller node is subject to heavy traffic load. This affects performance and scalability, and requires careful planning to accommodate enough bandwidth in the external network gateway interface. The same requirement applies to SNAT traffic.

To better scale the L3 agent, neutron can use the L3 HA feature, which distributes the virtual routers across multiple nodes. If a Controller node is lost, the HA router fails over to a standby on another node, and there is packet loss until the failover completes. The feature is available starting with Red Hat Enterprise Linux OpenStack Platform 6 and is enabled by default.
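
As a hedged illustration, the following command (the router name router1 is a placeholder) shows one way to check which L3 agents are hosting an HA router; with L3 HA enabled, one agent reports an ha_state of active and the others report standby:

$ neutron l3-agent-list-hosting-router router1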

14.2. DVR Overview

Distributed Virtual Routing (DVR) offers an alternative routing design, fully supported since Red Hat OpenStack Platform 11. It is intended to isolate the failure domain of the Controller node and to optimize network traffic by deploying the L3 agent and scheduling routers on every Compute node. When using DVR:

  • East-West traffic is routed directly on the Compute nodes in a distributed fashion.
  • North-South traffic with floating IP is distributed and routed on the Compute nodes. This requires the external network to be connected to every Compute node.
  • North-South traffic without floating IP is not distributed and still requires a dedicated Controller node.

    • The L3 agent on the Controller node is configured with a new dvr_snat mode so that only SNAT traffic is served by the node (see the configuration sketch after this list).
  • The neutron metadata agent is distributed and deployed on all Compute nodes. The metadata proxy service is hosted on all the distributed routers.
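
For illustration, the layout above corresponds to the standard agent_mode option in l3_agent.ini. Director applies the equivalent configuration automatically when DVR is deployed, so the following sketch is for reference only and assumes a default, non-customized installation:

# l3_agent.ini on a Compute node - routers are distributed:
[DEFAULT]
agent_mode = dvr

# l3_agent.ini on the Controller node - only SNAT traffic is served here:
[DEFAULT]
agent_mode = dvr_snat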

14.3. Known Issues and Caveats

Note

For Red Hat OpenStack Platform 14, do not use DVR unless your kernel version is at least kernel-3.10.0-514.1.1.el7.

  • Support for DVR is limited to the ML2 core plug-in and the Open vSwitch (OVS) mechanism driver, as well as OVN. Other back ends are not supported.
  • SNAT traffic is not distributed, even when DVR is enabled. SNAT does work, but all ingress/egress traffic must traverse the centralized Controller node.
  • IPv6 traffic is not distributed, even when DVR is enabled. IPv6 routing does work, but all ingress/egress traffic must traverse the centralized Controller node. Customers that use IPv6 routing extensively are advised not to use DVR at this time.
  • DVR is not supported in conjunction with L3 HA. With Red Hat OpenStack Platform 14 director, if DVR is used, L3 HA is turned off. That means that routers are still going to be scheduled on the Network nodes (and load-shared between the L3 agents), but if one agent fails, all routers hosted by this agent will fail as well. This only affects SNAT traffic. The allow_automatic_l3agent_failover feature is recommended in such cases, so that if one network node fails, the routers will be rescheduled to a different node.

    For SNAT routers, the allow_automatic_l3agent_failover feature lets neutron automatically reschedule routers from dead L3 agents. Failure detection uses the control plane (RabbitMQ, the L3 agents, neutron-server, and the database), so the downtime can be significant: the L3 agent must first be considered dead by the neutron server (75 seconds by default), then the neutron server must reschedule the router, and the new L3 agent must configure it. This can take more than two minutes if the dead L3 agent or Controller hosted many routers. If allow_automatic_l3agent_failover is enabled, the operator does not need to intervene, but there is no way to restore the sessions that timed out, and new connections must be established. See the configuration sketch after this list.

  • DHCP servers, which are managed by the neutron DHCP agent, are not distributed and are still deployed on the Controller node. With Red Hat OpenStack Platform, the DHCP agent is deployed in a highly available configuration on the Controller nodes, regardless of the routing design (centralized or DVR).
  • For floating IPs, each Compute node requires an interface on the External network. In addition, each Compute node now requires one additional IP address. This is due to the implementation of the external gateway port and the floating IP network namespace.
  • VLAN, GRE, and VXLAN are all supported for project data separation. When GRE or VXLAN is used, the L2 Population feature must be turned on. With Red Hat OpenStack Platform director, this is enforced during installation.
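
The following is a minimal sketch of the neutron.conf settings referenced above for automatic rescheduling of SNAT routers; the values shown are the upstream defaults for the timers and are included for illustration only:

# neutron.conf on the node running neutron-server
[DEFAULT]
# Reschedule routers away from L3 agents that the server considers dead:
allow_automatic_l3agent_failover = true
# An agent is considered dead after this many seconds without a heartbeat
# (75 seconds by default):
agent_down_time = 75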

14.4. Supported Routing Architectures

  • Red Hat OpenStack Platform 8 through Red Hat OpenStack Platform 14 for centralized, HA routing.
  • Red Hat OpenStack Platform 12 or later with distributed routing.
  • Upgrading from a Red Hat OpenStack Platform 8 or later deployment running centralized HA routing to Red Hat OpenStack Platform 10 or later with only distributed routing.

14.5. Deploying DVR with ML2 OVS

The neutron-ovs-dvr.yaml environment file configures the required DVR-specific parameters. Configuring DVR for an arbitrary deployment configuration requires additional consideration. The requirements are:

(a) The interface connected to the physical network for external network traffic must be configured on both the Compute and Controller nodes.

(b) A bridge must be created on Compute and Controller nodes, with an interface for external network traffic.

(c) Neutron must be configured to allow this bridge to be used.

The host networking configuration (a and b) is controlled by Heat templates that pass configuration to the Heat-managed nodes for consumption by the os-net-config process. This is essentially automation of provisioning host networking. Neutron must also be configured (c) to match the provisioned networking environment. The defaults are not expected to work in production environments. For example, a proof-of-concept environment using the typical defaults might be similar to the following:

1. Verify that the value for OS::TripleO::Compute::Net::SoftwareConfig in environments/neutron-ovs-dvr.yaml is the same as the OS::TripleO::Controller::Net::SoftwareConfig value in use. This can normally be found in the network environment file in use when deploying the overcloud, for example, environments/net-multiple-nics.yaml. This will create the appropriate external network bridge for the Compute node’s L3 agent.

Note

If customizations have been made to the Compute node’s network configuration, it may be necessary to add the appropriate configuration to those files instead.
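
As a hedged example of the comparison in step 1, assuming the default template location and the example network environment file mentioned above, the two values can be compared as follows:

$ cd /usr/share/openstack-tripleo-heat-templates
$ grep 'OS::TripleO::Compute::Net::SoftwareConfig' environments/neutron-ovs-dvr.yaml
$ grep 'OS::TripleO::Controller::Net::SoftwareConfig' environments/net-multiple-nics.yaml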

2. Configure a neutron port for the Compute node on the external network by modifying OS::TripleO::Compute::Ports::ExternalPort to an appropriate value, such as OS::TripleO::Compute::Ports::ExternalPort: ../network/ports/external.yaml
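
As a hedged example, such an override is typically placed in the resource_registry section of a custom environment file passed to the deploy command; the filename my-dvr-overrides.yaml is a placeholder, and the relative path is resolved from the location of the environment file:

# my-dvr-overrides.yaml (example filename)
resource_registry:
  OS::TripleO::Compute::Ports::ExternalPort: ../network/ports/external.yaml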

3. Include environments/neutron-ovs-dvr.yaml as an environment file when deploying the overcloud. For example:

$ openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-ovs-dvr.yaml

4. Verify that L3 HA is disabled.
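
One hedged way to confirm this is to create a temporary router and check its attributes as an admin user; with DVR enabled and L3 HA disabled, a newly created router is expected to report distributed True and ha False (the router name is a placeholder):

$ openstack router create test-router
$ openstack router show test-router -c distributed -c ha
$ openstack router delete test-router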

For production environments (or test environments that require special customization, for example, network isolation or dedicated NICs), the example environments can be used as a guide. Particular attention should be given to the bridge mapping type parameters used by the L2 agents (for example, OVS) and any reference to external-facing bridges for other agents (such as the L3 agent).

Note

The external bridge configuration for the L3 agent, while currently still provided, is deprecated and will be removed in the future.
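
As a hedged illustration of the bridge mapping parameters mentioned above, a custom environment file might override the mapping consumed by the OVS L2 agents; the value shown is only an example and must match the external bridge actually created on the Compute and Controller nodes:

parameter_defaults:
  NeutronBridgeMappings: 'datacentre:br-ex'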

14.6. Migrate Centralized Routers to Distributed Routing

This section describes the upgrade path to distributed routing for Red Hat OpenStack Platform deployments that use L3 HA centralized routing.

1. Upgrade your deployment and validate that it is working correctly.

2. Run the director stack update to configure DVR, as described in Deploying DVR with ML2 OVS.

3. Confirm that routing still works through the existing routers.

4. You cannot directly transition an L3 HA router to distributed. Instead, for each router, turn off the L3 HA option, and then turn on the distributed option. In the following commands, replace <router> with the name or ID of the router:

4a. Set the router's admin state to down:

$ neutron router-update <router> --admin-state-up=False

4b. Convert the router to the legacy type:

$ neutron router-update <router> --ha=False

4c. Configure the router to use DVR:

$ neutron router-update <router> --distributed=True

4d. Set the router's admin state back to up:

$ neutron router-update <router> --admin-state-up=True

4e. Confirm that routing still works, and is now distributed.
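
As a hedged final check, the migrated router should now report distributed True and ha False when inspected as an admin user (replace <router> with the name or ID of the router):

$ openstack router show <router> -c distributed -c ha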