Chapter 14. Configuring distributed virtual routing (DVR)

14.1. Understanding distributed virtual routing (DVR)

When you deploy Red Hat OpenStack Platform, you can choose between a centralized routing model and DVR. While DVR is fully supported and available as a configuration option, Red Hat OpenStack Platform director still defaults to centralized routing.

Both centralized routing and DVR are valid routing models, and each has advantages and disadvantages. Use this document to plan whether centralized routing or DVR better suits your needs.

14.1.1. Overview of Layer 3 routing

OpenStack Networking (neutron) provides routing services for project networks. Without a router, instances in a project network can communicate only with other instances on the same shared L2 broadcast domain. Creating a router and attaching it to a project network allows the instances in that network to communicate with other project networks or with upstream networks (if an external gateway is defined for the router).

14.1.2. Routing flows

Routing services in OpenStack can be categorized into three main flows:

  • East-West routing - routing of traffic between different networks in the same project. This traffic does not leave the OpenStack deployment. This definition applies to both IPv4 and IPv6 subnets.
  • North-South routing with floating IPs - Floating IP addressing is a one-to-one NAT that can be modified and that floats between instances. While floating IPs are modeled as a one-to-one association between the floating IP and a neutron port, they are implemented by association with a neutron router that performs the address translation. The floating IPs themselves are taken from the uplink network that provides the router with external connectivity. As a result, instances can communicate with external resources (such as endpoints on the internet) and external resources can reach the instances. Floating IPs are an IPv4 concept and do not apply to IPv6. It is assumed that the IPv6 addressing used by projects uses Global Unicast Addresses (GUAs) with no overlap across the projects, and therefore can be routed without NAT.
  • North-South routing without floating IPs (also known as SNAT) - Neutron offers a default port address translation (PAT) service for instances that do not have allocated floating IPs. With this service, instances can communicate with external endpoints through the router, but not the other way around. For example, an instance can browse a website on the internet, but a web browser outside cannot browse a website hosted within the instance. SNAT is applied for IPv4 traffic only. In addition, neutron project networks that are assigned GUA prefixes do not require NAT on the neutron router external gateway port to access the outside world.

14.1.3. Centralized routing

Originally, neutron was designed with a centralized routing model where a project’s virtual routers, managed by the neutron L3 agent, are all deployed in a dedicated node or cluster of nodes (referred to as the Network node, or Controller node). This means that each time a routing function is required (east/west, floating IPs or SNAT), traffic would traverse through a dedicated node in the topology. This introduced multiple challenges and resulted in sub-optimal traffic flows. For example:

  • Traffic between instances flows through a Controller node - when two instances need to communicate with each other using L3, traffic has to hit the Controller node. Even if the instances are scheduled on the same Compute node, traffic still has to leave the Compute node, flow through the Controller, and route back to the Compute node. This negatively impacts performance.
  • Instances with floating IPs receive and send packets through the Controller node - the external network gateway interface is available only at the Controller node, so whether the traffic is originating from an instance, or destined to an instance from the external network, it has to flow through the Controller node. Consequently, in large environments the Controller node is subject to heavy traffic load. This would affect performance and scalability, and also requires careful planning to accommodate enough bandwidth in the external network gateway interface. The same requirement applies for SNAT traffic.

To better scale the L3 agent, neutron can use the L3 HA feature, which distributes the virtual routers across multiple nodes. If a Controller node is lost, the HA router fails over to a standby on another node, and packets are lost until the failover completes.

14.2. DVR overview

Distributed Virtual Routing (DVR) offers an alternative routing design. DVR isolates the failure domain of the Controller node and optimizes network traffic by deploying the L3 agent and scheduling routers on every Compute node. DVR has these characteristics:

  • East-West traffic is routed directly on the Compute nodes in a distributed fashion.
  • North-South traffic with floating IP is distributed and routed on the Compute nodes. This requires the external network to be connected to every Compute node.
  • North-South traffic without floating IP is not distributed and still requires a dedicated Controller node.
  • The L3 agent on the Controller node uses the dvr_snat mode so that the node serves only SNAT traffic.
  • The neutron metadata agent is distributed and deployed on all Compute nodes. The metadata proxy service is hosted on all the distributed routers.
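The split between the dvr_snat mode on Controller nodes and the distributed agents on Compute nodes can be spot-checked on a node by reading the agent_mode option from the L3 agent configuration file. The following is a minimal sketch, not a supported tool. It assumes that Compute nodes use agent_mode = dvr (the value neutron defines for distributed agents without SNAT), and the function names and the file path that you pass in are hypothetical; the location of l3_agent.ini differs between releases and between containerized and non-containerized deployments.

```shell
# Print the agent_mode value from an L3 agent configuration file.
# $1 = path to l3_agent.ini (assumption: you supply the correct path for your deployment).
l3_agent_mode() {
    sed -n 's/^[[:space:]]*agent_mode[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Succeed if the configured mode matches what DVR expects for the node role.
# $1 = path to l3_agent.ini, $2 = node role: "controller" or "compute".
check_dvr_agent_mode() {
    case "$2" in
        controller) expected="dvr_snat" ;;  # serves the centralized SNAT traffic
        compute)    expected="dvr" ;;       # assumption: distributed routing only
        *) echo "unknown role: $2" >&2; return 2 ;;
    esac
    actual=$(l3_agent_mode "$1")
    if [ "$actual" = "$expected" ]; then
        echo "OK: agent_mode=$actual"
    else
        echo "MISMATCH: agent_mode=${actual:-unset}, expected $expected" >&2
        return 1
    fi
}
```

For example, on a Compute node you might run check_dvr_agent_mode /path/to/l3_agent.ini compute, substituting the real path to the file.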

14.3. DVR known issues and caveats

Note

For Red Hat OpenStack Platform 13, do not use DVR unless your kernel version is at least kernel-3.10.0-514.1.1.el7.
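The kernel requirement in this note can be checked on each node before you enable DVR. The following sketch compares the running kernel with the minimum version by using GNU sort -V. The function name kernel_at_least is hypothetical, and version sorting only approximates RPM version comparison, so treat the result as a quick sanity check rather than an authoritative answer.

```shell
# Succeed if the current kernel version is at least the required version.
# $1 = required version, $2 = current version (for example, the output of uname -r).
kernel_at_least() {
    required="$1"
    current="$2"
    # After a version sort, the required version must be the lowest (or equal).
    lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}

if kernel_at_least "3.10.0-514.1.1.el7" "$(uname -r)"; then
    echo "Kernel $(uname -r) meets the minimum version for DVR"
else
    echo "Kernel $(uname -r) is older than 3.10.0-514.1.1.el7" >&2
fi
```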

  • Support for DVR is limited to the ML2 core plug-in and the Open vSwitch (OVS) mechanism driver, as well as OVN. Other back ends are not supported.
  • On both OVS and OVN DVR deployments, network traffic for the Red Hat OpenStack Platform Load-balancing service (octavia) goes through the Controller and Network nodes, instead of the Compute nodes.
  • SNAT (source network address translation) traffic is not distributed, even when DVR is enabled. SNAT does work, but all ingress/egress traffic must traverse through the centralized Controller node.
  • IPv6 traffic is not distributed, even when DVR is enabled. IPv6 routing does work, but all ingress/egress traffic must traverse through the centralized Controller node. If you use IPv6 routing extensively, do not use DVR.
  • DVR is not supported in conjunction with L3 HA. If you use DVR with Red Hat OpenStack Platform 13 director, L3 HA is disabled. This means that routers are still scheduled on the Network nodes (and load-shared between the L3 agents), but if one agent fails, all routers hosted by this agent fail as well. This affects only SNAT traffic. The allow_automatic_l3agent_failover feature is recommended in such cases, so that if one network node fails, the routers are rescheduled to a different node.
  • DHCP servers, which are managed by the neutron DHCP agent, are not distributed and are still deployed on the Controller node. The DHCP agent is deployed in a highly available configuration on the Controller nodes, regardless of the routing design (centralized or DVR).
  • To work with floating IPs, each Compute node requires an interface on the External network. In addition, each Compute node requires one additional IP address. This is due to the implementation of the external gateway port and the floating IP network namespace.
  • VLAN, GRE, and VXLAN are all supported for project data separation. When you use GRE or VXLAN, you must enable the L2 Population feature. The Red Hat OpenStack Platform director enforces L2 Population during installation.

14.4. Supported routing architectures

  • Red Hat Enterprise Linux OpenStack Platform 8 through to Red Hat OpenStack Platform 15 for centralized, HA routing.
  • Red Hat OpenStack Platform 12 or later with distributed routing.
  • Upgrading from a Red Hat OpenStack Platform 8 or later deployment running centralized HA routing to Red Hat OpenStack Platform 10 or later with only distributed routing.

14.5. Deploying DVR with ML2 OVS

The neutron-ovs-dvr.yaml environment file contains the configuration for the required DVR-specific parameters. Configuring DVR for an arbitrary deployment configuration requires additional consideration. The requirements are the following:

  1. You must configure the interface connected to the physical network for external network traffic on both the Compute and Controller nodes.
  2. You must create a bridge on Compute and Controller nodes, with an interface for external network traffic.
  3. You must configure Neutron to allow this bridge to be used.

The host networking configuration (requirements 1 and 2) is controlled by Heat templates that pass configuration to the Heat-managed nodes for consumption by the os-net-config process. This essentially automates the provisioning of host networking. You must also configure Neutron (requirement 3) to match the provisioned networking environment. The defaults are not expected to work in production environments.
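As an illustration of the Neutron side of these requirements, the following sketch writes a small custom environment file that maps a physical network name to the external bridge. The file name is hypothetical. NeutronBridgeMappings and NeutronFlatNetworks are director parameters, and the values shown (datacentre and br-ex) are assumed to be the usual defaults; replace them with the physical network and bridge names that your NIC templates actually create.

```shell
# Hypothetical file name; adjust it to your own templates directory.
ENV_FILE="${ENV_FILE:-dvr-bridge-mappings.yaml}"

cat > "$ENV_FILE" <<'EOF'
parameter_defaults:
  # <physical network name>:<OVS bridge that carries external traffic>
  NeutronBridgeMappings: "datacentre:br-ex"
  # Flat provider networks that neutron is allowed to use
  NeutronFlatNetworks: "datacentre"
EOF

echo "Wrote $ENV_FILE"
```

You would then pass this file to the deployment command with an additional -e option after the neutron-ovs-dvr.yaml environment file, so that your values take precedence.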

A proof-of-concept environment using the typical defaults might be similar to the following example:

  1. Verify that the value for OS::TripleO::Compute::Net::SoftwareConfig in the environments/neutron-ovs-dvr.yaml file is the same as the current OS::TripleO::Controller::Net::SoftwareConfig value.

    Normally, you can find this value in the network environment file that you use when deploying the overcloud, for example, environments/net-multiple-nics.yaml. This value creates the appropriate external network bridge for the Compute node L3 agent.

    Note

    If you customize the network configuration of the Compute node, it may be necessary to add the appropriate configuration to those files instead.

  2. Include the environments/neutron-ovs-dvr.yaml file in the deployment command when deploying the overcloud:

    $ openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/neutron-ovs-dvr.yaml

  3. Verify that L3 HA is disabled.

    For production environments (or test environments that require special customization, for example, network isolation or dedicated NICs), you can use the example environments as a guide. Ensure that the bridge mapping type parameters that the L2 agents use are correct, as well as any reference to external-facing bridges for other agents, such as the L3 agent.

    Note

    The external bridge configuration for the L3 agent, while currently still provided, is deprecated and will be removed in the future.
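The procedure does not spell out how to verify that L3 HA is disabled. One possible check, sketched below, inspects a router that was created after the deployment and confirms that it reports ha as False and distributed as True. The function name router_is_dvr is hypothetical, and the sketch assumes that you run it with administrator credentials, because these router attributes are normally visible only to administrators.

```shell
# Succeed if the router reports ha=False and distributed=True.
# $1 = router name or ID.
router_is_dvr() {
    ha=$(openstack router show "$1" -f value -c ha) || return 2
    distributed=$(openstack router show "$1" -f value -c distributed) || return 2
    [ "$ha" = "False" ] && [ "$distributed" = "True" ]
}
```

For example, router_is_dvr router1 && echo "router1 is distributed and not HA", where router1 stands for one of your own routers.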

14.6. Migrating centralized routers to distributed routing

This section contains information about upgrading to distributed routing for Red Hat OpenStack Platform deployments that use L3 HA centralized routing.

  1. Upgrade your deployment and validate that it is working correctly.
  2. Run the director stack update to configure DVR.

    For more information, see Deploying DVR with ML2 OVS.

  3. Confirm that routing functions correctly through the existing routers.
  4. You cannot transition an L3 HA router to distributed directly. Instead, for each router, disable the L3 HA option, and then enable the distributed option. In the following commands, replace <router> with the name or ID of the router:

    a. Disable the router:

        $ openstack router set --disable <router>

    b. Clear high availability:

        $ openstack router set --no-ha <router>

    c. Configure the router to use DVR:

        $ openstack router set --distributed <router>

    d. Enable the router:

        $ openstack router set --enable <router>

  5. Confirm that distributed routing functions correctly.
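When a deployment has many routers, the per-router commands in this procedure can be wrapped in a small script. The following is a sketch under stated assumptions rather than a supported migration tool: the function names are hypothetical, the loop converts every router that the current credentials can list, and each router is unavailable while it is disabled, so plan for a traffic interruption.

```shell
# Convert one centralized L3 HA router to a distributed router.
# $1 = router name or ID. Stops at the first command that fails.
migrate_router_to_dvr() {
    router="$1"
    openstack router set --disable "$router" &&
    openstack router set --no-ha "$router" &&
    openstack router set --distributed "$router" &&
    openstack router set --enable "$router"
}

# Convert every router returned by "openstack router list".
migrate_all_routers_to_dvr() {
    for router_id in $(openstack router list -f value -c ID); do
        if ! migrate_router_to_dvr "$router_id"; then
            echo "Migration failed for router $router_id" >&2
            return 1
        fi
    done
}
```

To convert a single router first, run migrate_router_to_dvr with its name or ID and confirm that routing works before you run migrate_all_routers_to_dvr.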