Chapter 14. Configuring distributed virtual routing (DVR)

14.1. Understanding distributed virtual routing (DVR)

When you deploy Red Hat OpenStack Platform, you can choose between a centralized routing model and DVR.

Each model has advantages and disadvantages. Use this document to carefully plan whether centralized routing or DVR better suits your needs.

DVR is enabled by default in new ML2/OVN deployments and disabled by default in new ML2/OVS deployments. The Heat template for the OpenStack Networking (neutron) API (deployment/neutron/neutron-api-container-puppet.yaml) contains a parameter to enable or disable Distributed Virtual Routing (DVR). To disable DVR, include the following in an environment file:

parameter_defaults:
  NeutronEnableDVR: false
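
For example, if you save this parameter in an environment file named disable-dvr.yaml (the file name here is only an example), include it when you deploy the overcloud:

$ openstack overcloud deploy --templates \
  -e disable-dvr.yaml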

14.1.1. Overview of Layer 3 routing

The Red Hat OpenStack Platform Networking service (neutron) provides routing services for project networks. Without a router, VM instances in a project network can communicate with other instances over a shared L2 broadcast domain. Creating a router and assigning it to a project network allows the instances in that network to communicate with other project networks or upstream (if an external gateway is defined for the router).
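
For example, the following command sequence, with example network and router names, creates a router, attaches a project subnet to it, and sets an external gateway so that instances on the subnet can reach upstream networks:

$ openstack router create project-router
$ openstack router add subnet project-router project-subnet
$ openstack router set --external-gateway public project-router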

14.1.2. Routing flows

Routing services in OpenStack can be categorized into three main flows:

  • East-West routing - routing of traffic between different networks in the same project. This traffic does not leave the OpenStack deployment. This definition applies to both IPv4 and IPv6 subnets.
  • North-South routing with floating IPs - Floating IP addressing is a one-to-one network address translation (NAT) that can be modified and that floats between VM instances. While floating IPs are modeled as a one-to-one association between the floating IP and a Networking service (neutron) port, they are implemented by association with a Networking service router that performs the NAT translation. The floating IPs themselves are taken from the uplink network that provides the router with external connectivity. As a result, instances can communicate with external resources, such as endpoints on the internet, and external endpoints can initiate traffic to the instances (see the example after this list). Floating IPs are an IPv4 concept and do not apply to IPv6. It is assumed that the IPv6 addressing used by projects uses Global Unicast Addresses (GUAs) with no overlap across the projects, and therefore can be routed without NAT.
  • North-South routing without floating IPs (also known as SNAT) - The Networking service offers a default port address translation (PAT) service for instances that do not have allocated floating IPs. With this service, instances can communicate with external endpoints through the router, but not the other way around. For example, an instance can browse a website on the internet, but a web browser outside cannot browse a website hosted within the instance. SNAT is applied to IPv4 traffic only. In addition, Networking service networks that are assigned GUA prefixes do not require NAT on the Networking service router external gateway port to access the outside world.
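
The floating IP flow can be illustrated with the following commands; the network, instance, and address values are examples only:

$ openstack floating ip create public
$ openstack server add floating ip my-instance 203.0.113.25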

14.1.3. Centralized routing

Originally, the Networking service (neutron) was designed with a centralized routing model where a project's virtual routers, managed by the neutron L3 agent, are all deployed on a dedicated node or cluster of nodes (referred to as the Network node, or Controller node). This means that each time a routing function is required (east/west, floating IPs, or SNAT), traffic traverses a dedicated node in the topology. This introduced multiple challenges and resulted in sub-optimal traffic flows. For example:

  • Traffic between instances flows through a Controller node - when two instances need to communicate with each other using L3, traffic must pass through the Controller node. Even if the instances are scheduled on the same Compute node, traffic still has to leave the Compute node, flow through the Controller node, and route back to the Compute node. This negatively impacts performance.
  • Instances with floating IPs receive and send packets through the Controller node - the external network gateway interface is available only at the Controller node, so whether the traffic originates from an instance or is destined to an instance from the external network, it has to flow through the Controller node. Consequently, in large environments the Controller node is subject to heavy traffic load. This affects performance and scalability, and also requires careful planning to accommodate enough bandwidth in the external network gateway interface. The same requirement applies to SNAT traffic.

To better scale the L3 agent, the Networking service can use the L3 HA feature, which distributes the virtual routers across multiple nodes. If a Controller node is lost, the HA router fails over to a standby router on another node, and there is packet loss until the failover completes.
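
In director-based deployments, L3 HA is typically controlled with the NeutronL3HA parameter. A minimal environment-file sketch to enable it:

parameter_defaults:
  NeutronL3HA: true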

14.2. DVR overview

Distributed Virtual Routing (DVR) offers an alternative routing design. DVR isolates the failure domain of the Controller node and optimizes network traffic by deploying the L3 agent and scheduling routers on every Compute node. DVR has these characteristics:

  • East-West traffic is routed directly on the Compute nodes in a distributed fashion.
  • North-South traffic with floating IPs is distributed and routed on the Compute nodes. This requires the external network to be connected to every Compute node.
  • North-South traffic without floating IPs is not distributed and still requires a dedicated Controller node.
  • The L3 agent on the Controller node uses the dvr_snat mode so that the node serves only SNAT traffic (see the agent mode sketch after this list).
  • The neutron metadata agent is distributed and deployed on all Compute nodes. The metadata proxy service is hosted on all the distributed routers.
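
These agent roles correspond to the agent_mode option in the L3 agent configuration (l3_agent.ini). Director sets the values for you; conceptually, the resulting configuration looks like the following sketch:

# On Compute nodes (distributed routing for east-west and floating IP traffic)
[DEFAULT]
agent_mode = dvr

# On Controller nodes (centralized SNAT only)
[DEFAULT]
agent_mode = dvr_snat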

14.3. DVR known issues and caveats

Note

For Red Hat OpenStack Platform 13, do not use DVR unless your kernel version is at least kernel-3.10.0-514.1.1.el7.

  • Support for DVR is limited to the ML2 core plug-in and the Open vSwitch (OVS) mechanism driver or ML2/OVN mechanism driver. Other back ends are not supported.
  • In both ML2/OVS and ML2/OVN DVR deployments, network traffic for the Red Hat OpenStack Platform Load-balancing service (octavia) goes through the Controller and Network nodes, instead of the Compute nodes.
  • With an ML2/OVS mechanism driver network back end and DVR, it is possible to create VIPs. However, the IP address assigned to a bound port using allowed_address_pairs should match the virtual port IP address (/32).

    If you use a CIDR format IP address for the bound port allowed_address_pairs instead, port forwarding is not configured in the back end, and traffic fails for any IP address in the CIDR that expects to reach the bound port. See the example after this list.

  • SNAT (source network address translation) traffic is not distributed, even when DVR is enabled. SNAT does work, but all ingress and egress traffic must traverse the centralized Controller node.
  • IPv6 traffic is not distributed, even when DVR is enabled. IPv6 routing does work, but all ingress and egress traffic must traverse the centralized Controller node. If you use IPv6 routing extensively, do not use DVR.
  • DVR is not supported in conjunction with L3 HA. If you use DVR with Red Hat OpenStack Platform 13 director, L3 HA is disabled. This means that routers are still scheduled on the Network nodes (and load-shared between the L3 agents), but if one agent fails, all routers hosted by this agent fail as well. This affects only SNAT traffic. The allow_automatic_l3agent_failover feature is recommended in such cases, so that if one network node fails, the routers are rescheduled to a different node.
  • DHCP servers, which are managed by the neutron DHCP agent, are not distributed and are still deployed on the Controller node. The DHCP agent is deployed in a highly available configuration on the Controller nodes, regardless of the routing design (centralized or DVR).
  • To work with floating IPs, each Compute node requires an interface on the External network. In addition, each Compute node requires one additional IP address. This is due to the implementation of the external gateway port and the floating IP network namespace.
  • VLAN, GRE, and VXLAN are all supported for project data separation. When you use GRE or VXLAN, you must enable the L2 Population feature. The Red Hat OpenStack Platform director enforces L2 Population during installation.
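
For example, to add a single-host (/32) VIP address to a bound port with allowed_address_pairs, rather than a CIDR, you might run a command like the following; the port name and address are examples only:

$ openstack port set --allowed-address ip-address=192.0.2.50 instance-port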

14.4. Supported routing architectures

Red Hat OpenStack Platform (RHOSP) supports both centralized, high-availability (HA) routing and distributed virtual routing (DVR) in the RHOSP versions listed:

  • RHOSP centralized HA routing support began in RHOSP 8.
  • RHOSP distributed routing support began in RHOSP 12.

14.5. Deploying DVR with ML2 OVS

To deploy and manage distributed virtual routing (DVR) in an ML2/OVS deployment, you configure settings in heat templates and environment files.

You use heat template settings to provision host networking:

  • Configure the interface connected to the physical network for external network traffic on both the Compute and Controller nodes.
  • Create a bridge on Compute and Controller nodes, with an interface for external network traffic (see the sketch after this list).
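
The following is a minimal sketch of such a bridge definition in a NIC configuration template; the bridge name br-ex and interface nic3 are examples only and must match your environment:

network_config:
  - type: ovs_bridge
    name: br-ex
    use_dhcp: false
    members:
      - type: interface
        name: nic3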

You also configure the Networking service (neutron) to match the provisioned networking environment and allow traffic to use the bridge.

The default settings are provided as guidelines only. They are not expected to work in production or test environments, which might require customization for network isolation, dedicated NICs, or other site-specific factors. When you set up an environment, you must correctly configure the bridge mapping type parameters used by the L2 agents and the external-facing bridges for other agents, such as the L3 agent.
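
For example, the bridge mapping for the L2 agents is commonly set with the NeutronBridgeMappings parameter in an environment file; the datacentre:br-ex mapping shown here is a typical default, but your physical network name and bridge may differ:

parameter_defaults:
  NeutronBridgeMappings: 'datacentre:br-ex'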

The following example procedure shows how to configure a proof-of-concept environment using the typical defaults.

Procedure

  1. Verify that the value for OS::TripleO::Compute::Net::SoftwareConfig matches the value of OS::TripleO::Controller::Net::SoftwareConfig in the file overcloud-resource-registry.yaml or in an environment file included in the deployment command.

    This value names a file, such as net_config_bridge.yaml. The named file configures the Neutron bridge mappings that Compute node L2 agents use for external networks. The bridge routes traffic for the floating IP addresses hosted on Compute nodes in a DVR deployment. Normally, you can find this filename value in the network environment file that you use when deploying the overcloud, such as environments/net-multiple-nics.yaml.

    Note

    If you customize the network configuration of the Compute node, you may need to add the appropriate configuration to your custom files instead.

  2. Verify that the Compute node has an external bridge.

    1. Make a local copy of the openstack-tripleo-heat-templates directory.
    2. Change to the local copy of the templates directory:

      $ cd <local_copy_of_templates_directory>
    3. Run the process-templates script to render the templates to a temporary output directory:

      $ ./tools/process-templates.py -r <roles_data.yaml> \
        -n <network_data.yaml> -o <temporary_output_directory>
    4. Check the role files in <temporary_output_directory>/network/config.
  3. If needed, customize the Compute template to include an external bridge that matches the Controller nodes, and name the custom file path in OS::TripleO::Compute::Net::SoftwareConfig in an environment file (see the sketch after this procedure).
  4. Include the environments/services/neutron-ovs-dvr.yaml file in the deployment command when deploying the overcloud:

    $ openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovs-dvr.yaml
  5. Verify that L3 HA is disabled.

    Note

    The external bridge configuration for the L3 agent was deprecated in Red Hat OpenStack Platform 13 and removed in Red Hat OpenStack Platform 15.
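
As referenced in steps 1 and 3, the following is a minimal environment-file sketch that points both roles at custom NIC configuration files; the file paths are hypothetical:

resource_registry:
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/templates/nic-configs/compute.yaml
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/templates/nic-configs/controller.yaml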

14.6. Migrating centralized routers to distributed routing

This section contains information about migrating to distributed virtual routing (DVR) for Red Hat OpenStack Platform deployments that use L3 HA centralized routing.

Procedure

  1. Upgrade your deployment and validate that it is working correctly.
  2. Run the director stack update to configure DVR.
  3. Confirm that routing functions correctly through the existing routers.
  4. You cannot transition an L3 HA router directly to distributed routing. Instead, for each router, disable the L3 HA option, and then enable the distributed option:

    1. Disable the router:

      Example

      $ openstack router set --disable router1

    2. Clear high availability:

      Example

      $ openstack router set --no-ha router1

    3. Configure the router to use DVR:

      Example

      $ openstack router set --distributed router1

    4. Enable the router:

      Example

      $ openstack router set --enable router1

    5. Confirm that distributed routing functions correctly (see the verification example after this procedure).
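
For example, one way to confirm the final state of the router is to check its distributed and ha flags:

$ openstack router show router1 -c distributed -c ha

The distributed field should show True and the ha field should show False.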
