Chapter 7. High availability and clustering with OpenDaylight

Red Hat OpenStack Platform 13 supports High Availability clustering for both neutron and the OpenDaylight Controller. The table below shows the recommended architecture for a high availability cluster:

Node type                      Number of nodes    Node mode

Neutron                        3                  active/active/active
OpenDaylight                   3                  active/active/active
Compute nodes (nova or OVS)    any                -

The OpenDaylight role is composable, so it can be deployed on the same nodes as the neutron nodes or on separate nodes. The setup is all-active: every node can handle requests, and a node that receives a request it cannot handle forwards it to an appropriate node. All nodes maintain synchronisation with each other. In the Open vSwitch database schema (OVSDB) Southbound plugin, the available Controller nodes share the Open vSwitch instances, so that a specific node in the cluster handles each switch.

7.1. Configuring OpenDaylight for high availability and clustering

Because the Red Hat OpenStack Platform director deploys the OpenDaylight Controller nodes, it has all the information required to configure clustering for OpenDaylight. Each OpenDaylight node requires an akka.conf configuration file that identifies the node's role (its name in the cluster) and lists at least some of the other nodes in the cluster, the seed nodes. Each node also requires a module-shards.conf file that defines how data is replicated in the cluster. The Red Hat OpenStack Platform director applies the correct settings based on the selected deployment configuration. The akka.conf file depends on the nodes, while the module-shards.conf file depends on both the nodes and the installed datastores, and hence on the installed features, which the deployment largely controls.

Example akka.conf file:

$ docker exec -it opendaylight_api cat /var/lib/kolla/config_files/src/opt/opendaylight/configuration/initial/akka.conf


odl-cluster-data {
  akka {
    remote {
      netty.tcp {
        hostname = "192.0.2.1"
      }
    },
    cluster {
      seed-nodes = [
        "akka.tcp://opendaylight-cluster-data@192.0.2.1:2550",
        "akka.tcp://opendaylight-cluster-data@192.0.2.2:2550",
        "akka.tcp://opendaylight-cluster-data@192.0.2.3:2550"],
      roles = [ "member-1" ]
    }
  }
}

The nodes listed here are the seed nodes. They do not need to reflect the current cluster setup as a whole: as long as one of the real nodes in the current cluster is reachable through the list of seed nodes, a starting-up node can join the cluster. In the configuration file, you can use names instead of IP addresses.
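
The module-shards.conf file follows a similar pattern. For illustration, a minimal sketch of a module-shards.conf that replicates the default and topology shards to all three members; the exact set of modules and shards depends on the installed features, so treat this as an example rather than a definitive listing:

# Each shard lists the cluster members that hold a replica of its data.
module-shards = [
    {
        name = "default"
        shards = [
            {
                name = "default"
                replicas = ["member-1", "member-2", "member-3"]
            }
        ]
    },
    {
        name = "topology"
        shards = [
            {
                name = "topology"
                replicas = ["member-1", "member-2", "member-3"]
            }
        ]
    }
]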

7.2. Cluster behaviour

The cluster is not defined dynamically, which means that it does not adjust automatically. Starting a new node and configuring only that node is not enough to connect it to an existing cluster: the cluster must be informed of node additions and removals through the cluster administration RPCs.
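
The cluster administration RPCs are exposed through RESTCONF. As a hedged sketch, removing the shard replicas of a member that has left the cluster might look like the following, using the RPC names from the upstream cluster-admin YANG model; the member name, credentials, and IP address are placeholders:

 # curl -u <odl_username>:<odl_password> -H "Content-Type: application/json" \
     -X POST http://<odl_ip>:8081/restconf/operations/cluster-admin:remove-all-shard-replicas \
     -d '{"input": {"member-name": "member-3"}}'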

The cluster uses a leader/followers model: one of the active nodes is elected as the leader, and the remaining active nodes become followers. The cluster handles persistence according to the Raft consensus model, in which a transaction is committed only if the majority of the nodes in the cluster agree. For example, in a three-node cluster, two nodes form a majority, so the cluster can tolerate the loss of one node.

In OpenDaylight, if a node loses its connection to the cluster, its local transactions no longer make forward progress. Eventually they time out (10 minutes by default) and the front-end actor stops. All of this applies per shard, so different shards can have different leaders. The disconnection results in one of the following behaviours:

  • Lack of communication for less than ten minutes results in the minority nodes reconnecting with the majority leader. The minority transactions are rolled back and the majority transactions are replayed.
  • Lack of communication for more than ten minutes results in the minority nodes stopping work and recording the information in log messages. Read-only requests should still complete, but no changes persist, and the nodes cannot re-join the cluster autonomously.

This means that you must monitor the nodes: check for availability and cluster synchronisation, and restart nodes that have been out of synchronisation for too long. To monitor the nodes, use the Jolokia REST service. For more information, see Monitoring with Jolokia.

7.3. Cluster requirements

There are no specific networking requirements to support the cluster, such as bonding or MTU adjustments. Cluster communication does not tolerate high latencies, but latencies typical of a single data centre are acceptable.

7.4. Open vSwitch configuration

Red Hat OpenStack Platform director configures each switch with all of the controllers automatically. The OVSDB supports sharing switches among the cluster nodes to allow some level of load balancing. However, each switch contacts all the nodes in the cluster and, by default, picks the first one that answers as its master. This behaviour leads to a clustering of controller assignments, because the fastest-answering node ends up handling most of the switches.
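
Director performs this configuration for you. For illustration only, the equivalent manual Open vSwitch configuration would resemble the following sketch, which points a switch at all three controllers; the addresses reuse the examples above, and the bridge name br-int is an assumption:

 # ovs-vsctl set-manager tcp:192.0.2.1:6640 tcp:192.0.2.2:6640 tcp:192.0.2.3:6640
 # ovs-vsctl set-controller br-int tcp:192.0.2.1:6653 tcp:192.0.2.2:6653 tcp:192.0.2.3:6653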

7.5. Cluster monitoring

7.5.1. Monitoring with Jolokia

To monitor the status of the cluster, you must enable Jolokia support in OpenDaylight.
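
Red Hat OpenStack Platform director enables Jolokia as part of the deployment. On a manual installation, a sketch of enabling it from the Karaf console, assuming the odl-jolokia feature is available in your distribution:

opendaylight-user@root> feature:install odl-jolokia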

Obtain a configuration datastore clustering report from the Jolokia address:

 # curl -u <odl_username>:<odl_password> http://<odl_ip>:8081/jolokia/read/org.opendaylight.controller:type=DistributedConfigDatastore,Category=ShardManager,name=shard-manager-config

Obtain an operational datastore clustering report from the Jolokia address:

 # curl -u <odl_username>:<odl_password> http://<odl_ip>:8081/jolokia/read/org.opendaylight.controller:type=DistributedOperationalDatastore,Category=ShardManager,name=shard-manager-operational

The reports are JSON documents.
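
For illustration, a configuration datastore report has roughly the following shape; the exact attributes vary by version. SyncStatus and MemberName are the fields of interest: true and the expected member name indicate a healthy, synchronised node.

{
  "request": {
    "mbean": "org.opendaylight.controller:Category=ShardManager,name=shard-manager-config,type=DistributedConfigDatastore",
    "type": "read"
  },
  "value": {
    "LocalShards": [
      "member-1-shard-default-config",
      "member-1-shard-topology-config"
    ],
    "SyncStatus": true,
    "MemberName": "member-1"
  },
  "status": 200
}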

Note

You must change the IP address and the member-1 values to match your environment. The IP address can point to a VIP if you have no preference regarding which node responds. However, addressing specific controllers provides more relevant results.

The reports must indicate the same leader on each node.

Note

You can also monitor the cluster with the Cluster Monitor Tool that the upstream OpenDaylight team is developing. You can find it in the OpenDaylight GitHub repository.

The tool is not part of Red Hat OpenStack Platform 13 and as such is not supported or provided by Red Hat.

7.6. Understanding OpenDaylight ports

The official list of all OpenDaylight ports is available on the OpenDaylight wiki page. The ports relevant to this scenario are:

Port number    Used for

2550           clustering
6653           OpenFlow
6640, 6641     OVSDB
8087           neutron
8081           RESTCONF, Jolokia

Blocking traffic to these ports on the controller has the following effects:

Clustering
The clustered nodes will not be able to communicate. When running in clustered mode, each node must have at least one peer. If all traffic is blocked, the controller stops.
OpenFlow
The switches will not be able to reach the controller.
OVSDB
Open vSwitch will not be able to reach the controller. The controller will still be able to initiate an active OVS connection, but any pings from the switch to the controller will fail, and the switch will eventually fail over to another controller.
neutron
Neutron will not be able to reach the controller.
RESTCONF
External tools that use the REST endpoints will not be able to reach the controller. In this scenario, this should only affect the monitoring tools.

On the OpenDaylight side, the logs only show blocked traffic for the clustering port, because the other ports are used by devices talking to the OpenDaylight controller.
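
If you need to reproduce any of these effects in a test environment, one simple approach is to drop traffic to the relevant port with iptables; a sketch for the clustering port, where the second command removes the rule again:

 # iptables -A INPUT -p tcp --dport 2550 -j DROP
 # iptables -D INPUT -p tcp --dport 2550 -j DROP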

Blocking traffic to these ports on the target devices has the following effects:

Clustering
The clustered nodes will not be able to communicate. When running in clustered mode, each node must have at least one peer. If all traffic is blocked, the controller stops.
OpenFlow
The controller will not be able to push flows.
OVSDB
The controller will not be able to reach the switch (the controller will be able to respond to passive OVS connections).

In all of these cases, because OpenDaylight maintains its configuration and its operational status in distinct trees, the configuration still points to the unreachable devices, and the controller keeps trying to connect to them.

7.7. Understanding OpenDaylight flows

The following flows occur between the components:

Neutron → ODL
Neutron to HAProxy to ODL. Pacemaker manages the VIP (with three backing PIPs). The driver tries to keep TCP sessions open, which may have an impact (https://review.openstack.org/#/c/440866/).

ODL → Neutron
There are no ODL-initiated communications.

ODL → ODL
ODL nodes communicate with each other on port 2550 (configurable).

ODL → OVS
ODL communicates with the switches using OVSDB (ports 6640 and 6641) and OpenFlow (port 6653). There is no VIP involved: ODL knows every switch’s IP address, and each ODL node knows about every switch.

OVS → ODL
The switches communicate with the controllers using OVSDB (ports 6640 and 6641) and OpenFlow (port 6653). There is no VIP involved: ODL configures every switch so that it knows about all the controllers, and the switches send their notifications to all the nodes.

7.8. Neutron DHCP agent HA

The default setup runs the DHCP agent, along with the OVS agent, on all neutron nodes. The roles are composable, so the agents can be separated from the controllers. The DHCP agent is important for HA only during the port bring-up phase and during lease renewal. On port creation, neutron assigns IP and MAC addresses and configures all the DHCP agents appropriately before the port comes up. During this phase, all DHCP agents answer the resulting DHCP requests.
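
To check which DHCP agents serve a given network, you can query neutron; a sketch using the OpenStack client, where the network name is a placeholder:

$ openstack network agent list --network <network_name>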

To maximise dataplane availability in the case of a DHCP agent failure, the leases are configured with long lease times, and the nodes are configured with short renewal delays. Thus the DHCP agents are seldom needed, but when they are, the requesting nodes quickly give up on an unavailable DHCP agent and issue a broadcast request, automatically picking up any remaining DHCP agent.

The agents have their own process monitors: systemd starts the agents, and they create their namespaces and start the processes inside them. If an agent dies, the namespace stays up and systemd restarts the agent without terminating or restarting any other processes, because it does not own them. The restarted agent then re-attaches to the namespace and re-uses it together with all the running processes.
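
You can observe this behaviour directly on a node that runs the agent: the DHCP namespaces survive an agent restart. A sketch, assuming the agent runs as a systemd service named neutron-dhcp-agent (in containerised deployments, restart the corresponding container instead):

 # ip netns | grep qdhcp
 # systemctl restart neutron-dhcp-agent
 # ip netns | grep qdhcp

The same qdhcp- namespaces, and the dnsmasq processes inside them, are listed before and after the restart.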

7.9. Neutron Metadata agent HA

In the reference implementation, the metadata services run on the controllers, which are combined with the network nodes, in the same namespace as the corresponding DHCP agent. A metadata proxy listens on port 80, and a static route redirects the traffic from the virtual machines to the proxy using the well-known metadata address. The proxy uses a Unix socket to talk to the metadata service, which runs on the same node, and the metadata service talks to nova. Because Unix sockets do not require IP routing between the proxy and the service, the metadata service remains available even if the node is not routable. HA is handled using keepalived and VRRP elections. Failover time is 2 to 5 seconds. The agents are handled in the same way as the DHCP agents, with systemd and namespaces.
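
From inside a virtual machine, you can verify the whole path by querying the well-known metadata address; for example:

$ curl http://169.254.169.254/latest/meta-data/

The request follows the static route to the proxy in the DHCP namespace, crosses the Unix socket to the metadata service, and is answered with the data that the service retrieves from nova.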

The metadata service in Red Hat OpenStack Platform 11 is a custom Python script, while in Red Hat OpenStack Platform 13 it is HAProxy, which lowers the memory usage by a factor of about 30. This is particularly significant because many deployments run one proxy per router, with hundreds if not thousands of routers per controller.