Chapter 7. High availability and clustering with OpenDaylight

Red Hat OpenStack Platform 12 supports High Availability clustering for both neutron and the OpenDaylight controller. The following table shows the recommended architecture for running the high availability cluster:

Node type                       Number of nodes   Node mode

Neutron                         3                 active/active/active

OpenDaylight                    3                 active/active/active

Compute nodes (nova or OVS)     any

The OpenDaylight role is composable, so it can be deployed on the same nodes as the neutron nodes, or on separate nodes. The setup is all-active: every node can handle requests, and if the receiving node cannot handle a request, it forwards it to another appropriate node. All nodes keep in synchronisation with each other. In the OVSDB (Open vSwitch Database) Southbound plugin, the available controller nodes share the Open vSwitch instances, so that each switch is handled by a specific node in the cluster.

7.1. Configuring OpenDaylight for high availability and clustering

Since the Red Hat OpenStack Platform director deploys the OpenDaylight controller nodes, it has all the information required to configure clustering for OpenDaylight. Each OpenDaylight node needs an akka.conf configuration file that identifies the node’s role (its name in the cluster) and lists at least some of the other nodes in the cluster, the seed nodes. The nodes also need a module-shards.conf file that describes how data is replicated in the cluster. The Red Hat OpenStack Platform director makes the correct settings based on the selected deployment configuration. The akka.conf file depends on the nodes, while the module-shards.conf file depends on the nodes and the installed datastores (and therefore on the installed features, which are largely determined by the deployment).

An example akka.conf file looks like this:

odl-cluster-data {
  akka {
    remote {
      netty.tcp {
        hostname = "192.0.2.1"
      }
    },
    cluster {
      seed-nodes = [
        "akka.tcp://opendaylight-cluster-data@192.0.2.1:2550",
        "akka.tcp://opendaylight-cluster-data@192.0.2.2:2550",
        "akka.tcp://opendaylight-cluster-data@192.0.2.3:2550"],
      roles = [ "member-1" ]
    }
  }
}

The nodes listed above are seed nodes. They do not have to reflect the current cluster setup as a whole: as long as one of the real nodes in the current cluster is reachable through the list of seed nodes, a starting node can join the cluster. In the configuration file, you can use names instead of IP addresses.

An example module-shards.conf file:

module-shards = [
{
  name = "default"
  shards = [{
    name="default"
    replicas = [
      "member-1",
      "member-2",
      "member-3"
    ]
  }]
},
{
  name = "topology"
  shards = [{
    name="topology"
    replicas = [
      "member-1",
      "member-2",
      "member-3"
    ]
  }]
}]

Additional sections can be added if other datastores need to be configured, typically “inventory”. This is not strictly required: shards with no explicit configuration use the default configuration, which works fine in most cases. In the following scenario, only the default shard is configured:

module-shards = [
{
  name = "default"
  shards = [{
    name="default"
    replicas = [
      "member-1",
      "member-2",
      "member-3"
    ]
  }]
}]

7.2. Cluster behaviour

The cluster is not defined dynamically, which means that it does not adjust automatically. It is not possible to start a new node and connect it to an existing cluster by configuring the new node only. The cluster needs to be informed about node additions and removals through the cluster administration RPCs.
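
For example, a replacement node can be registered as a shard replica through the cluster administration RPCs exposed over RESTCONF. The following is only a sketch assuming the upstream defaults (port 8181, admin/admin credentials, and the cluster-admin RPC names shipped with this OpenDaylight version); adjust the values to your environment:

# Ask the node that receives this request to become a replica of the
# "default" shard of the configuration datastore.
curl -s -u admin:admin -X POST \
  -H "Content-Type: application/json" \
  -d '{"input": {"shard-name": "default", "data-store-type": "config"}}' \
  http://192.0.2.1:8181/restconf/operations/cluster-admin:add-shard-replica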

The cluster is based on a leader/followers model. One of the active nodes is elected as the leader, and the remaining active nodes become followers. Reasoning about the cluster mostly involves reasoning about persistence, which is handled according to the Raft consensus-based model. Following this principle, a transaction is only committed if the majority of the nodes in the cluster agree. For example, in a three-node cluster a write must be acknowledged by at least two nodes before it is committed, so the cluster tolerates the loss of a single node.

In OpenDaylight, if a node loses its connection with the cluster, its local transactions will no longer make forward progress. Eventually they will time out (10 minutes by default) and the front-end actor will stop. All of this applies per shard, so different shards can have different leaders. The behaviour results in one of the following:

  • Lack of communication for less than ten minutes results in the minority nodes reconnecting with the majority leader. The minority nodes' transactions are rolled back and the majority's transactions are replayed.
  • Lack of communication for more than ten minutes results in the minority nodes stopping work and recording the information in log messages. Read-only requests should still complete, but no changes persist and the nodes are not able to re-join the cluster on their own.

This means that users have to monitor the nodes themselves. They have to check for availability and cluster synchronisation, and restart nodes that have been desynchronised for too long. For monitoring the nodes, users can use the Jolokia REST service (see Monitoring with Jolokia for more information).

7.3. Cluster requirements

There are no specific networking requirements to support the cluster, such as bonding or MTU tuning. Cluster communication does not tolerate high latencies, but latencies at the level of a single data centre are acceptable.

7.4. Open vSwitch configuration

Each switch is configured with all the controllers by the Red Hat OpenStack Platform director, which has all the information needed to do this automatically. OVSDB supports sharing switches among the cluster nodes, to allow some level of load-balancing. However, each switch contacts all the nodes in the cluster and, by default, picks the first one that answers as its master. This behaviour leads to a bunching of the controller assignments, because the fastest-answering node ends up handling most of the switches.
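
The director performs this configuration automatically; the following commands are only a sketch of what the resulting Open vSwitch configuration can look like (the IP addresses and the bridge name br-int are examples, and the port numbers depend on the deployment):

# Point the OVSDB manager of the switch at all three controller nodes.
ovs-vsctl set-manager tcp:192.0.2.1:6640 tcp:192.0.2.2:6640 tcp:192.0.2.3:6640

# Register all three nodes as OpenFlow controllers for the integration bridge.
ovs-vsctl set-controller br-int tcp:192.0.2.1:6653 tcp:192.0.2.2:6653 tcp:192.0.2.3:6653

# Display the configured managers and controllers.
ovs-vsctl show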

7.5. Cluster monitoring

7.5.1. Monitoring with Jolokia

To monitor the status of the cluster, you must enable Jolokia support in OpenDaylight. You can then obtain a datastore clustering report from the Jolokia address (http://192.0.2.1:8181/jolokia/read/org.opendaylight.controller:Category=Shards,name=member-1-shard-inventory-config,type=DistributedConfigDatastore). The report is in the form of a JSON document.

Note

You have to change the IP address and the member-1 values to match your environment. The IP address can point to a VIP if you have no preference as to which node responds. However, addressing specific controllers provides more relevant results.

The report must indicate the same leader on each node.
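
For example, you can query each node’s Jolokia endpoint and compare the Leader and RaftState attributes. This is only a sketch assuming the upstream defaults (admin/admin credentials and the member-1 shard name from the example above); exact attribute names can vary between OpenDaylight versions:

# Query the shard MBean on one controller; repeat for each node,
# adjusting the IP address and member name.
curl -s -u admin:admin \
  "http://192.0.2.1:8181/jolokia/read/org.opendaylight.controller:Category=Shards,name=member-1-shard-inventory-config,type=DistributedConfigDatastore" \
  | python -m json.tool

# In the "value" section of the output, every node should report the same
# Leader, and a RaftState of Leader or Follower.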

Note

The cluster can also be monitored by the Cluster Monitor Tool that is being developed by the upstream OpenDaylight team. You can find it in the project’s GitHub repository.

The tool is not part of Red Hat OpenStack Platform 12 and, as such, is not supported or provided by Red Hat.

7.6. Understanding OpenDaylight ports

The official list of all OpenDaylight ports is available on the OpenDaylight wiki page. The ports relevant for this scenario are:

Port number   Used for

2550          clustering

6653          OpenFlow

6640, 6641    OVSDB

8087          neutron

8181          RESTCONF, Jolokia
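
To verify that OpenDaylight is listening on these ports on a controller node, you can, for example, list the listening sockets (a quick sanity check, not an exhaustive test):

# List listening TCP sockets for the OpenDaylight-related ports.
ss -tln | grep -E ':(2550|6640|6641|6653|8087|8181)\b'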

Blocking traffic to these ports on the controller has the following effects:

Clustering
The clustered nodes will not be able to communicate. When running in clustered mode, each node needs to have at least one peer. If all traffic is blocked, the controller will stop (see cluster behaviour description above).
OpenFlow
The switches will not be able to reach the controller.
OVSDB
Open vSwitch will not be able to reach the controller. The controller will be able to initiate an active OVS connection, but any pings from the switch to the controller will fail and the switch will eventually fail over to another controller.
neutron
Neutron will not be able to reach the controller.
RESTCONF
External tools using the REST endpoints will not be able to reach the controller. In this scenario, this should only affect the monitoring tools.

On the OpenDaylight side, the logs would only show blocked traffic for clustering, nothing else (the other ports carry connections initiated towards the ODL controller, so blocked traffic on them does not appear in its logs). The ports that are currently opened by the Red Hat OpenStack Platform director are listed in https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/opendaylight-api.yaml#L72.

Blocking traffic to these ports on the target devices has the following effects:

Clustering
Same as above.
OpenFlow
The controller will not be able to push flows.
OVSDB
The controller will not be able to reach the switch (the controller will be able to respond to passive OVS connections).

In all of these cases, because OpenDaylight maintains its configuration and its operational status in distinct trees, the configuration will still point to the unreachable devices, and the controller will keep trying to connect to them.

7.7. Understanding OpenDaylight flows

Neutron → ODL
Neutron connects to ODL through HAProxy (see Apex). Pacemaker manages the VIP (with three backing PIPs). The driver tries to keep TCP sessions open, which may have an impact (https://review.openstack.org/#/c/440866/).

ODL → Neutron
There are no ODL-initiated communications.

ODL → ODL
ODL nodes communicate with each other on port 2550 (configurable).

ODL → OVS
ODL communicates with the switches using OVSDB (ports 6640 and 6641) and OpenFlow (port 6633). There is no VIP involved; ODL knows every switch’s IP address, and each ODL node knows about every switch.

OVS → ODL
The switches communicate with ODL using OVSDB (ports 6640 and 6641) and OpenFlow (port 6633). There is no VIP involved; ODL configures every switch so that it knows about all the controllers, and notifications from the switches are sent to all nodes.

7.8. Neutron DHCP agent HA

The default setup runs the DHCP agent on all neutron nodes, along with the OVS agent. The roles are composable, though, so the agents can be separated from the controllers. The DHCP agent is only important for HA during the port bring-up phase and during lease renewal. On port creation, neutron assigns IP and MAC addresses and configures all the DHCP agents appropriately before the port comes up. During this phase, all DHCP agents answer the resulting DHCP requests.

To maximise dataplane availability in the case of a DHCP agent failure, the leases are configured with long lease times, and the nodes are configured with short renewal delays. Thus, the DHCP agents are seldom needed, but when they are, the requesting instances quickly give up on an unavailable DHCP agent, issue a broadcast request, and automatically pick up any remaining DHCP agent.
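
As an illustration only (the director sets the actual values in a deployment), the lease duration that the DHCP agents hand out is controlled by the dhcp_lease_duration option in neutron.conf; 86400 seconds is the upstream default:

# /etc/neutron/neutron.conf (excerpt)
[DEFAULT]
# Lease duration, in seconds, handed out by the DHCP agents.
dhcp_lease_duration = 86400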

The agents have their own process monitors. systemd starts the agents, and they create their namespaces and start the processes inside them. If an agent dies, the namespace stays up and systemd restarts the agent without terminating or restarting any other processes (it does not own them). The agent then re-attaches to the namespace and re-uses it, together with all running processes.
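
For example, on a controller node you can check the agent and its namespaces as follows (service and namespace names may vary slightly between releases):

# Check the DHCP agent service managed by systemd.
systemctl status neutron-dhcp-agent

# List the DHCP namespaces; they stay in place even while the agent
# process is being restarted.
ip netns | grep qdhcp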

7.9. Neutron Metadata agent HA

In the reference implementation, the metadata services run on the controllers, which are combined with the network nodes, in the same namespace as the corresponding DHCP agent. A metadata proxy listens on port 80, and a static route redirects the traffic from the virtual machines to the proxy using the well-known metadata address. The proxy uses a Unix socket to talk to the metadata service, which runs on the same node, and the metadata service in turn talks to nova. Because the Unix socket does not require IP routing between the proxy and the service, the metadata service remains available even if the node is not routable. HA is handled using keepalived and VRRP elections. Failover time is 2 to 5 seconds. The agents are handled in the same way as the DHCP agents (with systemd and namespaces).

The metadata service in Red Hat OpenStack Platform 11 is a custom Python script, while in Red Hat OpenStack Platform 12 it is HAProxy, which lowers the memory usage by a factor of about 30. This is particularly significant because many users run one proxy per router, and hundreds if not thousands of routers per controller.