Chapter 5. High Availability

Two key requirements Telcos place on any application are performance and high availability (HA). In the multi-layered world of NFV, HA cannot be achieved at the infrastructure layer alone. It must span every aspect of the design: hardware support, layer 2 and layer 3 best practices in the underlay network, the OpenStack layer and, last but not least, the application layer. ETSI NFV (ETSI GS NFV-REL 001 V1.1.1 (2015-01)) defines “Service Availability” rather than speaking in terms of five 9s: “At a minimum, the Service Availability requirements for NFV should be the same as those for legacy systems (for the same service)”. This refers to the end-to-end service (VNFs and infrastructure components).

5.1. Geo-redundancy

vEPC components such as the SGW and PGW may be placed either centrally in the core datacenter (national datacenter) or regionally to serve local cell sites or exit points. The eNodeBs are collocated with the towers (cell sites). In a 4G/LTE network, the choice of SGW (the ingress into the network) is typically based on subscriber location (Tracking Area Code). The subscriber connection terminates on the PGW. Which PGW is assigned to the subscriber depends on the context (APN) and other factors, and the actual assignment may be done through a DNS lookup. This is shown in Figure 7.

For this reason, the SGW and PGWs can be placed in regional datacenters closer to the service they provide, and this approach is taken by many operators. In other deployments, however, the gateways, MME and other functions may be collocated in a centralized national datacenter. GiLAN and IMS services are typically deployed only in the core of the mobile network.
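
As a rough illustration of the DNS-based gateway selection described above (the PLMN and APN values here are purely illustrative; the FQDN format follows 3GPP TS 23.003 and the selection procedures 3GPP TS 29.303), an MME resolving candidate PGWs for an APN would issue a query similar to:

    # Hypothetical APN "internet" in PLMN MCC=001/MNC=01; names are examples only.
    # The NAPTR answer set enumerates candidate gateway services (e.g. x-3gpp-pgw),
    # which the MME filters and orders, typically preferring topologically closer nodes.
    dig -t NAPTR internet.apn.epc.mnc001.mcc001.3gppnetwork.org

Operators influence geo-placement simply by what these DNS records return for a given region.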

Regardless of how it is deployed, it is important to build in redundancy at the datacenter level. Geo-redundancy should also be factored in so that equivalent functions in other datacenters can be leveraged in case of a catastrophic failure at one site. Note that apart from using traditional routing to reroute traffic to the gateways/elements that become active when the primary fails, SDN and Software Defined Wide Area Network (SD-WAN) technology can be employed to reprogram the underlying layer 2 and layer 3 networks to provide the appropriate level of bandwidth and functionality.

vEPC Georedundancy

Figure 7: Datacenter redundancy and deployment of vEPC

5.2. Chassis/Blade Redundancy

For lightweight All In One (AIO) deployments (shown in Figure 8), common datacenter design practice should be followed: never place the active and standby instances of a VNF on the same chassis or host. Specifically (an example of enforcing this with OpenStack anti-affinity follows the list below):

  • Blade Redundancy - DO NOT deploy active and standby instances of the same/related VNF on the same host/blade
  • Chassis Redundancy - DO NOT deploy active and standby instances of the same/related VNF on the same chassis
  • Rack Redundancy - DO NOT deploy active and standby instances of the same/related VNF on the same rack/row
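
One way to enforce the host-level rule in OpenStack is a server group with an anti-affinity policy; rack- and chassis-level separation additionally requires host aggregates/availability zones (see Section 5.5.1). The sketch below uses the standard OpenStack CLI, but the image, flavor, network and group names are placeholders, not part of any specific VNF package.

    # Create a server group whose members the scheduler must place on different hosts.
    openstack server group create --policy anti-affinity vepc-cm-group

    # Boot the active and standby control-module VMs as members of that group
    # (substitute the real server group UUID returned by the previous command).
    openstack server create --image vepc-cm --flavor vepc.cm --network internal \
        --hint group=<server-group-uuid> vepc-cm-active
    openstack server create --image vepc-cm --flavor vepc.cm --network internal \
        --hint group=<server-group-uuid> vepc-cm-standby
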
VoLTE and IMS

Figure 8: Lightweight, AIO VPC deployment

Multi-server deployments, regardless of whether they run on top of KVM or a full-fledged OpenStack deployed in HA mode, should follow the chassis/blade redundancy best practices listed above. In addition, HA best practices should be followed at the application layer. VPC VNFs typically have dedicated VMs acting as control modules, which handle control traffic, while data traffic is handled by switching or service modules. Whether a single chassis or multiple chassis are deployed, it is important to place the control and service modules in a way that provides high availability and redundancy.

Single chassis VNF

Figure 9: Single Chassis deployment of control and switching modules.

Figure 9 shows deployment of a VPC VNF on a single blade-server chassis. A multi-chassis deployment distributes the control modules (CMs) across chassis so that the failure of any one chassis is mitigated. The number of control and switching modules, and the best practices for placing them, come from the VNF vendor. This can be seen in Figure 10; an availability-zone placement sketch follows the figure.

Multi chassis VNF

Figure 10: Multi-chassis VNF HA Deployment
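
One hedged way to express “at most one control module per chassis” in OpenStack is to map each chassis to its own availability zone through host aggregates and pin each CM to a different zone; the aggregate, host, image and flavor names below are illustrative only, and the authoritative placement rules remain those published by the VNF vendor.

    # Map each chassis to an availability zone (host names are placeholders).
    openstack aggregate create --zone chassis-1 agg-chassis-1
    openstack aggregate add host agg-chassis-1 overcloud-compute-0
    openstack aggregate create --zone chassis-2 agg-chassis-2
    openstack aggregate add host agg-chassis-2 overcloud-compute-4

    # Place one control module in each zone so a chassis failure affects at most one CM.
    openstack server create --image vepc-cm --flavor vepc.cm --network internal \
        --availability-zone chassis-1 cm-0
    openstack server create --image vepc-cm --flavor vepc.cm --network internal \
        --availability-zone chassis-2 cm-1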

GiLAN elements that form logical service function chains (SFC) can be deployed in one of the following ways:

  • All VNFs required to form a logical service function chain are deployed on each server. In this model, the server and its resources become one logical deployment unit per service chain. The advantage is operational simplicity, because deployment is uniform; to scale, you simply add more servers. The disadvantage is that some elements of the SFC are utilized more heavily than others and become bottlenecks. For example, a parental-control VM may sit at 30% utilization (because it is only applied to child subscribers) while the DPI VM is at 90% utilization, because DPI is very CPU intensive.
  • Alternatively, VNFs are deployed based on their individual resource requirements and managed at a higher level by the VNF Manager (VNFM) or NFV Orchestrator (NFVO). In this scenario, if a VNF is deemed non-responsive or an additional unit of the VNF needs to be instantiated, a server that matches the resource requirements of that VNF is selected. If no such server is available, the entire service chain cannot be deployed.

IMS elements are similar to GiLAN in the sense that they are made up of discrete elements that perform the various functions. Care has to be taken to ensure that a hardware or network failure does not isolate functions. The basic principles of HA and the best practices above apply to the deployment of all virtual mobile network VNFs.

5.3. Datacenter Topology (Underlay)

Datacenter topologies have changed and evolved over time. The prior approach was three-tiered, consisting of access, aggregation (distribution) and core layers (Figure 11). The access layer typically consisted of layer 2 switches to which the servers connected. These switches connected into layer 2/layer 3 aggregation switches, which in turn connected up to pure layer 3 routers that formed the core.

Three tier datacenter topology

Figure 11: Traditional three-tier datacenter topology

This approach served as the classic datacenter topology for many years. However, it had limitations and complexity, typically from running Spanning Tree Protocol (STP), which protects against layer 2 loops but can leave the network with half its capacity due to blocked ports, not to mention the complexity of troubleshooting STP. In today’s datacenters this design has been replaced by what is referred to as the leaf-spine architecture. In a leaf-spine architecture, servers are dual-homed to two leaf switches. Leaf switches are not connected to one another, nor are spine switches; instead, every leaf connects to every spine, forming a full mesh between the two tiers, as shown in Figure 12.

Leaf spine datacenter topology

Figure 12: Leaf and spine datacenter topology

Leaf-spine can be either layer 2 or layer 3. In the layer 2 approach, all links are active/active: Transparent Interconnection of Lots of Links (TRILL) or Shortest Path Bridging (SPB) is used instead of STP. A layer 3 leaf-spine design, however, has the advantage of simplicity. In such a layer-3-only network, hosts may run a routing agent that announces /32 host routes to the leaf switches. Equal Cost MultiPath (ECMP) is used to evenly distribute traffic across links. Either Open Shortest Path First (OSPF) or Border Gateway Protocol (BGP) can be used to distribute routes across the datacenter fabric. BGP is becoming a favorite thanks to enhancements such as BGP unnumbered, which greatly simplifies configuration, and it has proven highly scalable and reliable, having carried Internet routes for decades.

It should be noted, however, that OpenStack networking is typically layer 2 based (VLAN, VXLAN), and most OpenStack deployments, including those for NFV/virtual mobile networks, are layer 2 based. Red Hat OpenStack Platform uses the Neutron networking service to create and deploy tenant networks, and routed (layer 3) leaf-spine deployments are not currently supported by Red Hat OpenStack Platform director. Director installations have the following pre-defined networks (https://access.redhat.com/documentation/en/red-hat-openstack-platform/10/paged/director-installation-and-usage/chapter-3-planning-your-overcloud):

  • Intelligent Platform Management Interface (IPMI)
  • Provisioning
  • Internal API
  • Tenant
  • Storage
  • Storage Management
  • External
  • Floating IP
  • Management

Each of these networks is mapped to an IP subnet and typically a VLAN. Details of networking for vEPC are discussed in the networking section.
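
As a minimal illustration of the routed leaf-spine option discussed above (not a director-supported deployment; interface names and next-hop addresses are assumptions), ECMP on a Linux host or leaf simply means installing one route with multiple next hops, so traffic is hashed across both uplinks:

    # Default route load-balanced across two uplinks toward different leaf/spine switches.
    ip route add default \
        nexthop via 10.1.1.1 dev eno1 weight 1 \
        nexthop via 10.1.2.1 dev eno2 weight 1

In a BGP unnumbered fabric, the routing daemon learns and installs equivalent multipath routes automatically instead of them being configured statically.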

For the layer 2 based design, it is recommended, where possible, to use link bundling with the Link Aggregation Control Protocol (LACP) on the servers and Multi-chassis Link Aggregation (MLAG) on the switches. In this scheme, two (or more) NICs are bonded (using Linux LACP bonding). One NIC is connected to one leaf switch and the other NIC to a different leaf switch, and the two switches run MLAG between each other. This ensures that the failure of a single switch does not isolate the server. Figure 13 shows how this is connected.

High availability server connection

Figure 13: High availability server connection

NIC 0 on all servers is for the Intelligent Platform Management Interface (IPMI), which is used to configure the servers and to access their consoles. NIC 1 is for out-of-band (OOB) management, which is used to connect to the servers directly over IP.

NICs 3 and 4 of the server carry internal traffic, including storage, internal API, etc. They are bonded: NIC 3 connects to the top switch, whereas NIC 4 connects to the bottom switch, and the two switches run MLAG. The bonded interface trunks all the VLANs for internal traffic. Similarly, NICs 5 and 6 may be bonded to carry external traffic (accessible to the outside world). It should be noted, however, that current deployments of vEPC use Single Root I/O Virtualization (SR-IOV) for the datapath (data plane), and in order to use SR-IOV, bonding must not be done at the host level. NIC bonding at the guest VM/VNF level is supported with SR-IOV and can be used by VNFs to mitigate the risk of an uplink switch failure.
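
For illustration, the internal bond described above could be created by hand roughly as follows; the interface names (eno3/eno4) and VLAN IDs are placeholders, and in a director-based deployment this configuration is normally expressed in the os-net-config NIC templates rather than typed manually.

    # LACP (802.3ad) bond across NICs 3 and 4.
    ip link add bond1 type bond mode 802.3ad miimon 100
    ip link set eno3 down && ip link set eno3 master bond1
    ip link set eno4 down && ip link set eno4 master bond1
    ip link set bond1 up

    # Trunk internal VLANs over the bond, e.g. Internal API (201) and Storage (202).
    ip link add link bond1 name bond1.201 type vlan id 201
    ip link add link bond1 name bond1.202 type vlan id 202

The corresponding switch ports must be configured as an MLAG/LACP port channel trunking the same VLANs.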

Network Equipment Providers (NEPs) are looking to move away from SR-IOV based vEPC solutions in the future and to use Open vSwitch (OVS) with the Data Plane Development Kit (DPDK) as their fast datapath. NIC bonding at the host level will be supported for OVS+DPDK, thus allowing NICs 5 and 6 to be bonded as well.

It should be noted that if another pair of NICs is available, it is a good idea to split the storage network off onto its own bond when a Ceph cluster is used as the storage backend. If external/cloud storage is used, this is not required.

5.4. NFVi (OpenStack) Availability

Red Hat OpenStack Platform offers several features to support HA. These features are supported and can be deployed using Red Hat OpenStack Platform director. Since the OpenStack control plane and services mostly reside on the controller nodes, the HA features are targeted at the controller nodes. The key components employed for HA are listed below (an inspection example follows the list):

  • Pacemaker - responsible for lifecycle management of services within the HA cluster, ensuring the services are available. If it deems a service “not available”, either because the service itself went down or the node went down, Pacemaker can restart the service, reboot the node and, if all else fails, remove the node from the cluster and eventually try to restore the service.
  • HAProxy - runs on all controller nodes and is responsible for load balancing traffic to the services running on those nodes.
  • Galera - Red Hat OpenStack Platform uses the MariaDB Galera cluster to manage database replication.
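
On a deployed controller, the state of these components can be inspected with the pcs tool; the commands below are a minimal sketch (resource names and output vary by release).

    # Overall cluster health: nodes, quorum, and where each resource is running.
    pcs status

    # List the Pacemaker-managed resources (for example the galera and haproxy clones
    # and the virtual IPs fronted by HAProxy).
    pcs resource show

    # Confirm HAProxy is listening on the controller for the service virtual IPs.
    ss -tlnp | grep haproxy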

OpenStack HA may be deployed in one of two modes:

  • Active/Active: In this mode, services run on multiple controller nodes at the same time, and HAProxy is used to load balance and distribute traffic. These services are managed by systemd.
  • Active/Passive: In this mode, the service is active on only one controller node, and HAProxy directs traffic to that node. These services are managed by Pacemaker. (The example below shows how to check which manager owns a given service.)
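
Which manager owns a given service can be verified directly on a controller; the service and resource names below are examples for a Newton-era deployment and may differ in other releases.

    # A systemd-managed (typically active/active) service:
    systemctl status openstack-nova-api

    # A Pacemaker-managed (typically active/passive) resource:
    pcs resource show openstack-cinder-volume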

More details on Red Hat OpenStack Platform HA can be found in the document “UNDERSTANDING RED HAT OPENSTACK PLATFORM HIGH AVAILABILITY”, available on the Red Hat customer portal at http://access.redhat.com. Additionally, it is important to note that starting with the Red Hat OpenStack Platform 10 (Newton) release, systemd takes on a larger role in HA: some services are managed by Pacemaker whereas others are managed by systemd. There will also be common controllers vs. dedicated controllers. These topics will be discussed in release-specific addendums to this architecture document, where release-specific features and their configurations will be covered.

vEPC and virtual mobile NEPs in general expect the VIM (OpenStack) to run in full HA mode, as the vEPC may carry mission-critical services such as emergency (911) calls and typically hosts millions of subscribers per instance.

5.5. Application Layer HA

NEPs have designed and deployed virtualized mobile networks (vEPC, GiLAN and IMS) with resilience and high availability in mind. Their purpose-built hardware had HA features built in from the get-go, from the backplane design and the deployment of control modules and service modules (data plane processors) to port redundancy and in-service software upgrades. With the advent of virtualized solutions (NFV), the control and service modules became VMs that use the “internal networks” created in OpenStack to communicate with each other. However, the application logic that managed availability and detected failure of service modules continues to exist in the virtualized solution as well.

Control modules typically keep track of the deployed and available service modules, use load-balancing criteria to distribute traffic, and assign service modules to subscriber flows. Depending on the design, N+1 or N+N redundancy keeps certain service modules active while one or more service modules serve as standby, allowing the standby modules to take over upon failure of an active service module. Because these applications typically use a heartbeat mechanism to track service-module liveness, whether they run in a virtual environment or not is immaterial. Additionally, session recovery between physical chassis continues to be supported in the virtual environment. For this, a control plane protocol is typically used to determine the active and standby nodes, detect failures and trigger the state change from standby to active.

5.5.1. Host Aggregates and Availability Zones

NEPs who provide mobility VNFs require the creation of availability zones in OpenStack. Availability zones are used to ensure that certain VMs, or groups of VMs, land on specific compute nodes. This could be based on the availability of local disk within a zone, or on hardware profiles (nodes that support NUMA and CPU pinning, or that have certain types of NICs supporting SR-IOV). The application may also choose to place VMs in different zones to achieve load sharing and mitigate risk during failures.
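
A hedged example of steering data-plane VMs onto SR-IOV capable hosts is shown below; it assumes the AggregateInstanceExtraSpecsFilter is enabled in the Nova scheduler, and the aggregate, zone, host, flavor and image names are placeholders.

    # Group SR-IOV capable compute nodes into an aggregate that also defines a zone.
    openstack aggregate create --zone az-sriov agg-sriov
    openstack aggregate add host agg-sriov overcloud-compute-2
    openstack aggregate set --property sriov=true agg-sriov

    # Tie a flavor to that aggregate so matching VMs land only on those hosts.
    openstack flavor set --property aggregate_instance_extra_specs:sriov=true vepc.sm
    openstack server create --image vepc-sm --flavor vepc.sm \
        --availability-zone az-sriov sm-0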

5.5.2. Shared File System

A shared file system can be used to store VNF configurations so that they can be mirrored to the equivalent functions located in a geo-redundant datacenter.

5.5.3. Instance HA

Previous sections discussed how OpenStack services can be protected using Pacemaker and other tools. However, it is also important to keep track of the VNFs that run as VMs on top of OpenStack and to recover them in case of failure. This is referred to as “Instance HA”. Instance HA is available in Red Hat OpenStack Platform 7 and later. It can be used for VNFs that are not managed by an SVNFM (a specialized VNFM that is vEPC/GiLAN/IMS application aware).
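
When Instance HA is enabled, one commonly documented pattern (to be verified against the Instance HA guide for the specific release) is to mark which workloads the cluster should resurrect by tagging their image or flavor as evacuable; the names below are hypothetical, and VNFs that are managed by an SVNFM should be left untagged so the two recovery mechanisms do not conflict.

    # Tag an image so instances booted from it are recovered on compute-node failure.
    openstack image set --tag evacuable vepc-sm-image

    # Alternatively, mark a flavor.
    openstack flavor set --property evacuable=true vepc.sm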