Chapter 1. Components and Considerations

1.1. VMware SDDC Environment Considerations

This chapter provides an overview and description of the reference architecture for a highly available Red Hat OpenShift Container Platform 3 environment deployed on a VMware private cloud.

The image shown above provides a high-level representation of the components within this reference architecture. Virtual machine (VM) resources are made highly available using VMware technologies: VMware HA (high availability), storage IO (input/output) control, and resource allocation via hypervisor affinity and anti-affinity rules. The Ansible deployment host is a virtual machine that acts as the entry point for access to the hosts and performs the configuration of the internal servers. All Secure Shell (SSH) traffic passes through it.

Authentication is managed by Microsoft Active Directory via Lightweight Directory Access Protocol (LDAP) authentication. OpenShift on VMware has three cloud-native storage options: virtual machine persistent storage, network file system (NFS), and GlusterFS (OpenShift Container Storage, OCS).

Virtual machine persistent storage is housed on virtual machine disks (VMDKs) on datastores located on external logical unit numbers (LUNs) or NFS shares.

The other storage options are used for container persistent storage, including the OpenShift Container Platform (OCP) registry. The network is configured to use a single load balancer for access to the OpenShift API and console (8443/tcp) and the OpenShift routers (80/tcp, 443/tcp).

Finally, the image shows that domain name system (DNS) is handled by an external DNS source, which should be pre-configured with the proper entries before deployment. In this case, the solutions engineering team manages all DNS entries through a BIND server and a conditional lookup zone in Microsoft DNS.

1.2. Installation Steps

This reference architecture breaks down the deployment into three separate phases.

  • Phase 1: Provision the VM infrastructure on VMware (See Note)
  • Phase 2: Install Red Hat OpenShift Container Platform on VMware
  • Phase 3: Post deployment activities

Provisioning of the VMware environment is a prerequisite and is outside the scope of this document. Phase 1 proceeds with the deployment of virtual machines, following the requirements listed in Section 2.10.1, “Virtual Machine Hardware Requirements”.

Phase 2 is the installation of OpenShift Container Platform, which is done via the Ansible playbooks installed by the openshift-ansible-playbooks RPM package. The router and registry are also deployed during Phase 2.

The last phase, Phase 3, concludes the deployment by confirming that the environment was deployed properly. This is done by running command-line tools against the cluster.
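One such check can be sketched in Python, assuming the `oc` client is installed and already logged in to the cluster: it runs `oc get nodes` and reports every node whose STATUS column is not Ready. The function names are hypothetical, and the parser assumes the default tabular output (NAME, STATUS, ROLES, AGE, VERSION).

```python
import subprocess


def not_ready_nodes(oc_output: str) -> list[str]:
    """Return the names of nodes whose STATUS is not Ready.

    Assumes the default table printed by `oc get nodes`, whose first row is a header.
    """
    rows = [line.split() for line in oc_output.splitlines() if line.strip()]
    problems = []
    for name, status, *_ in rows[1:]:  # skip the header row
        # A cordoned node shows "Ready,SchedulingDisabled"; it is still Ready.
        if status.split(",")[0] != "Ready":
            problems.append(name)
    return problems


def check_nodes() -> list[str]:
    """Run `oc get nodes` and return the nodes that are not Ready."""
    result = subprocess.run(
        ["oc", "get", "nodes"], capture_output=True, text=True, check=True
    )
    return not_ready_nodes(result.stdout)


if __name__ == "__main__":
    bad = check_nodes()
    print("all nodes Ready" if not bad else f"not Ready: {', '.join(bad)}")
```

Keeping the parser separate from the `subprocess` call makes it possible to test the logic against saved command output.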

1.3. VMware Software Details

This reference architecture utilizes the following versions of VMware software:

Table 1.1. Software versions

Software                   Version
vCenter Server via VCSA    6.7 U2 Build 13010631
vSphere Server             6.7.0 Build 13006603

1.4. Load Balancers

This guide uses an external load balancer running HAProxy to offer a single entry point for the many Red Hat OpenShift Container Platform components. Organizations that already have load balancers deployed can use them instead.

The Red Hat OpenShift Container Platform console, provided by the Red Hat OpenShift Container Platform master nodes, can be spread across multiple instances to provide both load balancing and high availability properties.

Application traffic passes through the Red Hat OpenShift Container Platform Router on its way to the container processes. The Red Hat OpenShift Container Platform Router is a reverse proxy service container that multiplexes the traffic to the multiple containers making up a scaled application running inside Red Hat OpenShift Container Platform. The load balancer in front of the infra nodes acts as the public entry point for Red Hat OpenShift Container Platform applications.

The destinations for the master and application traffic must be set in the load balancer configuration after each instance is created and its IP address is assigned, and before the installation. A single HAProxy load balancer can forward both sets of traffic to different destinations.
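As a sketch of that split, the Python below renders the frontend and backend sections of an HAProxy configuration for the three ports named in this chapter. The host names reuse this chapter's example inventory; the proxy names and the `balance source` choice are assumptions, and the `global` and `defaults` sections that a complete haproxy.cfg needs are omitted. TCP mode is used so that TLS traffic passes through to the masters and routers unchanged.

```python
# Hypothetical host names, taken from this chapter's example inventory.
MASTERS = ["master1.example.com", "master2.example.com", "master3.example.com"]
INFRA = ["infra1.example.com", "infra2.example.com", "infra3.example.com"]


def tcp_proxy(name: str, port: int, servers: list[str]) -> str:
    """Render a TCP-mode frontend/backend pair forwarding `port` to each server."""
    lines = [
        f"frontend {name}",
        f"    bind *:{port}",
        "    mode tcp",
        f"    default_backend {name}",
        "",
        f"backend {name}",
        "    mode tcp",
        "    balance source",  # hash the client address so it keeps reaching the same server
    ]
    for i, host in enumerate(servers, start=1):
        lines.append(f"    server {name}-{i} {host}:{port} check")  # 'check' enables health checks
    return "\n".join(lines)


def render_haproxy() -> str:
    """Return the proxy sections for the API/console and the routers."""
    return "\n\n".join([
        tcp_proxy("openshift-api", 8443, MASTERS),        # API and web console
        tcp_proxy("openshift-router-http", 80, INFRA),    # routers, plain HTTP
        tcp_proxy("openshift-router-https", 443, INFRA),  # routers, TLS passed through
    ])


if __name__ == "__main__":
    print(render_haproxy())
```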

1.5. DNS

DNS service is an important component in the Red Hat OpenShift Container Platform environment. Regardless of the provider of DNS, an organization is required to have certain records in place to serve the various Red Hat OpenShift Container Platform components.

Because the load balancer addresses for the Red Hat OpenShift Container Platform master service and for the infrastructure nodes running router pods are known beforehand, the corresponding entries should be configured in DNS before starting the deployment procedure.

1.5.1. Application DNS

Applications served by OpenShift are accessible through the router on ports 80/TCP and 443/TCP. The router relies on a wildcard DNS record that maps all host names under a specific subdomain to the same IP address, without requiring a separate record for each name.

This allows Red Hat OpenShift Container Platform to add applications with arbitrary names as long as they are under that subdomain.

For example, a wildcard record for *.apps.example.com causes DNS lookups for tax.apps.example.com and home-goods.apps.example.com to return the same IP address: 10.19.x.y. All traffic is forwarded to the OpenShift routers, which examine the HTTP headers (such as Host) of each request and forward it to the correct destination.

With a load-balancer host address of 10.19.x.y, the wildcard DNS record can be added as follows:

Table 1.2. Load Balancer DNS records

IP Address   Hostname             Purpose
10.19.x.y    *.apps.example.com   User access to application web services
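The matching rule can be illustrated with a small Python function. It is a simplified sketch with a hypothetical name: it assumes no more specific record exists for the queried name, and it compares names case-insensitively and ignores a trailing dot, as DNS does.

```python
def wildcard_matches(wildcard: str, hostname: str) -> bool:
    """Return True if `hostname` is answered by a DNS wildcard such as '*.apps.example.com'.

    Simplified: assumes no more specific record exists for `hostname`.
    """
    if not wildcard.startswith("*."):
        raise ValueError("expected a wildcard of the form '*.<domain>'")
    suffix = "." + wildcard[2:].lower().rstrip(".")
    name = hostname.lower().rstrip(".")
    # The name needs at least one label in front of the wildcard's parent domain.
    return name.endswith(suffix) and len(name) > len(suffix)
```

Under DNS wildcard rules a deeper name such as a.b.apps.example.com also matches. A TLS wildcard certificate for *.apps.example.com, by contrast, covers only a single label.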

1.6. Red Hat OpenShift Container Platform Components

Red Hat OpenShift Container Platform is composed of multiple instances running on a VMware SDDC that host the scheduled and configured OpenShift services and supplementary containers. These containers can use persistent storage if the application requires it, and they can integrate with optional OpenShift services such as logging and metrics.

1.6.1. OpenShift Instances

The following sections describe the different instance types and their roles in a Red Hat OpenShift Container Platform solution.

1.6.1.1. Master Instances

Master instances run the OpenShift master components, including the API server, controller manager server, and etcd. The master manages nodes in its Kubernetes cluster and schedules pods to run on nodes.

Note

The master instances are considered nodes as well and run the atomic-openshift-node service.

For optimal performance, the etcd service runs on the master instances. When colocating etcd with master nodes, at least three instances are required. To provide a single entry point for the API, the master nodes should be deployed behind a load balancer.

To create master instances, set the following in the inventory file:

... [OUTPUT ABBREVIATED] ...
[etcd]
master1.example.com
master2.example.com
master3.example.com

[masters]
master1.example.com
master2.example.com
master3.example.com

[nodes]
master1.example.com openshift_node_group_name="node-config-master"
master2.example.com openshift_node_group_name="node-config-master"
master3.example.com openshift_node_group_name="node-config-master"

Note

See the official OpenShift documentation for a detailed explanation on master nodes.

1.6.1.2. Infrastructure Instances

The infrastructure instances run the atomic-openshift-node service and host Red Hat OpenShift Container Platform components such as the registry, Prometheus, and Hawkular metrics. The infrastructure instances also run the Elasticsearch, Fluentd, and Kibana (EFK) containers for aggregated logging. Persistent storage should be available to the services running on these nodes.

Depending on environment requirements, at least three infrastructure nodes are required to provide a sharded, highly available aggregated logging service and to ensure that service interruptions do not occur during a reboot.

Note

For more infrastructure considerations, visit the official OpenShift documentation.

When creating infrastructure instances, set the following in the inventory file:

... [OUTPUT ABBREVIATED] ...
[nodes]
infra1.example.com openshift_node_group_name="node-config-infra"
infra2.example.com openshift_node_group_name="node-config-infra"
infra3.example.com openshift_node_group_name="node-config-infra"

Note

The router and registry pods are automatically scheduled on nodes with the role of 'infra'.

1.6.1.3. Application Instances

The Application (app) instances run the atomic-openshift-node service. These nodes should be used to run containers created by the end users of the OpenShift service.

When creating application instances, set the following in the inventory file:

... [OUTPUT ABBREVIATED] ...

[nodes]
node1.example.com openshift_node_group_name="node-config-compute"
node2.example.com openshift_node_group_name="node-config-compute"
node3.example.com openshift_node_group_name="node-config-compute"
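Put together, the host groups above form a single inventory. The Python sketch below renders them from lists of host names; the [OSEv3:children] group ties the groups together, while the [OSEv3:vars] section that holds cluster-wide variables is left out. The function name is hypothetical.

```python
def render_inventory(masters: list[str], infra: list[str], compute: list[str]) -> str:
    """Render the etcd, masters, and nodes groups described above as INI text."""
    lines = ["[OSEv3:children]", "masters", "nodes", "etcd", ""]
    lines += ["[etcd]", *masters, ""]  # etcd is colocated on the masters
    lines += ["[masters]", *masters, ""]
    lines.append("[nodes]")
    for group, hosts in (
        ("node-config-master", masters),
        ("node-config-infra", infra),
        ("node-config-compute", compute),
    ):
        lines += [f'{host} openshift_node_group_name="{group}"' for host in hosts]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_inventory(
        [f"master{i}.example.com" for i in (1, 2, 3)],
        [f"infra{i}.example.com" for i in (1, 2, 3)],
        [f"node{i}.example.com" for i in (1, 2, 3)],
    ))
```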

1.6.2. etcd

etcd is a consistent and highly available key-value store used as Red Hat OpenShift Container Platform’s backing store for all cluster data. etcd stores the persistent master state, while other components watch etcd for changes to bring themselves into the desired state.

Since values stored in etcd are critical to the function of Red Hat OpenShift Container Platform, firewalls should be implemented to regulate communication with and amongst etcd nodes. Inter-cluster and client-cluster communication is secured by utilizing x509 Public Key Infrastructure (PKI) key and certificate pairs.

etcd uses the Raft consensus algorithm to gracefully handle leader elections during network partitions and the loss of the current leader.

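For a highly available Red Hat OpenShift Container Platform deployment, an odd number of etcd instances (starting with three) is required. Raft needs a majority (quorum) of members to agree on each change, so three members tolerate the loss of one, while a fourth member adds no further fault tolerance. Should the leader be lost, etcd elects a new leader from the remaining members as long as a quorum is still available.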
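The arithmetic behind the odd-number rule can be checked with a few lines of Python (function names are illustrative): a cluster of n members needs a majority of floor(n/2) + 1 to make progress, so it tolerates n minus that many failures.

```python
def quorum(members: int) -> int:
    """Number of members that must agree for a Raft majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Members that can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

The output shows that four members tolerate one failure, the same as three, and six tolerate two, the same as five; an extra even member adds replication load without adding resilience.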

1.6.3. Labels

Labels are key/value pairs attached to objects such as pods. They are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users but do not directly imply semantics to the core system. Labels can also be used to organize and select subsets of objects. Each object can have a set of labels defined at creation time or subsequently added and modified at any time.

Note

Each key must be unique for a given object.

"labels": {
  "key1" : "value1",
  "key2" : "value2"
}

Labels are indexed and reverse-indexed for efficient queries, watches, sorting, and grouping in UIs and CLIs. Labels should not be polluted with non-identifying, large, or structured data; non-identifying information should instead be recorded using annotations.
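To illustrate how label queries select objects, the sketch below evaluates an equality-based selector string of the kind accepted by `oc get pods -l`, for example environment=production,tier!=frontend. It is a simplified model with hypothetical function names: set-based operators (in, notin, exists) and label syntax validation are not handled. As in Kubernetes, key!=value also matches objects that lack the key, and all requirements are ANDed.

```python
def parse_selector(selector: str) -> list[tuple[str, str, str]]:
    """Split 'k1=v1,k2!=v2' into (key, operator, value) requirements."""
    requirements = []
    for term in (t.strip() for t in selector.split(",")):
        if not term:
            continue
        # Check the two-character operators before the single "=".
        for op in ("!=", "==", "="):
            if op in term:
                key, value = (part.strip() for part in term.split(op, 1))
                requirements.append((key, "!=" if op == "!=" else "=", value))
                break
        else:
            raise ValueError(f"unsupported selector term: {term!r}")
    return requirements


def matches(labels: dict[str, str], selector: str) -> bool:
    """True if `labels` satisfies every requirement in `selector`."""
    for key, op, value in parse_selector(selector):
        if op == "=" and labels.get(key) != value:
            return False
        if op == "!=" and labels.get(key) == value:
            return False
    return True
```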

1.6.3.1. Labels as Alternative Hierarchy

Service deployments and batch processing pipelines are often multi-dimensional entities (e.g., multiple partitions or deployments, multiple release tracks, multiple tiers, multiple micro-services per tier). Managing these deployments often requires cutting across the encapsulation of strictly hierarchical representations, especially rigid hierarchies determined by the infrastructure rather than by users. Labels enable users to map their own organizational structures onto system objects in a loosely coupled fashion, without requiring clients to store these mappings.

Example labels:

"release" : "stable", "release" : "canary"
"environment" : "dev", "environment" : "qa", "environment" : "production"
"tier" : "frontend", "tier" : "backend", "tier" : "cache"
"partition" : "customerA", "partition" : "customerB"
"track" : "daily", "track" : "weekly"

These are only examples of commonly used labels; conventions that best suit the deployed environment can be developed as needed.

1.6.3.2. Labels as Node Selector

Node labels can be used as node selectors, so that different nodes can be labeled for different use cases. The typical use case is to identify the nodes that run Red Hat OpenShift Container Platform infrastructure components, such as the registry, routers, metrics, or logging components, as "infrastructure nodes" to differentiate them from nodes dedicated to running user applications. Following this use case, the administrator can label the infrastructure nodes with the role "infra" and the application nodes with the role "compute". Another use is to classify nodes with different hardware using labels such as "type=gold", "type=silver", or "type=bronze".

The scheduler can be configured to use node labels to assign pods to nodes according to a node selector. When certain pods should run only on certain types of nodes, the node selector determines which labels are used to assign those pods to nodes.
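The filtering that a node selector performs can be sketched as follows: a node is eligible only if its labels contain every key/value pair in the selector. The node names and labels are hypothetical, modeled on the roles and the type=gold/silver/bronze example above; the real scheduler also applies other checks, such as resource availability, that this sketch ignores.

```python
def eligible_nodes(nodes: dict[str, dict[str, str]], node_selector: dict[str, str]) -> list[str]:
    """Names of nodes whose labels contain every key/value pair in node_selector."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in node_selector.items())
    )


# Hypothetical node labels following this chapter's naming.
NODES = {
    "infra1.example.com": {"node-role.kubernetes.io/infra": "true", "type": "gold"},
    "node1.example.com": {"node-role.kubernetes.io/compute": "true", "type": "gold"},
    "node2.example.com": {"node-role.kubernetes.io/compute": "true", "type": "silver"},
}

if __name__ == "__main__":
    print(eligible_nodes(NODES, {"node-role.kubernetes.io/infra": "true"}))
    print(eligible_nodes(NODES, {"type": "gold"}))
```

An empty selector matches every node, which corresponds to a pod with no node selector.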

1.7. Software Defined Networking

Red Hat OpenShift Container Platform offers the ability to specify how pods communicate with each other. This could be through the use of Red Hat provided Software-defined networks (SDN) or a third-party SDN.

Deciding on the appropriate internal network for a Red Hat OpenShift Container Platform deployment is a crucial step. Unfortunately, there is no single right answer regarding the pod network to choose, as this depends on the specific requirements of how the Red Hat OpenShift Container Platform environment is to be used.

For the purposes of this reference environment, the Red Hat OpenShift Container Platform ovs-networkpolicy SDN plug-in is chosen due to its ability to provide pod isolation using Kubernetes NetworkPolicy. The following section, “OpenShift SDN Plugins”, discusses important details when deciding between the three popular options for the internal networks - ovs-multitenant, ovs-networkpolicy and ovs-subnet.

1.7.1. OpenShift SDN Plugins

This section focuses on multiple plugins for pod communication within Red Hat OpenShift Container Platform using OpenShift SDN. The three plugin options are listed below.

  • ovs-subnet - the original plugin that provides an overlay network created to allow pod-to-pod communication and services. This pod network is created using Open vSwitch (OVS).
  • ovs-multitenant - a plugin that provides an overlay network configured using OVS, similar to the ovs-subnet plugin; however, unlike ovs-subnet, it provides Red Hat OpenShift Container Platform project-level isolation for pods and services.
  • ovs-networkpolicy - a plugin that provides an overlay network configured using OVS that allows Red Hat OpenShift Container Platform administrators to configure specific isolation policies using NetworkPolicy objects [1].

1: https://docs.openshift.com/container-platform/3.11/admin_guide/managing_networking.html#admin-guide-networking-networkpolicy

Network isolation is important, so which OpenShift SDN should be chosen?

Because network isolation is required, this leaves two OpenShift SDN options: ovs-multitenant and ovs-networkpolicy. ovs-subnet is ruled out because it does not provide network isolation.

While both ovs-multitenant and ovs-networkpolicy provide network isolation, the optimal choice comes down to what type of isolation is required. The ovs-multitenant plugin provides project-level isolation for pods and services. This means that pods and services from different projects cannot communicate with each other.

On the other hand, ovs-networkpolicy handles network isolation by giving project administrators the flexibility to create their own network policies using Kubernetes NetworkPolicy objects. This means that, by default, all pods in a project are accessible from other pods and network endpoints until NetworkPolicy objects are created. This in turn may allow pods from separate projects to communicate with each other, provided the appropriate NetworkPolicy is in place.

The level of isolation required should determine the choice between ovs-multitenant and ovs-networkpolicy.
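The difference can be summarized in a deliberately simplified Python model. It treats a NetworkPolicy as a list of source projects admitted into a destination project, whereas real policies select individual pods and ports. It also reflects that under ovs-multitenant the default project is assigned VNID 0, which can communicate with every project. Class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Simplified NetworkPolicy: admit traffic from the listed source projects."""
    allow_from_projects: set[str] = field(default_factory=set)


def can_connect(plugin: str, src_project: str, dst_project: str,
                policies: dict[str, list[Policy]]) -> bool:
    """Simplified answer to: may a pod in src_project reach a pod in dst_project?"""
    if plugin == "ovs-subnet":
        return True  # flat pod network, no isolation
    if plugin == "ovs-multitenant":
        # Each project has its own VNID; the default project (VNID 0) reaches and is reached by all.
        return src_project == dst_project or "default" in (src_project, dst_project)
    if plugin == "ovs-networkpolicy":
        dst_policies = policies.get(dst_project, [])
        if not dst_policies:
            return True  # no policy in the destination project: all traffic allowed
        return any(src_project in p.allow_from_projects for p in dst_policies)
    raise ValueError(f"unknown plugin: {plugin}")
```

For example, with ovs-networkpolicy a project named b that holds Policy({"a"}) admits traffic from project a but not from any other project, a cross-project exception that ovs-multitenant cannot express in this model.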

Note

Please see Section 2.17, “Installing Red Hat OpenShift Container Platform with VMware NSX-T (Optional)” for information about using NSX-T as the OpenShift SDN.

1.8. Ephemeral Container Storage

Container images are stored locally on the nodes running Red Hat OpenShift Container Platform pods. The container-storage-setup script uses the /etc/sysconfig/docker-storage-setup file to specify the storage configuration.

The /etc/sysconfig/docker-storage-setup file should be created before starting the docker service; otherwise, the storage is configured using a loopback device. Container storage setup is performed on all hosts running containers: masters, infrastructure nodes, and application nodes.
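A minimal sketch of preparing that file is shown below, assuming each node has a dedicated second disk at /dev/sdb and that the overlay2 driver is used with its own logical volume mounted at /var/lib/docker. The keys are container-storage-setup settings; the disk, volume group, and logical volume names are hypothetical and must be adapted to the environment.

```python
from pathlib import Path

# Hypothetical values: a dedicated second disk used only for container storage.
SETTINGS = {
    "STORAGE_DRIVER": "overlay2",
    "DEVS": "/dev/sdb",
    "VG": "docker-vol",
    "CONTAINER_ROOT_LV_NAME": "dockerlv",
    "CONTAINER_ROOT_LV_MOUNT_PATH": "/var/lib/docker",
    "CONTAINER_ROOT_LV_SIZE": "100%FREE",
}


def render_storage_setup(settings: dict[str, str]) -> str:
    """Render shell-style KEY=value lines, one per setting."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_storage_setup(path: str = "/etc/sysconfig/docker-storage-setup") -> None:
    """Write the file; per the guidance above, do this before docker first starts."""
    Path(path).write_text(render_storage_setup(SETTINGS))
```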

1.9. Persistent Storage

Containers offer ephemeral storage by default, but some applications require storage that persists between container deployments or across container migration. PersistentVolumeClaim (PVC) objects are used to request storage for this persistent application data. The volumes that fulfill these claims can either be added to the environment by hand or provisioned dynamically using a StorageClass object.

1.9.1. Storage Classes

The StorageClass resource object describes and classifies the different types of storage that can be requested, and provides a means of passing parameters to the backend for storage that is dynamically provisioned on demand. StorageClass objects can also serve as a management mechanism for controlling different levels of storage and access to it. Users with the Cluster Administrator (cluster-admin) or Storage Administrator (storage-admin) OpenShift roles can define and create StorageClass objects that users can then consume without detailed knowledge of the underlying storage volume sources. For this reason, the name given to a storage class should make clear which type of storage it maps to, whether that is storage from the VMware SDDC or from GlusterFS, if deployed.
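For VMware-backed storage, a StorageClass might look like the manifest built below, which uses the in-tree kubernetes.io/vsphere-volume provisioner. The class name and datastore name are hypothetical; the diskformat parameter accepts thin, zeroedthick, or eagerzeroedthick. The printed JSON can be saved to a file and created with `oc create -f`.

```python
import json


def vsphere_storage_class(name: str, datastore: str, disk_format: str = "thin",
                          default: bool = False) -> dict:
    """Build a StorageClass manifest for the in-tree vSphere volume provisioner."""
    sc = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "kubernetes.io/vsphere-volume",
        "parameters": {"diskformat": disk_format, "datastore": datastore},
    }
    if default:
        # Claims that do not name a storage class are provisioned from this one.
        sc["metadata"]["annotations"] = {"storageclass.kubernetes.io/is-default-class": "true"}
    return sc


if __name__ == "__main__":
    # Hypothetical names; the class name states where the storage comes from.
    print(json.dumps(vsphere_storage_class("vsphere-thin-ds01", "ds01", default=True), indent=2))
```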

1.9.1.1. Persistent Volumes

PersistentVolume (PV) objects provide pods with non-ephemeral storage by configuring and encapsulating underlying storage sources. A PersistentVolumeClaim (PVC) abstracts an underlying PV to provide provider-agnostic storage to OpenShift resources. A PVC, when successfully fulfilled by the system, mounts the persistent storage to a specific directory (mountPath) within one or more pods. From the container point of view, the mountPath is connected to the underlying storage mount points by a regular bind mount.
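How a claim is matched to an existing volume can be sketched as below. The model is simplified and its names are hypothetical: a volume qualifies when its storage class matches, it offers every requested access mode, and its capacity covers the request, and among the candidates the smallest is chosen, which is roughly how the Kubernetes volume binder treats statically provisioned volumes. Label selectors, volume modes, and dynamic provisioning are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PV:
    name: str
    capacity_gi: int
    access_modes: frozenset[str]
    storage_class: str = ""


@dataclass(frozen=True)
class PVC:
    request_gi: int
    access_modes: frozenset[str]
    storage_class: str = ""


def bind(claim: PVC, available: list[PV]) -> Optional[PV]:
    """Pick the smallest available PV that satisfies the claim, or None."""
    candidates = [
        pv for pv in available
        if pv.storage_class == claim.storage_class
        and claim.access_modes.issubset(pv.access_modes)  # every requested mode is offered
        and pv.capacity_gi >= claim.request_gi
    ]
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)
```

Choosing the smallest qualifying volume avoids binding a small claim to a large volume that a later, larger claim might need.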

1.10. Registry

OpenShift can build container images from source code, deploy them, and manage their lifecycle. To enable this, OpenShift provides an internal, integrated registry that can be deployed in the OpenShift environment to manage images.

The registry stores images and metadata. For production environments, persistent storage should be used for the registry; otherwise, any images built or pushed into the registry would disappear if the pod restarted.

1.11. Aggregated Logging

Red Hat OpenShift Container Platform aggregated logging, an optional component, collects and aggregates the logs from the pods running in the Red Hat OpenShift Container Platform cluster, as well as /var/log/messages on the nodes, so that Red Hat OpenShift Container Platform users can view the logs of the projects they have view access to through a web interface.

The Red Hat OpenShift Container Platform aggregated logging component is a modified version of the ELK stack, composed of a few pods running in the Red Hat OpenShift Container Platform environment:

  • Elasticsearch: An object store where all logs are stored.
  • Kibana: A web UI for Elasticsearch.
  • Curator: Performs Elasticsearch maintenance operations, such as removing old indices, automatically on a per-project basis.
  • Fluentd: Gathers logs from nodes and containers and feeds them to Elasticsearch.
Note

Fluentd can be configured to send a copy of the logs to a different log aggregator and/or to a different Elasticsearch cluster, see OpenShift documentation for more information.

Once deployed in the cluster, Fluentd (deployed as a DaemonSet on every node with the right labels) gathers logs from all nodes and containers, enriches each log document with useful metadata (e.g. namespace, container_name, node), and forwards it to Elasticsearch, where Kibana provides a web interface for users to view the logs. Cluster administrators can view all logs, but application developers can only view logs for the projects they have permission to view. To prevent users from seeing logs from pods in other projects, the Search Guard plugin for Elasticsearch is used.
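The per-project visibility rule can be modeled in a few lines. This is a simplified sketch, not the Search Guard implementation: records carry flattened metadata fields named after the examples above (the real Fluentd field names differ), and a user sees either everything, as a cluster administrator, or only records from the projects they can view. Function names are hypothetical.

```python
def enrich(message: str, namespace: str, container_name: str, node: str) -> dict:
    """Attach the kind of metadata Fluentd adds to a raw log line (simplified field names)."""
    return {"message": message, "namespace": namespace,
            "container_name": container_name, "node": node}


def visible_logs(records: list[dict], viewable_projects: set[str],
                 cluster_admin: bool = False) -> list[dict]:
    """Return the records a user may see: all for cluster admins, else only permitted projects."""
    if cluster_admin:
        return list(records)
    return [r for r in records if r["namespace"] in viewable_projects]
```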

A separate Elasticsearch cluster, Kibana, and Curator can be deployed to form the OPS cluster, to which Fluentd sends the logs from the default, openshift, and openshift-infra projects, as well as /var/log/messages on the nodes. If the OPS cluster is not deployed, those logs are hosted in the regular aggregated logging cluster.

Red Hat OpenShift Container Platform aggregated logging components can be customized for longer data persistence, pod limits, replicas of individual components, custom certificates, etc. The customization is provided by Ansible variables as part of the deployment process.

The OPS cluster can be customized as well, using the same variables with ops added to the name, as in openshift_logging_es_ops_pvc_size.

Note

For more information about different customization parameters, see Aggregating Container Logs documentation.

Basic concepts for aggregated logging

  • Cluster: Set of Elasticsearch nodes distributing the workload
  • Node: Container running an instance of Elasticsearch, part of the cluster.
  • Index: Collection of documents (container logs)
  • Shards and Replicas: Indices can be split into sets of data containing the primary copy of the stored documents (primary shards) or backups of those primary copies (replica shards). Sharding allows the application to scale the data horizontally and to distribute and parallelize operations. Replication provides high availability and also better search throughput, as searches are also executed on replicas.
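The relationship between primaries and replicas is simple arithmetic, sketched below with hypothetical function names. Each primary shard is stored once plus once per replica, and Elasticsearch never places two copies of the same shard on one node, so fully allocating an index with r replicas needs at least r + 1 Elasticsearch nodes.

```python
def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies stored for one index: every primary plus its replicas."""
    return primaries * (1 + replicas)


def min_nodes_for_full_allocation(replicas: int) -> int:
    """Copies of one shard must sit on different nodes, so r replicas need r + 1 nodes."""
    return replicas + 1


if __name__ == "__main__":
    # Example: an index with 3 primary shards and 1 replica of each.
    print(shard_copies(3, 1), "shard copies; needs at least",
          min_nodes_for_full_allocation(1), "Elasticsearch nodes")
```

With openshift_logging_es_number_of_replicas=1 and a single Elasticsearch node, every replica shard would remain unassigned.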
Warning

Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.

By default, every Elasticsearch pod of the Red Hat OpenShift Container Platform logging components acts as both a master and a data node. If only two Elasticsearch pods are deployed and one of them fails, all logging stops until the second master returns, so deploying two Elasticsearch pods offers no availability advantage.

Note

Elasticsearch shards require their own storage, but a Red Hat OpenShift Container Platform deploymentconfig shares storage volumes between all of its pods. Therefore, every Elasticsearch pod is deployed using a separate deploymentconfig and cannot be scaled using oc scale. To scale the aggregated logging Elasticsearch replicas after the first deployment, modify openshift_logging_es_cluster_size in the inventory file and re-run the openshift-logging.yml playbook.

Below is an example of some of the best practices for deploying Red Hat OpenShift Container Platform aggregated logging. Elasticsearch and Kibana are deployed on nodes with the role "infra". Specifying the node role ensures that the Elasticsearch and Kibana components do not compete with applications for resources. A highly available Elasticsearch environment is deployed to avoid data loss; therefore, at least three Elasticsearch replicas are deployed and the openshift_logging_es_number_of_replicas parameter is set to at least 1. The settings below would be defined in a variable file or static inventory. Curator now runs as a scheduled job (CronJob) rather than as a deployment configuration.

openshift_logging_install_logging=true
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_size=30Gi
openshift_logging_es_cluster_size=3
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_kibana_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_fluentd_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_logging_es_number_of_replicas=1

1.12. Aggregated Metrics

Red Hat OpenShift Container Platform can gather metrics from the kubelet through Heapster and store the values persistently. Red Hat OpenShift Container Platform metrics provide the ability to view CPU, memory, and network-based metrics and display the values in the user interface. These metrics can also allow for the horizontal autoscaling of pods based on parameters provided by a Red Hat OpenShift Container Platform user. It is important to understand capacity planning when deploying metrics into a Red Hat OpenShift Container Platform environment.

Red Hat OpenShift Container Platform metrics is composed of a few pods running in the Red Hat OpenShift Container Platform environment:

  • Heapster: Heapster scrapes the metrics for CPU, memory and network usage on every pod, then exports them into Hawkular Metrics.
  • Hawkular Metrics: A metrics engine that stores the data persistently in a Cassandra database.
  • Cassandra: Database where the metrics data is stored.

Red Hat OpenShift Container Platform metrics components can be customized for longer data persistence, pod limits, replicas of individual components, custom certificates, etc. The customization is provided by Ansible variables as part of the deployment process.

As a best practice, when metrics are deployed, persistent storage should be used so that metric data is preserved. Node selectors should be used to specify where the metrics components run. In the reference architecture environment, the components are deployed on nodes with the role of "infra".

openshift_hosted_metrics_deploy=true
openshift_hosted_metrics_storage_kind=dynamic
openshift_hosted_metrics_storage_volume_size=10Gi
openshift_metrics_hawkular_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_cassandra_nodeselector={"node-role.kubernetes.io/infra": "true"}
openshift_metrics_heapster_nodeselector={"node-role.kubernetes.io/infra": "true"}