OpenShift on VMware Cloud on AWS - Deployment guide

Updated 2020-12-17T22:24:05+00:00

Containers have become the de facto starting point for newly developed applications and those applications are frequently deployed to Kubernetes running in virtual machines. For many, this means deploying Red Hat OpenShift to VMware vSphere, either on-premises or in one of the cloud-based offerings such as VMware Cloud on AWS.

This implementation guidance is targeted at joint Red Hat OpenShift and VMware Cloud on AWS customers who want to deploy, manage, and use OpenShift deployed to and integrated with VMware vSphere. This includes the full-stack automated install experience, cloud provider integration for dynamic node provisioning, and storage integration using the Container Storage Interface (CSI).

Architecture Overview

This implementation guidance provides recommendations on how to deploy OpenShift 4.6 to VMware Cloud on AWS with vSphere 7 and VSAN 7. OpenShift is deployed using the full-stack automation experience and configured with the following integrations with the vSphere platform:

  • Cloud provider integration for dynamic provisioning and destruction of nodes
  • VMware Cloud Native Storage (CNS) integration via the CSI provisioner, providing Persistent Volumes (PVs) to applications from the VSAN datastore
  • VMware Tanzu Observability (previously Wavefront) Operator deployed to collect and forward metric and telemetry data for real-time and trend analysis

Red Hat OpenShift on VMware vSphere installation overview

As a result of OpenShift utilizing the VMware integrations, such as Cloud Native Storage, the vSphere administrator gains deep insight into the workload characteristics and utilization of the containerized applications running in OpenShift. This includes the ability to observe and analyze the storage characteristics of the applications, virtual machine resource utilization, and more.

VMware Cloud Native Storage (CNS), a Container Storage Interface (CSI) implementation for vSphere and VSAN datastores, enables dynamic provisioning of application Persistent Volumes (PVs). When deployed to vSphere 7 and VSAN 7, as in this guide, both Read-Write-Once (RWO) and Read-Only (RO) access modes are available to applications and services.

VMware vSphere Kubernetes integrations

In addition to enabling access to both RWO and RO storage for applications, VMware’s CNS integration provides detailed information to the vSphere administrator via vCenter APIs and GUI integration.

Design considerations

When deploying OpenShift to a VMware vSphere platform, there are multiple design-phase considerations to take into account. Utilizing VMware Cloud on AWS addresses many of these automatically as a result of the pre-deployed and pre-configured environment, allowing the vSphere administrator to focus on the applications rather than the underlying infrastructure, while automatically providing capabilities that would normally take significant additional effort to configure and manage.

Fully automated OpenShift Installation

OpenShift 4 changed the deployment and installation paradigm from OpenShift 3, bringing a simplified and streamlined experience focused on reliably and repeatedly deploying OpenShift clusters that are customized after install. OpenShift 4.5, released in mid-2020, added VMware vSphere as a supported provider for the full-stack automation install and Machine API paradigm for managing cluster nodes.

With OpenShift 4 and the full-stack automation deployment method, the installer and the cluster create and manage the infrastructure resources needed by the OpenShift deployment. This includes creating and configuring network, virtual machine, and storage assets automatically using the cloud provider API. In contrast to the traditional method of creating virtual machines, installing the operating system, and then deploying OpenShift, OpenShift 4’s installer-provisioned infrastructure requires much less effort and knowledge for the OpenShift administrator to begin using and deploying applications rapidly.

OpenShift 4 deployment process overview

After installation, OpenShift uses the Machine API Operator to interact with the underlying infrastructure provider for tasks like creating and destroying virtual machines. This enables the cluster to dynamically and automatically scale nodes up and down as needed to accommodate the application workload. Furthermore, the VMware Storage for Kubernetes in-tree provisioner is automatically configured when deploying OpenShift to VMware vSphere, which enables applications to request dynamically provisioned persistent volumes.

This document uses the full-stack automation deployment method to instantiate OpenShift clusters in a VMware Cloud on AWS environment. As a result, the ingress and API endpoints are managed automatically by OpenShift, which greatly simplifies the prerequisites. However, this configuration depends on OpenShift being able to manage the virtual IP addresses used for the endpoints, so exposing the OpenShift apps (*.apps.clustername.domainname) and API (api.clustername.domainname) endpoints publicly via an AWS load balancer is not possible with this configuration. The VMC platform natively offers Public IP NAT capabilities; if you want to expose the default internal OpenShift ingress to the outside world, additional architecture and design considerations are needed, which are outside the scope of this document.
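For illustration, the two endpoint virtual IPs map to the apiVIP and ingressVIP parameters in the vSphere platform section of install-config.yaml. The addresses below are placeholders only; a fuller configuration sketch appears later in this guide.

platform:
  vsphere:
    # IP addresses allocated outside the NSX-T DHCP range
    apiVIP: 192.168.100.10      # api.clustername.domainname
    ingressVIP: 192.168.100.11  # *.apps.clustername.domainname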

Storage

Many applications, including containerized applications, need persistent storage for data. There are many options for a storage solution with OpenShift and vSphere. VMware VSAN, the integrated, hyper-converged storage solution from VMware, is configured and available by default with the VMC platform.

Software-defined Network (SDN)

VMware’s software-defined network capability, NSX-T, is deployed and configured with VMC, abstracting connectivity between virtual machines and providing services such as DHCP, firewalling, and load balancing for applications. This implementation guide does not use the VMware NSX Container Plugin for Kubernetes (NCP), nor is NSX used as the OpenShift SDN, because NSX-T is part of the VMC managed service and NCP is not a component of that service at this time.

The NSX-T DHCP service is used for virtual machine IP management with the full-stack automated OpenShift deployment, via the Machine API integration with vSphere. Additionally, NSX-T firewall rules will be created to enable access to and from the OpenShift cluster, and between the bastion management/installation host and the VMC vSphere hosts.

Scheduling OpenShift virtual machines

The VMC platform provides the necessary resilience and resource balancing for OpenShift without needing to explicitly define VM-to-VM affinity rules. Should a physical host be lost, VMC HA will recover the OpenShift virtual machines to a running state on the remaining physical hosts (and replace the failed host automatically).

Finally, utilizing vSphere resource pools to allocate vSphere resources for the OpenShift virtual machines is an important consideration for ensuring adequate performance of both the OpenShift components and the containerized applications hosted by OpenShift.

Sizing your OpenShift deployment

OpenShift sizing can be broken into three different categories:

  • Control plane nodes: the control plane nodes are sized based on the maximum number of worker nodes expected in the cluster. Refer to the documentation for specific sizing requirements.
    • There are always three control plane nodes in a production OpenShift cluster. The minimum size per node is 4 vCPU, 16GB RAM, and 120GB of storage.
  • Dedicated nodes for OpenShift infrastructure services, such as logging, metrics, registry, and routing, are also sized based on the maximum number of worker nodes expected in the cluster. The number of infrastructure nodes depends on the OpenShift services you are expecting to utilize.
    • Refer to this section of the documentation for specific sizing recommendations of infrastructure nodes for the combined metrics, router, and registry services.
    • The logging service resource requirements are heavily dependent on the amount of logs generated by applications. For specific suggestions refer to the Elasticsearch documentation; however, a suggested starting point for this guide is three nodes with 8 vCPU and 24GB of memory.
  • Worker nodes are sized based on the application requirements. The size and quantity of worker nodes is, ideally, balanced to allow the application to scale up and out while providing enough distributed resources to avoid overly large failure domains. No specific sizing recommendations are provided in this guide; please refer to the documentation. The default worker node size created by the install process is 2 vCPU and 8 GB RAM; if you want to adjust the default size, modify install-config.yaml to use a different VM size, as shown in the sketch after this list. (Ref: Table 6. Optional VMware vSphere machine pool parameters)
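A minimal sketch of the compute machine pool section of install-config.yaml, sizing worker nodes larger than the 2 vCPU / 8 GB default. The values shown are illustrative; the parameter names follow the optional vSphere machine pool parameters referenced above.

compute:
- name: worker
  replicas: 3
  platform:
    vsphere:
      cpus: 4             # total vCPUs per worker VM
      coresPerSocket: 2
      memoryMB: 16384     # memory per worker VM, in MB
      osDisk:
        diskSizeGB: 120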

VMware Cloud on AWS Sizer Tool

VMC is built on top of AWS bare metal infrastructure, the same bare metal infrastructure that runs AWS native services. When a VMware Cloud on AWS SDDC is deployed, you consume these physical server nodes and run the VMware ESXi hypervisor in a single-tenant fashion; you are not sharing this physical infrastructure with any other customer of VMware or AWS. With this in mind, it is important to consider how many physical hosts you will need to host your virtual infrastructure. To determine this, VMware provides the VMware Cloud on AWS Sizer. With this tool you select the types of workload you are bringing to VMware Cloud on AWS and input the number of virtual machines, along with specification information such as the amount of storage required, the number of vCPUs, vRAM, and overcommit ratios. From this detail the sizer tool generates a report, based on VMware best practices, describing the number of hosts you will need and the cluster configuration.

VMC on AWS Sizer

After sizing the OpenShift cluster(s), utilize that information, in addition to any other workload planned for the environment, as inputs for the VMC Sizer tool.

You can also use popular virtual infrastructure data collection tools such as RVTools and LiveOptics. The VMC Sizer tool accepts the outputs of these data collection tools as inputs and then informs you of the number of physical ESXi hosts required to run the virtual infrastructure within your VMware Cloud on AWS SDDC.

VMC Sizer data import

Below is an example output; it can easily be exported to a PDF report if required.

VMC Sizer output

The recommendation which is provided can be toggled between the two physical host types available in VMware Cloud on AWS:

  • i3.metal - 72 vCPU, 512 GiB RAM, 15 TB raw storage
  • i3en.metal - 96 vCPU, 768 GiB RAM, 60 TB raw storage

Prerequisites

Refer to the OpenShift documentation for details about the prerequisites to deploying using full-stack automation with VMware vSphere.

You will need the following configured in the VMC environment prior to OpenShift deployment:

  • A non-exclusive, DHCP enabled, NSX-T network segment and subnet. Other VMs may be hosted on the subnet, but at least eight IPs must be available for the OpenShift deployment.
  • Allocate two IP addresses, outside the DHCP range, and configure them with reverse DNS records
    • A DNS record for api.<cluster name>.<domain name> pointing to the allocated IP address
    • A DNS record for *.apps.<cluster name>.<domain name> pointing to the allocated IP address
  • The following firewall rules need to be created, if this level of access between networks does not already exist:
    • VMC Management Gateway
      • An inbound HTTPS rule allowing access to the ESXi hosts on the SDDC management network. This is needed to upload the Red Hat Enterprise Linux CoreOS OVA during deployment.
      • An inbound HTTPS rule from the OpenShift compute network to the vCenter on the SDDC Management Network. This connection is needed to allow OpenShift to communicate with vCenter for provisioning and managing nodes, PVCs, etc.

Below is an example rule allowing access from our OpenShift compute network to the vCenter on the Management Gateway.

Example VMC firewall rule for OCP access to the management network

You will need the following information to deploy OpenShift:

  • The OpenShift cluster name, for example “vmc-prod-1”.
  • The base DNS name, for example “companyname.com”
  • If not using the default, the Pod network CIDR (default: 10.128.0.0/14) and Services network CIDR (default: 172.30.0.0/16) must be identified. These are used for pod-to-pod and pod-to-service communication and will not be accessed externally, however they should not overlap with existing subnets in your organization.
  • vCenter information needed for deployment:
    • vCenter hostname, username, and password
    • Datacenter will be “SDDC-Datacenter”
    • Cluster name will be “Cluster-1”
    • Network name
    • Datastore name will be “Workload Datastore”
  • A Linux-based host deployed to VMC for bastion management
    • Can be Red Hat Enterprise Linux, CentOS, Fedora, or another Linux distribution; it must have internet connectivity and the ability to upload an OVA to the ESXi host(s).
    • Download and install the OCP CLI tools to the bastion management host
    • Connect this bastion management host to your SDDC Management network that has been configured for access to the vCenter and ESXi hosts as per the above firewall rules.

Deploying OpenShift to VMC

Once the above prerequisites have been met, install OpenShift using the openshift-install command from the bastion management host, co-located in the VMC environment. The installer and control plane will automate the process of deploying and managing the resources needed for the OpenShift cluster.

openshift-install create cluster --dir=<cluster name>

You can see an example of the full-stack automated deployment process with vSphere, including a video of the deployment, in this blog post. If you have questions or issues during the deployment process, please refer to the documentation for additional information.

Note: If you want to customize default settings used in the install-config.yaml before the cluster is deployed, such as the ServiceNetwork, run the below command and review the configuration file before running the above create cluster command.

openshift-install create install-config --dir=<cluster name>
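For reference, a trimmed install-config.yaml for the deployment described in this guide might look like the sketch below. All hostnames, credentials, addresses, and secrets are placeholders, and the parameter names follow the OpenShift 4.6 vSphere installation documentation; adjust the values to match your VMC environment.

apiVersion: v1
baseDomain: companyname.com
metadata:
  name: vmc-prod-1
compute:
- name: worker
  replicas: 3
controlPlane:
  name: master
  replicas: 3
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14          # pod network
    hostPrefix: 23
  machineNetwork:
  - cidr: 192.168.100.0/24       # the NSX-T segment subnet (placeholder)
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16                # services network
platform:
  vsphere:
    vCenter: <vCenter hostname>
    username: <vCenter username>
    password: <vCenter password>
    datacenter: SDDC-Datacenter
    cluster: Cluster-1
    defaultDatastore: Workload Datastore
    network: <NSX-T segment name>
    apiVIP: 192.168.100.10       # api.<cluster name>.<domain name>
    ingressVIP: 192.168.100.11   # *.apps.<cluster name>.<domain name>
pullSecret: '<pull secret>'
sshKey: '<ssh public key>'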

Post-installation configuration

After deploying OpenShift to VMware Cloud on AWS, there are a few post-install steps to take to complete the integration between OpenShift and vSphere.

VMware Cloud Native Storage Provisioner

The OpenShift installer will deploy and configure the in-tree volume provisioner using the credentials and datastore provided in install-config.yaml. To maximize the storage capabilities and vCenter integration, it is recommended you deploy the VMware Cloud Native Storage (CNS) provisioner. The steps below walk through the process.

  1. Update OpenShift virtual machines to hardware version 15 or later

    This is easiest to accomplish before any workload is hosted in the OpenShift cluster because it requires the nodes to be shut down and their properties modified. The default VM hardware version, which is not editable during the install process, is version 13. To take advantage of the VMware Cloud Native Storage provisioner, the VMs need to use version 15 or later.

    To update the hardware, follow the OpenShift documentation to shutdown the cluster. Once the nodes are powered off, follow the vSphere documentation to update the configuration, then gracefully restart the cluster.

  2. Deploy the VMware Cloud Native Storage provisioner

    The required YAML for the commands can be reviewed in this GitHub repo.

    To deploy the provisioner, you’ll first need to create the ClusterRole, ClusterRoleBinding, and SecurityContextConstraint in OpenShift.

    These can be created using the following command:

    oc apply -f https://raw.githubusercontent.com/dav1x/ocp-vsphere-csi/master/csi-driver-rbac.yaml
    

    Finally, deploy the StatefulSet and DaemonSet below. These can be created using the following commands.

    oc apply -f https://raw.githubusercontent.com/dav1x/ocp-vsphere-csi/master/csi-driver-deploy-sts.yaml
    
    oc apply -f https://raw.githubusercontent.com/dav1x/ocp-vsphere-csi/master/csi-driver-deploy-ds.yaml
    

    For more information regarding vSphere Storage for Kubernetes, refer to the VMware documentation.
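    Once the driver is running, applications can request dynamically provisioned volumes through the CSI StorageClass. A minimal PersistentVolumeClaim sketch is shown below; the csi-sc class name is assumed from the repository above (it also appears in the note that follows), and the claim name and size are illustrative. Save the YAML to a file and apply it with oc apply -f.

    # example PersistentVolumeClaim using the CSI StorageClass
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: app-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: csi-sc   # assumed CSI StorageClass name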

    Please note: The default “thin” storage class is required and cannot be removed. If you choose to install the VMware Cloud Native Storage CSI driver, it is recommended to change the default storage class as per the commands below.

    # remove the thin StorageClass as default
    oc patch storageclass thin -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'

    # set the CSI StorageClass as default
    oc patch storageclass csi-sc -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
    

Infrastructure node MachineSet

To reduce OpenShift entitlement requirements and control the resources assigned to OpenShift services, it is recommended to create an infrastructure node MachineSet for the router, registry, and metrics services.

The documentation explains how to size the nodes responsible for hosting the router, registry, and metrics services. Once you have decided on the size of the virtual machines, use the documentation to create a custom MachineSet and migrate these three services to the new hosts.

Note: If you are deploying and using the OpenShift logging service, configure the components to deploy to the infrastructure nodes as well.
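As a sketch, an infrastructure MachineSet for this environment might look like the example below. The field names follow the vSphere MachineSet examples in the OpenShift 4.6 machine management documentation; the infrastructure ID, template, network, and sizing values are placeholders that must be replaced with values from your cluster and VMC environment.

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: <infrastructure_id>-infra
  namespace: openshift-machine-api
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructure_id>
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructure_id>
      machine.openshift.io/cluster-api-machineset: <infrastructure_id>-infra
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructure_id>
        machine.openshift.io/cluster-api-machine-role: infra
        machine.openshift.io/cluster-api-machine-type: infra
        machine.openshift.io/cluster-api-machineset: <infrastructure_id>-infra
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/infra: ""      # marks the node as an infrastructure node
      providerSpec:
        value:
          apiVersion: vsphereprovider.openshift.io/v1beta1
          kind: VSphereMachineProviderSpec
          numCPUs: 8                             # illustrative sizing
          numCoresPerSocket: 2
          memoryMiB: 24576
          diskGiB: 120
          template: <infrastructure_id>-rhcos    # RHCOS template created by the installer
          network:
            devices:
            - networkName: <NSX-T segment name>
          workspace:
            server: <vCenter hostname>
            datacenter: SDDC-Datacenter
            datastore: Workload Datastore
            folder: /SDDC-Datacenter/vm/<infrastructure_id>
          credentialsSecret:
            name: vsphere-cloud-credentials
          userDataSecret:
            name: worker-user-data

After the new nodes join the cluster, follow the documentation to add node selectors (and, if desired, taints) so that the router, registry, and monitoring components are scheduled onto the infrastructure nodes.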

Node autoscaling

OpenShift will automatically deploy and destroy virtual machines to scale cluster compute resources based on workload after configuring the ClusterAutoscaler and MachineAutoscaler for your MachineSet(s).

A MachineAutoscaler defines the minimum and maximum number of nodes for a specific MachineSet which the cluster will scale up, or scale down, to accommodate the application workload.

A ClusterAutoscaler defines the minimum and maximum number of nodes in the entire cluster, which will be the sum of all machines deployed from each of the MachineSets defined.

Each of these objects must be defined before OpenShift will automatically scale the cluster. Follow the documentation for setting the values based on your expected workload. Note that MachineAutoscalers must have a value greater than or equal to one (1) for the minReplicas.
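A combined sketch of the two resources is shown below; the limits and the target MachineSet name are illustrative and should be set according to your expected workload.

apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default                  # the ClusterAutoscaler resource must be named "default"
spec:
  resourceLimits:
    maxNodesTotal: 12            # upper bound on nodes across the cluster
---
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: <worker MachineSet name>
  namespace: openshift-machine-api
spec:
  minReplicas: 1                 # must be greater than or equal to one
  maxReplicas: 6
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: <worker MachineSet name>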

Monitoring the Platform

VMware Tanzu Observability, formerly Wavefront, uses an Operator to deploy and manage the components for collecting and forwarding data. The certified Operator is hosted on Red Hat Operator Hub, accessible from the OpenShift administrator console. To deploy and configure Tanzu Observability in your VMC-hosted OpenShift cluster, follow these steps:

  1. As a cluster administrator using the OpenShift admin console, browse to Administration > Namespaces. Create a namespace named wavefront.
  2. In the left pane, navigate to Operators > OperatorHub.
  3. From the list of Operators, search for and select the Wavefront Operator.
  4. Click on the Wavefront Operator and click Install.
  5. Toggle the radio button for “Installation Mode” to “A specific namespace” and select wavefront as the namespace to use. Set the other values according to your preference, then click “Install” when ready.
  6. When the deployment is successful, the Operator is listed under Installed Operators and deploys Wavefront Proxy and Wavefront Collector CRDs into the project.
  7. Select Operators > Installed Operators > Wavefront Operator > Wavefront Proxy > Create New to deploy the proxy.
  8. Create a Wavefront proxy custom resource by specifying the following parameters in the proxy spec, leaving the rest of the values at their defaults (a sketch of the resulting resource follows this list).
    • token = <your wavefront API token>
    • url = <your wavefront instance url>
  9. Click Create. This deploys the proxy service named example-wavefrontproxy with port 2878 as a metric port. In addition, the Operator will create a Persistent Volume Claim (PVC) with the name wavefront-proxy-storage. This will use the default storage class to provision.
  10. Select Operators > Installed Operators > Wavefront Operator > Wavefront Collector > Create New to deploy the collector.
  11. Click Create without changing any values in the collector definition.
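As a rough sketch, the proxy custom resource created in step 8 might look like the YAML below. The apiVersion, kind, and metricPort field are assumptions and should be confirmed against the CRDs the Operator installs in your cluster; the token and url values are the ones described above, and the name matches the example-wavefrontproxy service referenced in step 9.

apiVersion: wavefront.com/v1alpha1   # assumption: confirm against the installed CRD
kind: WavefrontProxy
metadata:
  name: example-wavefrontproxy
  namespace: wavefront
spec:
  token: <your wavefront API token>
  url: <your wavefront instance url>
  metricPort: 2878                   # assumption: the metric port described in step 9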

Because default parameters are used, the collector runs as a DaemonSet and uses example-wavefrontproxy as a sink. The collector auto-discovers the pods and services that expose metrics and dynamically starts collecting metrics for the targets. Metrics will also be collected from the Kubernetes API server if configured.

Once the collector is running, connect to the Wavefront UI, select the “Kubernetes” tile and then the “Dashboards” tab.

VMware Tanzu Observability dashboards view

The ‘Kubernetes Summary’ dashboard is shown below:

VMware Tanzu Observability Kubernetes dashboard

The Wavefront Operator and the combined Wavefront Kubernetes Dashboard provide a near real-time view of the OpenShift cluster running on VMC. Typically, this dashboard would be customized based on operational requirements.

Additional day 2+ configuration

The OpenShift documentation has extensive guidance on additional, optional configuration that is done post-deployment. You should carefully review and consider which configuration is appropriate for your cluster based on the user and application requirements.

A subset of the tasks found in the OpenShift documentation include:

Summary

This document provides guidelines for successfully deploying OpenShift using the full-stack automation experience to VMware Cloud on AWS hosted vSphere clusters, including integration between the platforms where possible. Deploying Red Hat OpenShift to VMware Cloud on AWS provides a robust and scalable application platform which enables applications to be deployed and managed both on-premises and off-premises, across the hybrid cloud, with common automation and knowledge sets. OpenShift clusters seamlessly scale to accommodate application workloads, while VMware Cloud on AWS clusters scale to meet the needs of OpenShift and the other virtualized applications in your organization, delivering the best of both the application and infrastructure platforms.

Credits

The following individuals contributed and helped validate this document.
* Andrew Sullivan - Red Hat
* Robbie Jerrom - VMware
* Davis Phillips - Red Hat
* Dean Lewis - VMware