Chapter 4. Operational Management

With the successful deployment of OpenShift, the following section demonstrates how to confirm proper functionality of the Red Hat OpenShift Container Platform.

4.1. Validate the Deployment

An Ansible playbook in the git repository deploys an application that exercises the functionality of the masters, nodes, registry, and router. The playbook tests the deployment and cleans up any projects and pods created during the validation run.

The playbook will perform the following steps:

Environment Validation

  • Validate the public OpenShift ELB address from the installation system
  • Validate the public OpenShift ELB address from the master nodes
  • Validate the internal OpenShift ELB address from the master nodes
  • Validate the local master address on each master node
  • Validate the health of the ETCD cluster to ensure all ETCD nodes are healthy
  • Create a project in OpenShift called validate
  • Create an OpenShift Application
  • Add a route for the Application
  • Validate the URL returns a status code of 200 or healthy
  • Delete the validation project
Note

Ensure the URLs below and the tag variables match the variables used during deployment.

$ cd /home/<user>/git/openshift-ansible-contrib/reference-architecture/aws-ansible
$ ansible-playbook -i inventory/aws/hosts/ -e 'public_hosted_zone=sysdeseng.com wildcard_zone=apps.sysdeseng.com console_port=443 stack_name=dev' playbooks/validation.yaml

4.2. Gathering hostnames

With all of the steps that occur during the installation of OpenShift, it is easy to lose track of the names of the instances in the recently deployed environment. One option to get these hostnames is to browse to the AWS EC2 dashboard and select Running Instances under Resources. Selecting Running Instances shows all instances currently running within EC2. To view only the instances specific to the reference architecture deployment, use a filter. Under Instances → Instances within EC2, click in the search field beside the magnifying glass. Select a Tag Key such as openshift-role and click All values. The filter then shows all instances relating to the reference architecture deployment.
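
The same filtering can be done from the command line. The sketch below is illustrative only; it assumes the AWS CLI is configured for the deployment's region and that the instances carry an openshift-role tag key, as suggested by the console filter above.

$ aws ec2 describe-instances \
    --filters "Name=tag-key,Values=openshift-role" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].[Tags[?Key==`Name`]|[0].Value,PrivateDnsName]' \
    --output table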

To help facilitate the Operational Management chapter, the following hostnames will be used.

  • ose-master01.sysdeseng.com
  • ose-master02.sysdeseng.com
  • ose-master03.sysdeseng.com
  • ose-infra-node01.sysdeseng.com
  • ose-infra-node02.sysdeseng.com
  • ose-infra-node03.sysdeseng.com
  • ose-app-node01.sysdeseng.com
  • ose-app-node02.sysdeseng.com

4.3. Running Diagnostics

Perform the following steps from the first master node.

To run diagnostics, SSH into the first master node (ose-master01.sysdeseng.com). Direct access is provided to the first master node because of the configuration of the local ~/.ssh/config file.

$ ssh ec2-user@ose-master01.sysdeseng.com
$ sudo -i

Connectivity to the first master node (ose-master01.sysdeseng.com) as the root user should have been established. Run the diagnostics that are included as part of the install.

# oadm diagnostics
... omitted ...
[Note] Summary of diagnostics execution (version v3.5.5.5):
[Note] Warnings seen: 8
Note

The warnings will not cause issues in the environment.

Based on the results of the diagnostics, actions can be taken to alleviate any issues.

4.4. Checking the Health of ETCD

This section focuses on the ETCD cluster. It describes the different commands to ensure the cluster is healthy. The internal DNS names of the nodes running ETCD must be used.

SSH into the first master node (ose-master01.sysdeseng.com). Using the output of the hostname command, issue the etcdctl command to confirm that the cluster is healthy.

$ ssh ec2-user@ose-master01.sysdeseng.com
$ sudo -i
# hostname
ip-10-20-1-106.ec2.internal
# etcdctl -C https://ip-10-20-1-106.ec2.internal:2379 --ca-file /etc/etcd/ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health
member 82c895b7b0de4330 is healthy: got healthy result from https://10.20.1.106:2379
member c8e7ac98bb93fe8c is healthy: got healthy result from https://10.20.3.74:2379
member f7bbfc4285f239ba is healthy: got healthy result from https://10.20.2.157:2379
Note

In this configuration the ETCD services are distributed among the OpenShift master nodes.
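
In addition to cluster-health, the same endpoint and certificate options can be used to list the cluster members and their peer URLs; for example:

# etcdctl -C https://ip-10-20-1-106.ec2.internal:2379 --ca-file /etc/etcd/ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key member list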

4.5. Default Node Selector

As explained in section 2.12.4, node labels are an important part of the OpenShift environment. In the reference architecture installation, the default node selector is set to "role=app" in /etc/origin/master/master-config.yaml on all of the master nodes. This configuration parameter is set by the OpenShift installation playbooks on all masters, and the playbooks also restart the master API service, which is required when making any changes to the master configuration.

SSH into the first master node (ose-master01.sysdeseng.com) to verify the defaultNodeSelector is defined.

# vi /etc/origin/master/master-config.yaml
...omitted...
projectConfig:
  defaultNodeSelector: "role=app"
  projectRequestMessage: ""
  projectRequestTemplate: ""
...omitted...
Note

If making any changes to the master configuration then the master API service must be restarted or the configuration change will not take place. Any changes and the subsequent restart must be done on all masters.
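
For reference, the master services on this HA deployment can be restarted with systemctl. This is a minimal sketch that assumes the native HA service names used by OpenShift Container Platform 3.x; run it on each master after editing master-config.yaml.

# systemctl restart atomic-openshift-master-api
# systemctl restart atomic-openshift-master-controllers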

4.6. Management of Maximum Pod Size

Quotas are set on ephemeral volumes within pods to prohibit a pod from becoming too large and impacting the node. There are three places where sizing restrictions should be set. When persistent volume claims are not used, a pod can grow as large as the underlying filesystem allows. The required modifications are applied using a combination of user-data and Ansible.

OpenShift Volume Quota

At launch time, user-data creates an XFS filesystem on the /dev/xvdc block device, adds an entry to fstab, and mounts the volume with the gquota option. If gquota is not set, the OpenShift node will not start when the perFSGroup parameter defined below is configured. This disk and configuration are applied to the infrastructure and application nodes.

SSH into the first infrastructure node (ose-infra-node01.sysdeseng.com) to verify the entry exists within fstab.

# vi /etc/fstab
/dev/xvdc /var/lib/origin/openshift.local.volumes xfs gquota 0 0
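
The quota option can also be confirmed on the running node. The quick check below is a sketch; the first command should show gquota (or grpquota) in the mount options and the second should report group quota accounting as ON.

# mount | grep openshift.local.volumes
# xfs_quota -x -c 'state -g' /var/lib/origin/openshift.local.volumes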

Docker Storage Setup

The docker-storage-setup file is created at launch time by user-data. This file tells the Docker service to use /dev/xvdb and to create the volume group docker-vol. The extra Docker storage options ensure that a container can grow no larger than 3GB. Docker storage setup is performed on all master, infrastructure, and application nodes.

SSH into the first infrastructure node (ose-infra-node01.sysdeseng.com) to verify /etc/sysconfig/docker-storage-setup matches the information below.

# vi /etc/sysconfig/docker-storage-setup
DEVS=/dev/xvdb
VG=docker-vol
DATA_SIZE=95%VG
EXTRA_DOCKER_STORAGE_OPTIONS="--storage-opt dm.basesize=3G"
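
The resulting thin pool can be verified with the LVM tools. This quick check assumes docker-storage-setup has already run and created the docker-pool logical volume inside the docker-vol volume group.

# vgs docker-vol
# lvs docker-vol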

OpenShift Emptydir Quota

The parameter openshift_node_local_quota_per_fsgroup in the file playbooks/openshift-setup.yaml configures perFSGroup on all nodes. The perFSGroup setting restricts ephemeral emptyDir volumes from growing larger than 512Mi. This emptyDir quota applies to the master, infrastructure, and application nodes.

SSH into the first infrastructure node (ose-infra-node01.sysdeseng.com) to verify /etc/origin/node/node-config.yaml matches the information below.

# vi /etc/origin/node/node-config.yaml
...omitted...
volumeConfig:
  localQuota:
     perFSGroup: 512Mi

4.7. Yum Repositories

In section 2.3 Required Channels the specific repositories for a successful OpenShift installation were defined. All systems except for the bastion host should have the same subscriptions. To verify that the subscriptions match those defined in Required Channels, perform the following. The repositories below are enabled by the rhsm-repos playbook during the installation. The installation will be unsuccessful if these repositories are missing from the system.

# yum repolist
Loaded plugins: amazon-id, rhui-lb, search-disabled-repos, subscription-manager
repo id                                                 repo name                                                        status
rhel-7-server-extras-rpms/x86_64                        Red Hat Enterprise Linux 7 Server - Extras (RPMs)                   249
rhel-7-fast-datapath-rpms/7Server/x86_64                Red Hat Enterprise Linux Fast Datapath (RHEL 7 Server) (RPMs)        27
rhel-7-server-ose-3.5-rpms/x86_64                       Red Hat OpenShift Container Platform 3.5 (RPMs)                  404+10
rhel-7-server-rpms/7Server/x86_64                       Red Hat Enterprise Linux 7 Server (RPMs)                         11,088
!rhui-REGION-client-config-server-7/x86_64              Red Hat Update Infrastructure 2.0 Client Configuration Server 7       6
!rhui-REGION-rhel-server-releases/7Server/x86_6         Red Hat Enterprise Linux Server 7 (RPMs)                         11,088
!rhui-REGION-rhel-server-rh-common/7Server/x86_         Red Hat Enterprise Linux Server 7 RH Common (RPMs)                  196
repolist: 23,196
Note

All rhui repositories are disabled and only those repositories defined in the Ansible role rhsm-repos are enabled.
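
The enabled repositories can also be listed directly through subscription-manager as a cross-check against the yum repolist output above; for example:

# subscription-manager repos --list-enabled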

4.8. Console Access

This section covers logging into the OpenShift Container Platform management console via the GUI and the CLI. After logging in via one of these methods, applications can be deployed and managed.

4.8.1. Log into GUI console and deploy an application

Perform the following steps from the local workstation.

Open a browser and access https://openshift-master.sysdeseng.com/console. When logging into the OpenShift web interface for the first time, the page will redirect and prompt for GitHub credentials. Log into GitHub using an account that is a member of the Organization specified during the install. Next, GitHub will prompt to grant access to authorize the login. If GitHub access is not granted, the account will not be able to log in to the OpenShift web console.

To deploy an application, click on the New Project button. Provide a Name and click Create. Next, deploy the jenkins-ephemeral instant app by clicking the corresponding box. Accept the defaults and click Create. The next screen provides instructions, along with a URL, for accessing the application. Click Continue to Overview to bring up the management page for the application. Click on the link provided and access the application to confirm functionality.

4.8.2. Log into CLI and Deploy an Application

Perform the following steps from your local workstation.

Install the oc client by visiting the public URL of the OpenShift deployment. For example, browse to https://openshift-master.sysdeseng.com/console/command-line and click latest release. When directed to https://access.redhat.com, log in with valid Red Hat customer credentials and download the client relevant to the current workstation. Follow the instructions located on the production documentation site for getting started with the CLI.

A token is required to log in using GitHub OAuth and OpenShift. The token is presented on the https://openshift-master.sysdeseng.com/console/command-line page. Click the click to show token hyperlink and perform the following on the workstation on which the oc client was installed.

$ oc login https://openshift-master.sysdeseng.com --token=fEAjn7LnZE6v5SOocCSRVmUWGBNIIEKbjD9h-Fv7p09

After the oc client is configured, create a new project and deploy an application.

$ oc new-project test-app

$ oc new-app https://github.com/openshift/cakephp-ex.git --name=php
--> Found image 2997627 (7 days old) in image stream "php" in project "openshift" under tag "5.6" for "php"

    Apache 2.4 with PHP 5.6
    -----------------------
    Platform for building and running PHP 5.6 applications

    Tags: builder, php, php56, rh-php56

    * The source repository appears to match: php
    * A source build using source code from https://github.com/openshift/cakephp-ex.git will be created
      * The resulting image will be pushed to image stream "php:latest"
    * This image will be deployed in deployment config "php"
    * Port 8080/tcp will be load balanced by service "php"
      * Other containers can access this service through the hostname "php"

--> Creating resources with label app=php ...
    imagestream "php" created
    buildconfig "php" created
    deploymentconfig "php" created
    service "php" created
--> Success
    Build scheduled, use 'oc logs -f bc/php' to track its progress.
    Run 'oc status' to view your app.

$ oc expose service php
route "php" exposed

Display the status of the application.

$ oc status
In project test-app on server https://openshift-master.sysdeseng.com

http://test-app.apps.sysdeseng.com to pod port 8080-tcp (svc/php)
  dc/php deploys istag/php:latest <- bc/php builds https://github.com/openshift/cakephp-ex.git with openshift/php:5.6
    deployment #1 deployed about a minute ago - 1 pod

Access the application at the URL provided by oc status. The CakePHP application should now be visible.
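
Once functionality has been confirmed, the test project can be removed so it does not linger in the cluster; for example:

$ oc delete project test-app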

4.9. Explore the Environment

4.9.1. List Nodes and Set Permissions

If you try to run the following command, it should fail.

# oc get nodes --show-labels
Error from server: User "sysdes-admin" cannot list all nodes in the cluster

The command fails because the user does not yet have the required cluster permissions. Get the username and configure the permissions.

$ oc whoami

Once the username has been established, log back into a master node and enable the appropriate permissions for your user. Perform the following step from the first master (ose-master01.sysdeseng.com).

# oadm policy add-cluster-role-to-user cluster-admin sysdesadmin

Attempt to list the nodes again and show the labels.

# oc get nodes --show-labels
NAME                          STATUS                     AGE
ip-10-30-1-164.ec2.internal   Ready                      1d
ip-10-30-1-231.ec2.internal   Ready                      1d
ip-10-30-1-251.ec2.internal   Ready,SchedulingDisabled   1d
ip-10-30-2-142.ec2.internal   Ready                      1d
ip-10-30-2-157.ec2.internal   Ready,SchedulingDisabled   1d
ip-10-30-2-97.ec2.internal    Ready                      1d
ip-10-30-3-74.ec2.internal    Ready,SchedulingDisabled   1d
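
With cluster-admin rights in place, label selectors can be used to narrow the listing. The example below assumes the role=app label applied to the application nodes, as shown by the defaultNodeSelector in Section 4.5.

# oc get nodes -l role=app --show-labels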

4.9.2. List Router and Registry

List the router and registry by changing to the default project.

Note

If the OpenShift account configured on the workstation has cluster-admin privileges perform the following. If the account does not have this privilege ssh to one of the OpenShift masters and perform the steps.

# oc project default
# oc get all
NAME                         REVISION        DESIRED       CURRENT   TRIGGERED BY
dc/docker-registry           1               3             3         config
dc/router                    1               3             3         config
NAME                         DESIRED         CURRENT       AGE
rc/docker-registry-1         3               3             10m
rc/router-1                  3               3             10m
NAME                         CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
svc/docker-registry          172.30.243.63   <none>        5000/TCP                  10m
svc/kubernetes               172.30.0.1      <none>        443/TCP,53/UDP,53/TCP     20m
svc/router                   172.30.224.41   <none>        80/TCP,443/TCP,1936/TCP   10m
NAME                         READY           STATUS        RESTARTS                  AGE
po/docker-registry-1-2a1ho   1/1             Running       0                         8m
po/docker-registry-1-krpix   1/1             Running       0                         8m
po/router-1-1g84e            1/1             Running       0                         8m
po/router-1-t84cy            1/1             Running       0                         8m

Observe the output of oc get all.

4.9.3. Explore the Registry

The OpenShift Ansible playbooks configure three infrastructure nodes that run three registry pods. In order to understand the configuration and mapping process of the registry pods, the command oc describe is used. oc describe details how registries are configured and mapped to the Amazon S3 buckets for storage. Using oc describe should help explain how HA works in this environment.

Note

If the OpenShift account configured on the workstation has cluster-admin privileges perform the following. If the account does not have this privilege ssh to one of the OpenShift masters and perform the steps.

$ oc describe svc/docker-registry
Name:			docker-registry
Namespace:		default
Labels:			docker-registry=default
Selector:		docker-registry=default
Type:			ClusterIP
IP:			172.30.110.31
Port:			5000-tcp	5000/TCP
Endpoints:		172.16.4.2:5000,172.16.4.3:5000
Session Affinity:	ClientIP
No events.

Notice that the registry has two endpoints listed. Each of those endpoints represents a container. The ClusterIP listed is the actual ingress point for the registries.
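
The endpoints backing the service can also be listed directly, which is useful when verifying that each registry pod has registered with the service; for example:

$ oc get endpoints docker-registry -n default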

The oc client allows similar functionality to the docker command. To find out more information about the registry storage perform the following.

# oc get pods
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-2-8b7c6   1/1       Running   0          2h
docker-registry-2-drhgz   1/1       Running   0          2h
docker-registry-2-2s2ca   1/1       Running   0          2h
# oc exec docker-registry-2-8b7c6 cat /etc/registry/config.yml
version: 0.1
log:
  level: debug
http:
  addr: :5000
storage:
  cache:
    layerinfo: inmemory
  s3:
    accesskey: "AKIAJZO3LDPPKZFORUQQ"
    secretkey: "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"
    region: us-east-1
    bucket: "1476274760-openshift-docker-registry"
    encrypt: true
    secure: true
    v4auth: true
    rootdirectory: /registry
auth:
  openshift:
    realm: openshift
middleware:
  repository:
    - name: openshift

Observe the s3 stanza. Confirm the bucket name is listed, and access the AWS console. Click on the S3 service and locate the bucket. The bucket should contain content. Confirm that the same bucket is configured for the other registry pods by repeating the steps above.
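
The bucket contents can also be inspected from the workstation with the AWS CLI. This is a quick sketch using the bucket name and rootdirectory reported in the registry configuration above; substitute the values from your own config.yml.

$ aws s3 ls s3://1476274760-openshift-docker-registry/registry/ --recursive | head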

4.9.4. Explore Docker Storage

This section will explore the Docker storage on an infrastructure node.

The example below can be performed on any node, but for this example the first infrastructure node (ose-infra-node01.sysdeseng.com) is used.

The output below shows the devicemapper storage driver backed by the docker--vol-docker--pool thin pool, which confirms that Docker storage is not using a loopback device.

$ docker info
Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 4
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker--vol-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 3.221 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 1.221 GB
 Data Space Total: 25.5 GB
 Data Space Available: 24.28 GB
 Metadata Space Used: 307.2 kB
 Metadata Space Total: 29.36 MB
 Metadata Space Available: 29.05 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: bridge null host
 Authorization: rhel-push-plugin
Kernel Version: 3.10.0-327.10.1.el7.x86_64
Operating System: Employee SKU
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 2
Total Memory: 7.389 GiB
Name: ip-10-20-3-46.ec2.internal
ID: XDCD:7NAA:N2S5:AMYW:EF33:P2WM:NF5M:XOLN:JHAD:SIHC:IZXP:MOT3
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Registries: registry.access.redhat.com (secure), docker.io (secure)

Verify that three disks are attached to the instance. The disk /dev/xvda is used for the OS, /dev/xvdb is used for Docker storage, and /dev/xvdc is used for emptyDir storage for containers that do not use a persistent volume.
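
A quick summary of the attached block devices can be obtained with lsblk before examining the partition details with fdisk; for example:

$ lsblk -o NAME,SIZE,TYPE,MOUNTPOINT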

$ fdisk -l
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/xvda: 26.8 GB, 26843545600 bytes, 52428800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1         2048         4095      1M  BIOS boot parti
 2         4096     52428766     25G  Microsoft basic

Disk /dev/xvdc: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/xvdb: 26.8 GB, 26843545600 bytes, 52428800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x00000000

    Device Boot      Start         End      Blocks   Id  System
/dev/xvdb1            2048    52428799    26213376   8e  Linux LVM

Disk /dev/mapper/docker--vol-docker--pool_tmeta: 29 MB, 29360128 bytes, 57344 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/docker--vol-docker--pool_tdata: 25.5 GB, 25497174016 bytes, 49799168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/docker--vol-docker--pool: 25.5 GB, 25497174016 bytes, 49799168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 131072 bytes / 524288 bytes


Disk /dev/mapper/docker-202:2-75507787-4a813770697f04b1a4e8f5cdaf29ff52073ea66b72a2fbe2546c469b479da9b5: 3221 MB, 3221225472 bytes, 6291456 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 131072 bytes / 524288 bytes


Disk /dev/mapper/docker-202:2-75507787-260bda602f4e740451c428af19bfec870a47270f446ddf7cb427eee52caafdf6: 3221 MB, 3221225472 bytes, 6291456 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 131072 bytes / 524288 bytes

4.9.5. Explore Security Groups

As mentioned earlier in the document several security groups have been created. The purpose of this section is to encourage exploration of the security groups that were created.

Note

Perform the following steps from the AWS web console.

On the main AWS console, click on EC2. Next on the left hand navigation panel select the Security Groups. Click through each group and check out both the Inbound and Outbound rules that were created as part of the infrastructure provisioning. For example, notice how the Bastion security group only allows SSH traffic inbound. That can be further restricted to a specific network or host if required. Next take a look at the Master security group and explore all the Inbound and Outbound TCP and UDP rules and the networks from which traffic is allowed.
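
The group names and IDs can also be listed from the command line. This sketch assumes the AWS CLI is configured for the deployment's region; it simply lists every security group in the account, so narrow it with filters if needed.

$ aws ec2 describe-security-groups \
    --query 'SecurityGroups[].[GroupName,GroupId]' --output table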

4.9.6. Explore the AWS Elastic Load Balancers

As mentioned earlier in the document several ELBs have been created. The purpose of this section is to encourage exploration of the ELBs that were created.

Note

Perform the following steps from the AWS web console.

On the main AWS console, click on EC2. Next on the left hand navigation panel select the Load Balancers. Select the ose-master load balancer and on the Description page note the Port Configuration and how it is configured for port 443. That is for the OpenShift web console traffic. On the same tab, check the Availability Zones, note how those are Public subnets. Move to the Instances tab. There should be three master instances running with a Status of InService. Next check the Health Check tab and the options that were configured. Further details of the configuration can be viewed by exploring the Ansible playbooks to see exactly what was configured. Finally, change to the ose-internal-master and compare the subnets. The subnets for the ose-internal-master are all private. They are private because that ELB is reserved for traffic coming from the OpenShift infrastructure to the master servers. This results in reduced charges from Amazon because the packets do not have to be processed by the public facing ELB.

4.9.7. Explore the AWS VPC

As mentioned earlier in the document a Virtual Private Cloud was created. The purpose of this section is to encourage exploration of the VPC that was created.

Note

Perform the following steps from the AWS web console.

On the main Amazon Web Services console, click on VPC. Next, on the left hand navigation panel select Your VPCs. Select the VPC recently created and explore the Summary and Tags tabs. Next, on the left hand navigation panel, explore the Subnets, Route Tables, Internet Gateways, DHCP Options Sets, NAT Gateways, Security Groups and Network ACLs. More detail on the configuration can be reviewed by exploring the Ansible playbooks to see exactly what was configured.

4.10. Testing Failure

In this section, reactions to failure are explored. After a successful installation and completion of some of the smoke tests noted above, failure testing can be performed.

4.10.1. Generate a Master Outage

Note

Perform the following steps from the AWS web console and the OpenShift public URL.

Log into the AWS console. On the dashboard, click on the EC2 web service and then click Instances. Locate your running ose-master02.sysdeseng.com instance, select it, right click and change the state to stopped.

Ensure the console can still be accessed by opening a browser and accessing openshift-master.sysdeseng.com. At this point, the cluster is in a degraded state because only 2/3 master nodes are running, but complete functionality remains.
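
The degraded state can also be observed from the CLI. After the node status times out, the stopped master is expected to report NotReady in the node list (the masters appear as SchedulingDisabled nodes, as shown in Section 4.9.1); for example:

$ oc get nodes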

4.10.2. Observe the Behavior of ETCD with a Failed Master Node

SSH into the first master node (ose-master01.sysdeseng.com). Using the output of the hostname command, issue the etcdctl command to check the health of the cluster.

$ ssh ec2-user@ose-master01.sysdeseng.com
$ sudo -i
# hostname
ip-10-20-1-106.ec2.internal
# etcdctl -C https://ip-10-20-1-106.ec2.internal:2379 --ca-file /etc/etcd/ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health
failed to check the health of member 82c895b7b0de4330 on https://10.20.2.251:2379: Get https://10.20.1.251:2379/health: dial tcp 10.20.1.251:2379: i/o timeout
member 82c895b7b0de4330 is unreachable: [https://10.20.1.251:2379] are all unreachable
member c8e7ac98bb93fe8c is healthy: got healthy result from https://10.20.3.74:2379
member f7bbfc4285f239ba is healthy: got healthy result from https://10.20.1.106:2379
cluster is healthy

Notice how one member of the ETCD cluster is now unreachable. Restart ose-master02.sysdeseng.com by following the same steps in the AWS web console as noted above.

4.10.3. Generate an Infrastructure Node outage

This section shows what to expect when an infrastructure node fails or is brought down intentionally.

4.10.3.1. Confirm Application Accessibility

Note

Perform the following steps from the browser on a local workstation.

Before bringing down an infrastructure node, check behavior and ensure things are working as expected. The goal of testing an infrastructure node outage is to see how the OpenShift routers and registries behave. Confirm the simple application deployed earlier is still functional. If it is not, deploy a new version. Access the application to confirm connectivity. As a reminder, to gather the information required to verify the application is still running: list the projects, change to the project in which the application is deployed, get the status of the application (which includes the URL), and access the application via that URL.

$ oc get projects
NAME               DISPLAY NAME   STATUS
openshift                         Active
openshift-infra                   Active
ttester                           Active
test-app1                         Active
default                           Active
management-infra                  Active

$ oc project test-app1
Now using project "test-app1" on server "https://openshift-master.sysdeseng.com".

$ oc status
In project test-app1 on server https://openshift-master.sysdeseng.com

http://php-test-app1.apps.sysdeseng.com to pod port 8080-tcp (svc/php-prod)
  dc/php-prod deploys istag/php-prod:latest <-
    bc/php-prod builds https://github.com/openshift/cakephp-ex.git with openshift/php:5.6
    deployment #1 deployed 27 minutes ago - 1 pod

Open a browser and ensure the application is still accessible.
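
The same check can be scripted with curl, which prints the HTTP status code returned by the route shown above (a 200 indicates the application is reachable):

$ curl -s -o /dev/null -w "%{http_code}\n" http://php-test-app1.apps.sysdeseng.com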

4.10.3.2. Confirm Registry Functionality

This section covers another step to take before initiating the outage of the infrastructure node: ensuring that the registry is functioning properly. The goal is to push an image to the OpenShift registry.

Note

Perform the following steps from CLI on a local workstation and ensure that the oc client has been configured.

A token is needed to log in to the registry.

# oc whoami -t
feAeAgL139uFFF_72bcJlboTv7gi_bo373kf1byaAT8

Pull a new Docker image to use for the test push.

# docker pull fedora/apache
# docker images

Capture the registry endpoint. The svc/docker-registry shows the endpoint.

# oc status
In project default on server https://internal-openshift-master.sysdeseng.com:443

https://docker-registry-default.apps.sysdeseng.com (passthrough) (svc/docker-registry)
  dc/docker-registry deploys docker.io/openshift3/ose-docker-registry:v3.5.5.5
    deployment #1 deployed 44 minutes ago - 3 pods

svc/kubernetes - 172.30.0.1 ports 443, 53->8053, 53->8053

https://registry-console-default.apps.sysdeseng.com (passthrough) (svc/registry-console)
  dc/registry-console deploys registry.access.redhat.com/openshift3/registry-console:3.5
    deployment #1 deployed 43 minutes ago - 1 pod

svc/router - 172.30.41.42 ports 80, 443, 1936
  dc/router deploys docker.io/openshift3/ose-haproxy-router:v3.5.5.5
    deployment #1 deployed 45 minutes ago - 3 pods

View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.

Tag the docker image with the endpoint from the previous step.

# docker tag docker.io/fedora/apache 172.30.110.31:5000/openshift/prodapache

Check the images and ensure the newly tagged image is available.

# docker images

Issue a Docker login.

# docker login -u sysdesadmin -e sysdesadmin -p $(oc whoami -t) 172.30.110.31:5000
# oadm policy add-role-to-user admin sysdesadmin -n openshift
# oadm policy add-role-to-user system:registry sysdesadmin
# oadm policy add-role-to-user system:image-builder sysdesadmin

Push the image to the OpenShift registry now.

# docker push 172.30.110.222:5000/openshift/prodapache
The push refers to a repository [172.30.110.222:5000/openshift/prodapache]
389eb3601e55: Layer already exists
c56d9d429ea9: Layer already exists
2a6c028a91ff: Layer already exists
11284f349477: Layer already exists
6c992a0e818a: Layer already exists
latest: digest: sha256:ca66f8321243cce9c5dbab48dc79b7c31cf0e1d7e94984de61d37dfdac4e381f size: 6186

4.10.3.3. Get Location of Router and Registry

Note

Perform the following steps from the CLI of a local workstation.

Change to the default OpenShift project and check the router and registry pod locations.

$ oc project default
Now using project "default" on server "https://openshift-master.sysdeseng.com".

$ oc get pods -o wide
NAME                      READY     STATUS    RESTARTS   AGE       IP            NODE
docker-registry-2-gmvdr   1/1       Running   1          21h       172.16.4.2    ip-10-30-1-17.ec2.internal
docker-registry-2-jueep   1/1       Running   0          7h        172.16.3.3    ip-10-30-2-208.ec2.internal
router-1-6y5td            1/1       Running   1          21h       172.16.4.4    ip-10-30-1-17.ec2.internal
router-1-rlcwj            1/1       Running   1          21h       172.16.3.5    ip-10-30-2-208.ec2.internal

4.10.3.4. Initiate the Failure and Confirm Functionality

Note

Perform the following steps from the AWS web console and a browser.

Log into the AWS console. On the dashboard, click on the EC2 web service. Locate the running infra01 instance, select it, right click, and change the state to stopped. Wait a minute or two for the registry and router pods to migrate to another infrastructure node. Check the pod locations again and confirm that the registry pods are now running on the same remaining node.

$ oc get pods -o wide
NAME                      READY     STATUS    RESTARTS   AGE       IP            NODE
docker-registry-2-gmvdr   1/1       Running   1          21h       172.16.3.6    ip-10-30-2-208.ec2.internal
docker-registry-2-jueep   1/1       Running   0          7h        172.16.3.3    ip-10-30-2-208.ec2.internal
router-1-6y5td            1/1       Running   1          21h       172.16.3.7    ip-10-30-2-208.ec2.internal
router-1-rlcwj            1/1       Running   1          21h       172.16.3.5    ip-10-30-2-208.ec2.internal

Follow the procedures above to ensure an image can still be pushed to the registry now that infra01 is down.

4.11. Updating the OpenShift Deployment

Playbooks are provided to upgrade the OpenShift deployment when minor releases occur.

4.11.1. Performing the Upgrade

From the workstation that was used to perform the installation of OpenShift on AWS, run the following to ensure that the newest openshift-ansible playbooks and roles are available and to perform the minor upgrade against the deployed environment.

Note

Ensure the variables below are relevant to the deployed OpenShift environment. The variables that should be customized for the deployed OpenShift environment are stack_name, public_hosted_zone, console_port, region, and containerized.

4.11.1.1. Non-Containerized Upgrade

Use the following commands to perform the upgrade in a non-containerized environment.

$ yum update atomic-openshift-utils ansible
$ cd ~/git/openshift-ansible-contrib/reference-architecture/aws-ansible
$ ansible-playbook -i inventory/aws/hosts -e 'stack_name=openshift-infra public_hosted_zone=sysdeseng.com console_port=443 region=us-east-1' playbooks/openshift-minor-upgrade.yaml

4.11.1.2. Containerized Upgrade

Use the following commands to perform the upgrade in a containerized environment.

$ yum update atomic-openshift-utils ansible
$ cd ~/git/openshift-ansible-contrib/reference-architecture/aws-ansible
$ ansible-playbook -i inventory/aws/hosts -e 'stack_name=openshift-infra public_hosted_zone=sysdeseng.com console_port=443 region=us-east-1 containerized=true' playbooks/openshift-minor-upgrade.yaml

4.11.2. Upgrading and Restarting the OpenShift Environment (Optional)

The minor upgrade playbook does not restart the instances after the update occurs. Restarting the nodes, including the masters, can be accomplished by adding the following line to the minor-update.yaml playbook.

$ cd ~/git/openshift-ansible-contrib/playbooks
$ vi minor-update.yaml
    openshift_rolling_restart_mode: system

4.11.3. Specifying the OpenShift Version when Upgrading

The deployed OpenShift environment may not be the latest version of OpenShift. The minor-update.yaml playbook allows a variable to be passed to perform an upgrade on previous versions. Below is an example of performing the upgrade on a 3.4 non-containerized environment.

$ yum update atomic-openshift-utils ansible
$ cd ~/git/openshift-ansible-contrib/reference-architecture/aws-ansible
$ ansible-playbook -i inventory/aws/hosts -e 'stack_name=openshift-infra public_hosted_zone=sysdeseng.com console_port=443 region=us-east-1 openshift_vers=v3_4' playbooks/openshift-minor-upgrade.yaml