Chapter 4. Operational Management
With OpenShift Container Platform successfully deployed, this chapter demonstrates how to confirm proper functionality of the Red Hat OpenShift Container Platform environment.
4.1. SSH configuration
Optionally, to connect easily to the VMs, the following SSH configuration file can be applied on the workstation that will run the SSH commands:
$ cat /home/<user>/.ssh/config
Host bastion
HostName <resourcegroup>b.<region>.cloudapp.azure.com
User <user>
StrictHostKeyChecking no
ProxyCommand none
CheckHostIP no
ForwardAgent yes
IdentityFile /home/<user>/.ssh/id_rsa
Host master? infranode? node??
ProxyCommand ssh <user>@bastion -W %h:%p
user <user>
IdentityFile /home/<user>/.ssh/id_rsa
With this configuration in place, connecting to any VM only requires its hostname:
$ ssh infranode3
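With this configuration in place, a quick connectivity check can be run from the workstation by looping over the hostnames listed in Section 4.2, Gathering hostnames. This is a minimal sketch that assumes the reference architecture naming; each command is proxied through the bastion automatically:
$ for host in master{1..3} infranode{1..3} node0{1..3}; do
>   echo -n "$host: "; ssh "$host" hostname
> done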
4.2. Gathering hostnames
With all of the steps that occur during the installation of OpenShift Container Platform, it is possible to lose track of the names of the instances in the recently deployed environment. One option to get these hostnames is to browse to the Azure Resource Group
dashboard and select Overview. The filter shows all instances relating to the reference architecture deployment.
To help facilitate the tasks in this chapter, Chapter 4, Operational Management, the following hostnames are used.
- master1
- master2
- master3
- infranode1
- infranode2
- infranode3
- node01
- node02
- node03
4.3. Running Diagnostics
To run diagnostics, SSH into the first master node (master1) via the bastion host using the admin user specified in the template:
$ ssh <user>@<resourcegroup>b.<region>.cloudapp.azure.com
$ ssh <user>@master1
$ sudo -i
Connectivity to the first master node (master1.<region>.cloudapp.azure.com) as the root
user should have been established. Run the diagnostics that are included as part of the OpenShift Container Platform installation:
# oadm diagnostics
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/root/.kube/config'
Info:  Using context for cluster-admin access: 'default/sysdeseng-westus-cloudapp-azure-com:8443/system:admin'
[Note] Performing systemd discovery

[Note] Running diagnostic: ConfigContexts[default/sysdeseng-westus-cloudapp-azure-com:8443/system:admin]
       Description: Validate client config context is complete and has connectivity
Info:  The current client config context is 'default/sysdeseng-westus-cloudapp-azure-com:8443/system:admin':
       The server URL is 'https://sysdeseng.westus.cloudapp.azure.com:8443'
       The user authentication is 'system:admin/sysdeseng-westus-cloudapp-azure-com:8443'
       The current project is 'default'
       Successfully requested project list; has access to project(s):
         [default gsw kube-system logging management-infra openshift openshift-infra]

[Note] Running diagnostic: DiagnosticPod
       Description: Create a pod to run diagnostics from the application standpoint

[Note] Running diagnostic: PodCheckDns
       Description: Check that DNS within a pod works as expected

[Note] Summary of diagnostics execution (version v3.6.5.5):
[Note] Warnings seen: 0
[Note] Errors seen: 0

[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint

[Note] Running diagnostic: CheckExternalNetwork
       Description: Check that external network is accessible within a pod

[Note] Running diagnostic: CheckNodeNetwork
       Description: Check that pods in the cluster can access its own node.

[Note] Running diagnostic: CheckPodNetwork
       Description: Check pod to pod communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with each other and in case of multitenant network plugin, pods in non-global projects should be isolated and pods in global projects should be able to access any pod in the cluster and vice versa.

[Note] Running diagnostic: CheckServiceNetwork
       Description: Check pod to service communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with all services and in case of multitenant network plugin, services in non-global projects should be isolated and pods in global projects should be able to access any service in the cluster.

[Note] Running diagnostic: CollectNetworkInfo
       Description: Collect network information in the cluster.

[Note] Summary of diagnostics execution (version v3.6.5.5):
[Note] Warnings seen: 0

[Note] Running diagnostic: CheckNodeNetwork
       Description: Check that pods in the cluster can access its own node.

[Note] Running diagnostic: CheckPodNetwork
       Description: Check pod to pod communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with each other and in case of multitenant network plugin, pods in non-global projects should be isolated and pods in global projects should be able to access any pod in the cluster and vice versa.

[Note] Running diagnostic: CheckServiceNetwork
       Description: Check pod to service communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with all services and in case of multitenant network plugin, services in non-global projects should be isolated and pods in global projects should be able to access any service in the cluster.

[Note] Running diagnostic: CollectNetworkInfo
       Description: Collect network information in the cluster.

[Note] Summary of diagnostics execution (version v3.6.5.5):
[Note] Warnings seen: 0

[Note] Running diagnostic: CheckNodeNetwork
       Description: Check that pods in the cluster can access its own node.

[Note] Running diagnostic: CheckPodNetwork
       Description: Check pod to pod communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with each other and in case of multitenant network plugin, pods in non-global projects should be isolated and pods in global projects should be able to access any pod in the cluster and vice versa.

[Note] Running diagnostic: CheckServiceNetwork
       Description: Check pod to service communication in the cluster. In case of ovs-subnet network plugin, all pods should be able to communicate with all services and in case of multitenant network plugin, services in non-global projects should be isolated and pods in global projects should be able to access any service in the cluster.

[Note] Running diagnostic: CollectNetworkInfo
       Description: Collect network information in the cluster.

[Note] Summary of diagnostics execution (version v3.6.5.5):
[Note] Warnings seen: 0

[Note] Skipping diagnostic: AggregatedLogging
       Description: Check aggregated logging integration for proper configuration
       Because: No LoggingPublicURL is defined in the master configuration

[Note] Running diagnostic: ClusterRegistry
       Description: Check that there is a working Docker registry

[Note] Running diagnostic: ClusterRoleBindings
       Description: Check that the default ClusterRoleBindings are present and contain the expected subjects
Info:  clusterrolebinding/cluster-readers has more subjects than expected.
       Use the `oadm policy reconcile-cluster-role-bindings` command to update the role binding to remove extra subjects.
Info:  clusterrolebinding/cluster-readers has extra subject {ServiceAccount management-infra management-admin }.
Info:  clusterrolebinding/cluster-readers has extra subject {ServiceAccount default router }.
Info:  clusterrolebinding/self-provisioners has more subjects than expected.
       Use the `oadm policy reconcile-cluster-role-bindings` command to update the role binding to remove extra subjects.
Info:  clusterrolebinding/self-provisioners has extra subject {ServiceAccount management-infra management-admin }.

[Note] Running diagnostic: ClusterRoles
       Description: Check that the default ClusterRoles are present and contain the expected permissions

[Note] Running diagnostic: ClusterRouterName
       Description: Check there is a working router

[Note] Running diagnostic: MasterNode
       Description: Check if master is also running node (for Open vSwitch)
WARN:  [DClu3004 from diagnostic MasterNode@openshift/origin/pkg/diagnostics/cluster/master_node.go:164]
       Unable to find a node matching the cluster server IP. This may indicate the master is not also running a node, and is unable to proxy to pods over the Open vSwitch SDN.

[Note] Skipping diagnostic: MetricsApiProxy
       Description: Check the integrated heapster metrics can be reached via the API proxy
       Because: The heapster service does not exist in the openshift-infra project at this time, so it is not available for the Horizontal Pod Autoscaler to use as a source of metrics.

[Note] Running diagnostic: NodeDefinitions
       Description: Check node records on master
WARN:  [DClu0003 from diagnostic NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112]
       Node master1 is ready but is marked Unschedulable. This is usually set manually for administrative reasons.
       An administrator can mark the node schedulable with:
         oadm manage-node master1 --schedulable=true
       While in this state, pods should not be scheduled to deploy on the node. Existing pods will continue to run until completed or evacuated (see other options for 'oadm manage-node').
WARN:  [DClu0003 from diagnostic NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112]
       Node master2 is ready but is marked Unschedulable. This is usually set manually for administrative reasons.
       An administrator can mark the node schedulable with:
         oadm manage-node master2 --schedulable=true
       While in this state, pods should not be scheduled to deploy on the node. Existing pods will continue to run until completed or evacuated (see other options for 'oadm manage-node').
WARN:  [DClu0003 from diagnostic NodeDefinition@openshift/origin/pkg/diagnostics/cluster/node_definitions.go:112]
       Node master3 is ready but is marked Unschedulable. This is usually set manually for administrative reasons.
       An administrator can mark the node schedulable with:
         oadm manage-node master3 --schedulable=true
       While in this state, pods should not be scheduled to deploy on the node. Existing pods will continue to run until completed or evacuated (see other options for 'oadm manage-node').

[Note] Running diagnostic: ServiceExternalIPs
       Description: Check for existing services with ExternalIPs that are disallowed by master config

[Note] Running diagnostic: AnalyzeLogs
       Description: Check for recent problems in systemd service logs
Info:  Checking journalctl logs for 'atomic-openshift-node' service
Info:  Checking journalctl logs for 'docker' service

[Note] Running diagnostic: MasterConfigCheck
       Description: Check the master config file
WARN:  [DH0005 from diagnostic MasterConfigCheck@openshift/origin/pkg/diagnostics/host/check_master_config.go:52]
       Validation of master config file '/etc/origin/master/master-config.yaml' warned:
       assetConfig.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console
       assetConfig.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console
       auditConfig.auditFilePath: Required value: audit can now be logged to a separate file

[Note] Running diagnostic: NodeConfigCheck
       Description: Check the node config file
Info:  Found a node config file: /etc/origin/node/node-config.yaml

[Note] Running diagnostic: UnitStatus
       Description: Check status for related systemd units

[Note] Summary of diagnostics execution (version v3.6.5.5):
[Note] Warnings seen: 5
[Note] Errors seen: 0
The warnings will not cause issues in the environment.
Based on the results of the diagnostics, actions can be taken to alleviate any issues.
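If a particular area needs to be re-checked after remediation, individual diagnostics can also be run by passing their names as arguments. This is a minimal sketch, run as root on master1, using diagnostic names taken from the output above:
# oadm diagnostics NodeDefinitions MasterConfigCheck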
4.4. Checking the Health of etcd
This section focuses on the etcd
cluster. It describes the different commands to ensure the cluster is healthy. The internal DNS
names of the nodes running etcd
must be used.
SSH
into the first master node (master1) as before:
$ ssh <user>@<resourcegroup>b.<region>.cloudapp.azure.com
$ ssh <user>@master1
$ sudo -i
Using the output of the hostname command, issue the etcdctl command to confirm that the cluster is healthy.
# etcdctl --endpoints https://master1:2379,https://master2:2379,https://master3:2379 --ca-file /etc/etcd/ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health
member 82c895b7b0de4330 is healthy: got healthy result from https://10.0.0.4:2379
member c8e7ac98bb93fe8c is healthy: got healthy result from https://10.0.0.5:2379
member f7bbfc4285f239ba is healthy: got healthy result from https://10.0.0.6:2379
In this configuration the etcd
services are distributed among the OpenShift Container Platform master nodes.
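In addition to cluster-health, the member list subcommand can be used to confirm that all three masters are registered as etcd members. This is a sketch that reuses the same endpoints and certificate flags as the command above:
# etcdctl --endpoints https://master1:2379,https://master2:2379,https://master3:2379 \
    --ca-file /etc/etcd/ca.crt \
    --cert-file=/etc/origin/master/master.etcd-client.crt \
    --key-file=/etc/origin/master/master.etcd-client.key member list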
4.5. Default Node Selector
As explained in the Nodes section, node labels are an important part of the OpenShift Container Platform environment. By default in the reference architecture installation, the default node selector is set to role=app in /etc/origin/master/master-config.yaml on all of the master nodes. This configuration parameter is set during the installation of OpenShift on all masters.
SSH
into the first master node (master1) to verify the defaultNodeSelector
is defined.
$ ssh <user>@<resourcegroup>b.<region>.cloudapp.azure.com
$ ssh <user>@master1
$ sudo -i
# vi /etc/origin/master/master-config.yaml
... [OUTPUT ABBREVIATED] ...
projectConfig:
  defaultNodeSelector: "role=app"
  projectRequestMessage: ""
  projectRequestTemplate: ""
... [OUTPUT ABBREVIATED] ...
If any changes are made to the master configuration, the master API service must be restarted or the configuration change will not take effect. Any change and the subsequent restart must be performed on all masters.
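For example, after editing /etc/origin/master/master-config.yaml, restart the API and controllers services on each master. The unit names below assume the native HA installation method used for OpenShift Container Platform 3.6 in this reference architecture:
# systemctl restart atomic-openshift-master-api
# systemctl restart atomic-openshift-master-controllers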
4.6. Management of Maximum Pod Size
Quotas are set on ephemeral volumes within pods to prevent a pod from becoming too large and impacting the node. There are three places where sizing restrictions should be set. When persistent volume claims are not used, a pod can grow as large as the underlying filesystem allows. The required modifications are applied automatically.
OpenShift Volume Quota
At launch time a script creates an XFS partition on the block device, adds an entry to /etc/fstab, and mounts the volume with the gquota option. If gquota is not set, the OpenShift Container Platform node will not start with the perFSGroup parameter defined below. This disk and configuration is applied to the master, infrastructure, and application nodes.
SSH
into the first infrastructure node (infranode1) to verify the entry exists within /etc/fstab
$ ssh <user>@<resourcegroup>b.<region>.cloudapp.azure.com
$ ssh <user>@infranode1
$ grep "/var/lib/origin/openshift.local.volumes" /etc/fstab
/dev/sdc1 /var/lib/origin/openshift.local.volumes xfs gquota 0 0
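To confirm the quota option is active on the mounted volume, and not just present in /etc/fstab, the live mount options can be inspected as well (a minimal sketch run on the same node):
$ mount | grep "openshift.local.volumes"    # the listed mount options should include gquota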
OpenShift Emptydir Quota
During installation a value for perFSGroup
is set within the node configuration. The perFSGroup
setting restricts the ephemeral emptyDir
volume from growing larger than 512Mi. This emptyDir
quota is done on the master, infrastructure, and application nodes.
SSH
into the first infrastructure node (infranode1) to verify /etc/origin/node/node-config.yaml matches the information below.
$ ssh <user>@<resourcegroup>b.<region>.cloudapp.azure.com
$ ssh <user>@infranode1
$ sudo grep -B2 perFSGroup /etc/origin/node/node-config.yaml
volumeConfig:
  localQuota:
    perFSGroup: 512Mi
Docker Storage Setup
The /etc/sysconfig/docker-storage-setup
file is created at launch time by the bash script on every node. This file tells the Docker service to use a specific volume group for containers. Docker storage setup is performed on all master, infrastructure, and application nodes.
SSH
into the first infrastructure node (infranode1) to verify /etc/sysconfig/docker-storage-setup
matches the information below.
$ ssh <user>@<resourcegroup>b.<region>.cloudapp.azure.com
$ ssh <user>@infranode1
$ cat /etc/sysconfig/docker-storage-setup
DEVS=/dev/sdd
VG=docker_vol
DATA_SIZE=95%VG
STORAGE_DRIVER=overlay2
CONTAINER_ROOT_LV_NAME=dockerlv
CONTAINER_ROOT_LV_MOUNT_PATH=/var/lib/docker
CONTAINER_ROOT_LV_SIZE=100%FREE
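The resulting LVM layout can be verified with the standard LVM tools; the volume group name below is taken from the file shown above:
$ sudo vgs docker_vol
$ sudo lvs docker_vol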
4.7. Yum Repositories
In the Required Channels section, the specific repositories for a successful OpenShift Container Platform installation were defined. All systems except for the bastion host should have the same repositories configured. To verify that the subscriptions match those defined in Required Channels, perform the following. The repositories below are enabled by the rhsm-repos playbook during the installation. The installation will be unsuccessful if the repositories are missing from the system.
SSH
into the first infrastructure node (infranode1) and verify the command output matches the information below.
$ ssh <user>@<resourcegroup>b.<region>.cloudapp.azure.com
$ ssh <user>@infranode1
$ yum repolist
Loaded plugins: langpacks, product-id, search-disabled-repos
repo id                                     repo name                                                       status
rhel-7-fast-datapath-rpms/7Server/x86_64    Red Hat Enterprise Linux Fast Datapath (RHEL 7 Server) (RPMs)   27
rhel-7-server-extras-rpms/x86_64            Red Hat Enterprise Linux 7 Server - Extras (RPMs)               461+4
rhel-7-server-ose-3.6-rpms/x86_64           Red Hat OpenShift Container Platform 3.6 (RPMs)                 437+30
rhel-7-server-rpms/7Server/x86_64           Red Hat Enterprise Linux 7 Server (RPMs)                        14.285
repolist: 15.210
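To compare the enabled repositories across several nodes at once, a loop can be run from the workstation using the SSH configuration from Section 4.1; this is a minimal sketch with a representative host from each role:
$ for host in master1 infranode1 node01; do
>   echo "== $host =="
>   ssh "$host" "yum -q repolist enabled"
> done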
4.8. Console Access
This section will cover logging into the OpenShift Container Platform management console via the GUI and the CLI. After logging in via one of these methods applications can then be deployed and managed.
4.8.1. Log into GUI console and deploy an application
Perform the following steps from the local workstation.
Open a browser and access the OpenShift Container Platform web console at https://<resourcegroupname>.<region>.cloudapp.azure.com/console. The resourcegroupname was given in the ARM template, and region is the Microsoft Azure region selected during the install. When logging into the OpenShift Container Platform web console, use the user login and password specified during the launch of the ARM template.
Once logged in, deploy an example application as follows:
- Click on the [New Project] button
- Provide a "Name" and click [Create]
- Next, deploy the jenkins-ephemeral instant app by clicking the corresponding box.
- Accept the defaults and click [Create]. Instructions along with a URL will be provided for how to access the application on the next screen.
- Click [Continue to Overview] and bring up the management page for the application.
- Click on the link provided as the route and access the application to confirm functionality.
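The same instant app can also be deployed from the command line once the oc client is configured as described in the next section. This is a minimal sketch; the template name matches the box selected above, and the resulting route name may differ:
$ oc new-project jenkins-test
$ oc new-app jenkins-ephemeral
$ oc get routes    # note the route hostname used to access Jenkins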
4.8.2. Log into CLI and Deploy an Application
Perform the following steps from the local workstation.
Install the oc CLI by visiting the public URL of the OpenShift Container Platform deployment. For example, https://resourcegroupname.region.cloudapp.azure.com/console/command-line and click the latest release. When directed to https://access.redhat.com, log in with valid Red Hat customer credentials and download the client relevant to the current workstation operating system. Follow the instructions on the documentation site for getting started with the CLI.
A token is required to log in to OpenShift Container Platform. The token is presented on the https://resourcegroupname.region.cloudapp.azure.com/console/command-line page. Click the show token hyperlink and perform the following on the workstation on which the oc client was installed.
$ oc login https://resourcegroupname.region.cloudapp.azure.com --token=fEAjn7LnZE6v5SOocCSRVmUWGBNIIEKbjD9h-Fv7p09
The oc command also supports logging in with a username and password combination. See the oc help login output for more information.
After the oc client is configured, create a new project and deploy an application; in this case, a PHP sample application (CakePHP):
$ oc new-project test-app
$ oc new-app https://github.com/openshift/cakephp-ex.git --name=php
--> Found image 2997627 (7 days old) in image stream "php" in project "openshift" under tag "5.6" for "php"

    Apache 2.4 with PHP 5.6
    -----------------------
    Platform for building and running PHP 5.6 applications

    Tags: builder, php, php56, rh-php56

    * The source repository appears to match: php
    * A source build using source code from https://github.com/openshift/cakephp-ex.git will be created
      * The resulting image will be pushed to image stream "php:latest"
    * This image will be deployed in deployment config "php"
    * Port 8080/tcp will be load balanced by service "php"
      * Other containers can access this service through the hostname "php"

--> Creating resources with label app=php ...
    imagestream "php" created
    buildconfig "php" created
    deploymentconfig "php" created
    service "php" created
--> Success
    Build scheduled, use 'oc logs -f bc/php' to track its progress.
    Run 'oc status' to view your app.
$ oc expose service php
route "php" exposed
Display the status of the application.
$ oc status
In project test-app on server https://resourcegroupname.region.cloudapp.azure.com
http://test-app.apps.13.93.162.100.nip.io to pod port 8080-tcp (svc/php)
dc/php deploys istag/php:latest <- bc/php builds https://github.com/openshift/cakephp-ex.git with openshift/php:5.6
deployment #1 deployed about a minute ago - 1 pod
Access the application by visiting the URL provided by oc status. The CakePHP application should now be visible.
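Connectivity to the route can also be checked from the command line; an HTTP 200 response indicates the application is serving traffic. The hostname below is the one reported by oc status above:
$ curl -s -o /dev/null -w "%{http_code}\n" http://test-app.apps.13.93.162.100.nip.io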
4.9. Explore the Environment
4.9.1. List Nodes and Set Permissions
$ oc get nodes --show-labels
NAME STATUS AGE
infranode1 Ready 16d
infranode2 Ready 16d
infranode3 Ready 16d
master1 Ready,SchedulingDisabled 16d
master2 Ready,SchedulingDisabled 16d
master3 Ready,SchedulingDisabled 16d
node01 Ready 16d
node02 Ready 16d
node03 Ready 16d
Running this command with a regular user should fail.
$ oc get nodes --show-labels
Error from server: User "nonadmin" cannot list all nodes in the cluster
It fails because the user does not have sufficient permissions to list cluster-scoped resources such as nodes.
For more information about roles and permissions, see the Authorization documentation.
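If a regular user legitimately needs read access to cluster-scoped resources such as nodes, a cluster-admin can grant the cluster-reader role; a sketch using the nonadmin user from the example above:
$ oc adm policy add-cluster-role-to-user cluster-reader nonadmin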
4.9.2. List Router and Registry
List the router and registry pods by changing to the default
project.
Perform the following steps from the local workstation.
$ oc project default
$ oc get all
NAME                  REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/docker-registry    1          1         1         config
dc/router             1          2         2         config
NAME                   DESIRED   CURRENT   AGE
rc/docker-registry-1   1         1         10m
rc/router-1            2         2         10m
NAME                  CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
svc/docker-registry   172.30.243.63   <none>        5000/TCP                  10m
svc/kubernetes        172.30.0.1      <none>        443/TCP,53/UDP,53/TCP     20m
svc/router            172.30.224.41   <none>        80/TCP,443/TCP,1936/TCP   10m
NAME                         READY     STATUS    RESTARTS   AGE
po/docker-registry-1-2a1ho   1/1       Running   0          8m
po/router-1-1g84e            1/1       Running   0          8m
po/router-1-t84cy            1/1       Running   0          8m
Observe the output of oc get all.
4.9.3. Explore the Docker Registry
The OpenShift Container Platform Ansible playbooks configure three infrastructure nodes, across which a single registry runs. To understand the configuration and mapping process of the registry pods, the command oc describe is used. oc describe details how registries are configured and mapped to Azure Blob Storage using the REGISTRY_STORAGE_* environment variables.
Perform the following steps from the local workstation.
$ oc describe dc/docker-registry
... [OUTPUT ABBREVIATED] ...
Environment Variables:
REGISTRY_HTTP_ADDR: :5000
REGISTRY_HTTP_NET: tcp
REGISTRY_HTTP_SECRET: 7H7ihSNi2k/lqR0i5iINHtx+ItA2cGnpccBAz2URT5c=
REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA: false
REGISTRY_HTTP_TLS_KEY: /etc/secrets/registry.key
REGISTRY_HTTP_TLS_CERTIFICATE: /etc/secrets/registry.crt
REGISTRY_STORAGE: azure
REGISTRY_STORAGE_AZURE_ACCOUNTKEY: DUo2VfsnPwGl+4yEmye0iSQuHVrPCVmj7D+oIsYVlmaNJXS4YkZoXODvOfx3luLL6qb4j+1YhV8Nr/slKE9+IQ==
REGISTRY_STORAGE_AZURE_ACCOUNTNAME: sareg<resourcegroup>
REGISTRY_STORAGE_AZURE_CONTAINER: registry
... [OUTPUT ABBREVIATED] ...
To verify that the Docker images are being stored properly in Azure Blob Storage, save the REGISTRY_STORAGE_AZURE_ACCOUNTKEY value from the previous command output and run the following command on the host where the Azure CLI Node.js package is installed:
$ azure storage blob list registry --account-name=sareg<resourcegroup> --account-key=<account_key>
info: Executing command storage blob list
+ Getting blobs in container registry
data: Name Blob Type Length Content Type Last Modified Snapshot Time
data: ---------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------- -------- ------------------------ ----------------------------- -------------
data: /docker/registry/v2/blobs/sha256/31/313a6203b84e37d24fe7e43185f9c8b12b727574a1bc98bf464faf78dc8e9689/data AppendBlob 9624 application/octet-stream Tue, 23 May 2017 15:44:24 GMT
data: /docker/registry/v2/blobs/sha256/4c/4c1fa39c5cda68c387cfc7dd32207af1a25b2413c266c464580001c97939cce0/data AppendBlob 43515975 application/octet-stream Tue, 23 May 2017 15:43:45 GMT
... [OUTPUT ABBREVIATED] ...
info: storage blob list command OK
4.9.4. Explore Docker Storage
This section will explore the Docker storage on an infrastructure node.
The example below can be performed on any node, but for this example the first infrastructure node (infranode1) is used. The output below verifies that Docker storage is using the devicemapper driver (see the Storage Driver section) and the proper LVM volume group:
$ ssh <user>@<resourcegroup>b.<region>.cloudapp.azure.com
$ ssh <user>@infranode1
$ sudo -i
# docker info
Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 4
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker--vol-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 3.221 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 1.221 GB
 Data Space Total: 25.5 GB
 Data Space Available: 24.28 GB
 Metadata Space Used: 307.2 kB
 Metadata Space Total: 29.36 MB
 Metadata Space Available: 29.05 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: bridge null host
 Authorization: rhel-push-plugin
Kernel Version: 3.10.0-327.10.1.el7.x86_64
Operating System: Employee SKU
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 2
Total Memory: 7.389 GiB
Name: ip-10-20-3-46.azure.internal
ID: XDCD:7NAA:N2S5:AMYW:EF33:P2WM:NF5M:XOLN:JHAD:SIHC:IZXP:MOT3
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Registries: registry.access.redhat.com (secure), docker.io (secure)
# vgs
  VG        #PV #LV #SN Attr   VSize   VFree
  docker-vg   1   1   0 wz--n- 128,00g 76,80g
If the storage were configured in loopback mode, the output would list a loopback file. As the output above does not contain the word loopback, the Docker daemon is working in the optimal way.
For more information about the Docker storage requirements, see the Configuring Docker storage documentation.
4.9.5. Explore the Microsoft Azure Load Balancers
As mentioned earlier in the document, two Azure Load Balancers have been created. The purpose of this section is to encourage exploration of the load balancers that were created.
Perform the following steps from the Azure web console.
On the main Microsoft Azure dashboard, click on [Resource Groups] icon. Then select the resource group that corresponds with the OpenShift Container Platform deployment, and then find the [Load Balancers] within the resource group. Select the AppLB
load balancer and on the [Description] page note the [Port Configuration] and how it is configured. That is for the OpenShift Container Platform application traffic. There should be three master instances running with a [Status] of Ok
. Next check the [Health Check] tab and the options that were configured. Further details of the configuration can be viewed by exploring the ARM
templates to see exactly what was configured.
4.9.6. Explore the Microsoft Azure Resource Group
As mentioned earlier in the document, an Azure Resource Group was created. The purpose of this section is to encourage exploration of the resource group that was created.
Perform the following steps from the Azure web console.
On the main Microsoft Azure console, click on [Resource Group]. Next, on the left hand navigation panel, select [Your Resource Groups]. Select the Resource Group recently created and explore the [Summary] tabs. Next, on the right hand navigation panel, explore the [Virtual Machines], [Storage Accounts], [Load Balancers], and [Networks] tabs. More detail about the configuration can be reviewed by exploring the Ansible playbooks and ARM JSON files to see exactly what was configured.
4.10. Testing Failure
In this section, reactions to failure are explored. After a successful install and some of the smoke tests noted above have been completed, failure testing is executed.
4.10.1. Generate a Master Outage
Perform the following steps from the Azure web console
and the OpenShift public URL.
Log into the Microsoft Azure console. On the dashboard, click on the [Resource Group] web service and then click [Overview]. Locate the running master2 instance, select it, right click and change the state to stopped.
Ensure the console can still be accessed by opening a browser and accessing https://resourcegroupname.region.cloudapp.azure.com. At this point, the cluster is in a degraded state because only 2/3 master nodes are running, but complete functionality remains.
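The degraded state can also be observed from the CLI; a quick check, assuming the oc client is logged in with cluster-admin privileges:
$ oc get nodes | grep master    # master2 should report NotReady while it is stopped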
4.10.2. Observe the Behavior of etcd with a Failed Master Node
SSH
into the first master node (master1) from the bastion. Using the output of the hostname command, issue the etcdctl command to confirm that the cluster is healthy.
$ ssh <user>@<resourcegroup>b.<region>.cloudapp.azure.com
$ ssh <user>@master1
$ sudo -i
# etcdctl --endpoints https://master1:2379,https://master2:2379,https://master3:2379 --ca-file /etc/etcd/ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key cluster-health
failed to check the health of member 82c895b7b0de4330 on https://10.20.2.251:2379: Get https://10.20.1.251:2379/health: dial tcp 10.20.1.251:2379: i/o timeout
member 82c895b7b0de4330 is unreachable: [https://10.20.1.251:2379] are all unreachable
member c8e7ac98bb93fe8c is healthy: got healthy result from https://10.20.3.74:2379
member f7bbfc4285f239ba is healthy: got healthy result from https://10.20.1.106:2379
cluster is healthy
Notice how one member of the etcd
cluster is now unreachable. Restart master2 by following the same steps in the Azure web console
as noted above.
4.10.3. Generate an Infrastructure Node outage
This section shows what to expect when an infrastructure node fails or is brought down intentionally.
4.10.3.1. Confirm Application Accessibility
Perform the following steps from the browser on a local workstation.
Before bringing down an infrastructure node, check behavior and ensure things are working as expected. The goal of testing an infrastructure node outage is to see how the OpenShift Container Platform routers and registries behave. Confirm the simple application deployed from before is still functional. If it is not, deploy a new version. Access the application to confirm connectivity. As a reminder, to find the required information to ensure the application is still running, list the projects, change to the project that the application is deployed in, get the status of the application, which includes the URL, and access the application via that URL.
$ oc get projects
NAME               DISPLAY NAME   STATUS
openshift                         Active
openshift-infra                   Active
ttester                           Active
test-app1                         Active
default                           Active
management-infra                  Active
$ oc project test-app1
Now using project "test-app1" on server "https://resourcegroupname.region.cloudapp.azure.com".
$ oc status
In project test-app1 on server https://resourcegroupname.region.cloudapp.azure.com
http://test-app1.apps.13.93.162.100.nip.io to pod port 8080-tcp (svc/php-prod)
  dc/php-prod deploys istag/php-prod:latest <- bc/php-prod builds https://github.com/openshift/cakephp-ex.git with openshift/php:5.6
  deployment #1 deployed 27 minutes ago - 1 pod
Open a browser and ensure the application is still accessible.
4.10.3.2. Confirm Registry Functionality
This section is another step to take before initiating the outage of the infrastructure node to ensure that the registry is functioning properly. The goal is to push an image to the OpenShift Container Platform registry.
Perform the following steps from a CLI on a local workstation and ensure that the oc
client has been configured as explained before.
In order to be able to push images to the registry, the docker configuration on the workstation will be modified to trust the docker registry certificate.
Get the name of the docker-registry pod:
$ oc get pods -n default | grep docker-registry
docker-registry-4-9r033 1/1 Running 0 2h
Get the registry certificate and save it:
$ oc exec docker-registry-4-9r033 cat /etc/secrets/registry.crt >> /tmp/my-docker-registry-certificate.crt
Capture the registry route:
$ oc get route docker-registry -n default
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
docker-registry docker-registry-default.13.64.245.134.nip.io docker-registry <all> passthrough None
Create the proper directory in /etc/docker/certs.d/
for the registry:
$ sudo mkdir -p /etc/docker/certs.d/docker-registry-default.13.64.245.134.nip.io
Move the certificate to the directory previously created and restart the docker service on the workstation:
$ sudo mv /tmp/my-docker-registry-certificate.crt /etc/docker/certs.d/docker-registry-default.13.64.245.134.nip.io/ca.crt
$ sudo systemctl restart docker
A token is needed so that the Docker registry can be logged into.
$ oc whoami -t
feAeAgL139uFFF_72bcJlboTv7gi_bo373kf1byaAT8
Pull a new docker image for the purposes of test pushing.
$ docker pull fedora/apache
$ docker images | grep fedora/apache
docker.io/fedora/apache   latest   c786010769a8   3 months ago   396.4 MB
Tag the docker image with the registry hostname
$ docker tag docker.io/fedora/apache docker-registry-default.13.64.245.134.nip.io/openshift/prodapache
Check the images and ensure the newly tagged image is available.
$ docker images | grep openshift/prodapache
docker-registry-default.13.64.245.134.nip.io/openshift/prodapache latest c786010769a8 3 months ago 396.4 MB
Issue a Docker login.
$ docker login -u $(oc whoami) -e <email> -p $(oc whoami -t) docker-registry-default.13.64.245.134.nip.io
Login Succeeded
The email doesn’t need to be valid, and the -e flag is deprecated in newer versions of the Docker CLI.
Push the image to the OpenShift Container Platform registry:
$ docker push docker-registry-default.13.64.245.134.nip.io/openshift/prodapache
The push refers to a repository [docker-registry-default.13.64.245.134.nip.io/openshift/prodapache]
3a85ee80fd6c: Pushed
5b0548b012ca: Pushed
a89856341b3d: Pushed
a839f63448f5: Pushed
e4f86288aaf7: Pushed
latest: digest: sha256:e2a15a809ce2fe1a692b2728bd07f58fbf06429a79143b96b5f3e3ba0d1ce6b5 size: 7536
4.10.3.3. Get Location of Registry
Perform the following steps from the CLI of a local workstation.
Change to the default OpenShift Container Platform project and check the registry pod location
$ oc get pods -o wide -n default
NAME READY STATUS RESTARTS AGE IP NODE
docker-registry-4-9r033 1/1 Running 0 2h 10.128.6.5 infranode3
registry-console-1-zwzsl 1/1 Running 0 5d 10.131.4.2 infranode2
router-1-09x4g 1/1 Running 0 5d 10.0.2.5 infranode2
router-1-6135c 1/1 Running 0 5d 10.0.2.4 infranode1
router-1-l2562 1/1 Running 0 5d 10.0.2.6 infranode3
4.10.3.4. Initiate the Failure and Confirm Functionality
Perform the following steps from the Azure web console
and a browser.
Log into the Azure web console. On the dashboard, click on the [Resource Group]. Locate the running instance where the registry pod is running (infranode3 in the previous example), select it, right click, and change the state to stopped. Wait a minute or two for the registry pod to migrate over to a different infranode. Check the registry location and confirm that it moved to a different infranode:
$ oc get pods -o wide -n default | grep docker-registry
docker-registry-4-kd40f 1/1 Running 0 1m 10.130.4.3 infranode1
Follow the procedures above to ensure a Docker image can still be pushed to the registry now that infranode3 is down.
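To confirm end-to-end registry functionality after the failover, the image pushed earlier can be pulled back through the same registry route; a minimal sketch reusing the hostname and token from above:
$ docker login -u $(oc whoami) -p $(oc whoami -t) docker-registry-default.13.64.245.134.nip.io
$ docker pull docker-registry-default.13.64.245.134.nip.io/openshift/prodapache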
4.11. Metrics exploration
Red Hat OpenShift Container Platform metrics components enable additional features in the Red Hat OpenShift Container Platform web interface. If the environment has been deployed with metrics enabled, there will be a new tab in the pod section named "Metrics" that shows usage data for CPU, memory, and network resources over a period of time:

If metrics are not shown, check whether the Hawkular certificate has been trusted. Visit the metrics route in the browser, accept the self-signed certificate warning, and refresh the Metrics tab to check whether metrics appear. Future revisions of this reference architecture document will include how to create proper certificates to avoid trusting self-signed certificates.
Using the CLI, the cluster-admin can observe the usage of the pods and nodes using the following commands as well:
$ oc adm top pod --heapster-namespace="openshift-infra" --heapster-scheme="https" --all-namespaces
NAMESPACE         NAME                              CPU(cores)   MEMORY(bytes)
openshift-infra   hawkular-cassandra-1-h9mrq        161m         1423Mi
logging           logging-fluentd-g5jqw             8m           92Mi
logging           logging-es-ops-b44n3gav-1-zkl3r   19m          861Mi
... [OUTPUT ABBREVIATED] ...
$ oc adm top node --heapster-namespace="openshift-infra" --heapster-scheme="https"
NAME         CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
infranode3   372m         9%        4657Mi          33%
master3      68m          1%        1923Mi          13%
node02       43m          1%        1437Mi          5%
... [OUTPUT ABBREVIATED] ...
4.11.1. Using the Horizontal Pod Autoscaler
In order to use the HorizontalPodAutoscaler feature, the metrics components must be deployed and limits must be configured for the pod in order to set the target percentage at which the pod will be scaled.
The following commands show how to create a new project, deploy an example pod, and set some limits:
$ oc new-project autoscaletest
Now using project "autoscaletest" on server "https://myocp.eastus2.cloudapp.azure.com:8443".
... [OUTPUT ABBREVIATED] ...
$ oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git
--> Found Docker image d9c9735 (10 days old) from Docker Hub for "centos/ruby-22-centos7"
... [OUTPUT ABBREVIATED] ...
$ oc patch dc/ruby-ex -p '{"spec":{"template":{"spec":{"containers":[{"name":"ruby-ex","resources":{"limits":{"cpu":"80m"}}}]}}}}'
"ruby-ex" patched
$ oc get pods
NAME              READY     STATUS      RESTARTS   AGE
ruby-ex-1-210l9   1/1       Running     0          2m
ruby-ex-1-build   0/1       Completed   0          4m
$ oc describe pod ruby-ex-1-210l9
Name:       ruby-ex-1-210l9
... [OUTPUT ABBREVIATED] ...
    Limits:
      cpu:  80m
    Requests:
      cpu:  80m
Once the pod is running, create the autoscaler:
$ oc autoscale dc/ruby-ex --min 1 --max 10 --cpu-percent=50
deploymentconfig "ruby-ex" autoscaled
$ oc get horizontalpodautoscaler
NAME      REFERENCE                  TARGET    CURRENT   MINPODS   MAXPODS   AGE
ruby-ex   DeploymentConfig/ruby-ex   50%       0%        1         10        53s
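While the load test in the next step runs, the autoscaler status can be watched in a separate terminal; a minimal sketch:
$ oc get horizontalpodautoscaler -w    # CURRENT should climb above the 50% TARGET before the scale-up
$ oc describe horizontalpodautoscaler ruby-ex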
Access the pod and create some CPU load, for example:
$ oc rsh ruby-ex-1-210l9
sh-4.2$ while true; do echo "cpu hog" >> mytempfile; rm -f mytempfile; done
Observe the events and the running pods; after a while a new replica will be created:
$ oc get events -w
LASTSEEN                        FIRSTSEEN                       COUNT     NAME              KIND                      SUBOBJECT   TYPE      REASON                        SOURCE                           MESSAGE
2017-07-13 13:28:35 +0000 UTC   2017-07-13 13:26:30 +0000 UTC   7         ruby-ex           HorizontalPodAutoscaler               Normal    DesiredReplicasComputed       {horizontal-pod-autoscaler }     Computed the desired num of replicas: 0 (avgCPUutil: 0, current replicas: 1)
2017-07-13 13:29:05 +0000 UTC   2017-07-13 13:29:05 +0000 UTC   1         ruby-ex           HorizontalPodAutoscaler               Normal    DesiredReplicasComputed       {horizontal-pod-autoscaler }     Computed the desired num of replicas: 2 (avgCPUutil: 67, current replicas: 1)
2017-07-13 13:29:05 +0000 UTC   2017-07-13 13:29:05 +0000 UTC   1         ruby-ex           DeploymentConfig                      Normal    ReplicationControllerScaled   {deploymentconfig-controller }   Scaled replication controller "ruby-ex-1" from 1 to 2
2017-07-13 13:29:05 +0000 UTC   2017-07-13 13:29:05 +0000 UTC   1         ruby-ex           HorizontalPodAutoscaler               Normal    SuccessfulRescale             {horizontal-pod-autoscaler }     New size: 2; reason: CPU utilization above target
2017-07-13 13:29:05 +0000 UTC   2017-07-13 13:29:05 +0000 UTC   1         ruby-ex-1-zwmxd   Pod                                   Normal    Scheduled                     {default-scheduler }             Successfully assigned ruby-ex-1-zwmxd to node02
$ oc get pods
NAME              READY     STATUS      RESTARTS   AGE
ruby-ex-1-210l9   1/1       Running     0          8m
ruby-ex-1-build   0/1       Completed   0          9m
ruby-ex-1-zwmxd   1/1       Running     0          58s
After canceling the CPU hog command, the events will show how the deploymentconfig returns to a single replica:
$ oc get events -w
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
2017-07-13 13:34:05 +0000 UTC 2017-07-13 13:34:05 +0000 UTC 1 ruby-ex HorizontalPodAutoscaler Normal SuccessfulRescale {horizontal-pod-autoscaler } New size: 1; reason: All metrics below target
2017-07-13 13:34:05 +0000 UTC 2017-07-13 13:34:05 +0000 UTC 1 ruby-ex DeploymentConfig Normal ReplicationControllerScaled {deploymentconfig-controller } Scaled replication controller "ruby-ex-1" from 2 to 1
2017-07-13 13:34:05 +0000 UTC 2017-07-13 13:34:05 +0000 UTC 1 ruby-ex-1 ReplicationController Normal SuccessfulDelete {replication-controller } Deleted pod: ruby-ex-1-zwmxd
4.12. Logging exploration
Red Hat OpenShift Container Platform aggregated logging components enable additional features in the Red Hat OpenShift Container Platform web interface. If the environment has been deployed with logging enabled, there will be a new link in the pod logs section named "View Archive" that redirects to the Kibana web interface, where the user can view pod logs, create queries, filters, and so on.

For more information about Kibana, see the Kibana documentation.
If the "opslogging" cluster has been deployed, there will be a route "kibana-ops" in the "logging" project where cluster-admin users can browse infrastructure logs.
