Chapter 3. Performing maintenance on the undercloud and overcloud with Instance HA

To perform maintenance on the undercloud and overcloud, you must shut down and start up the undercloud and overcloud nodes in a specific order to minimize issues when you start your overcloud. You can also perform maintenance on a specific Compute or Controller node by stopping the node and disabling the Pacemaker resources on the node.

3.1. Prerequisites

A running undercloud and overcloud with Instance HA enabled.

3.2. Undercloud and overcloud shutdown order

To shut down the Red Hat OpenStack Platform environment, you must shut down the overcloud and undercloud in the following order:

Shut down instances on overcloud Compute nodes
Stop Instance HA services on overcloud Compute nodes
Shut down Compute nodes
Stop all high availability and OpenStack Platform services on Controller nodes
Shut down Ceph Storage nodes
Shut down Controller nodes
Shut down the undercloud

3.2.1. Shutting down instances on overcloud Compute nodes

As a part of shutting down the Red Hat OpenStack Platform environment, shut down all instances on Compute nodes before shutting down the Compute nodes.

Prerequisites

An overcloud with active Compute services

Procedure

Log in to the undercloud as the stack user.
Source the credentials file for your overcloud:
```
$ source ~/overcloudrc
```
View running instances in the overcloud:
```
$ openstack server list --all-projects
```
Stop each instance in the overcloud:
```
$ openstack server stop <INSTANCE>
```
Repeat this step for each instance until you stop all instances in the overcloud.
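
When the overcloud hosts many instances, stopping them one at a time is tedious. The following is a minimal sketch, not an official tool, that stops every instance in the ACTIVE state and records the instance IDs so that you know which instances were running before the shutdown. It assumes that you already sourced overcloudrc. The function name and the default state file path are hypothetical.

```shell
#!/bin/sh
# Sketch: stop all ACTIVE instances and remember which ones were running.
# Assumes overcloudrc is already sourced. The state file path is hypothetical.

stop_active_instances() {
    local state_file="${1:-$HOME/running-instances.txt}"
    local id

    # Write one instance ID per line, limited to instances that are running.
    openstack server list --all-projects --status ACTIVE -f value -c ID \
        > "$state_file" || return 1

    # Read the IDs on file descriptor 3 so the client cannot consume the list.
    while read -r id <&3; do
        [ -n "$id" ] || continue
        openstack server stop "$id" || echo "could not stop $id" >&2
    done 3< "$state_file"
}
```

For example, `stop_active_instances ~/running-instances.txt` stops the running instances and leaves their IDs in the named file. Run `openstack server list --all-projects` afterwards to confirm that no instance remains in the ACTIVE state.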

3.2.2. Stopping Instance HA services on overcloud Compute nodes

As a part of shutting down the Red Hat OpenStack Platform environment, you must shut down all Instance HA services that run on Compute nodes before stopping the instances and shutting down the Compute nodes.

Prerequisites

An overcloud with active Compute services
Instance HA is enabled on Compute nodes

Procedure

Log in as the root user to an overcloud node that runs Pacemaker.
Disable the Pacemaker Remote resource on each Compute node:
1. Identify the Pacemaker Remote resource on Compute nodes:
```
# pcs resource status
```
  These resources use the ocf::pacemaker:remote agent and their names usually follow the Compute node hostname format, such as overcloud-novacomputeiha-0.
2. Disable each Pacemaker Remote resource. The following example shows how to disable the resource for overcloud-novacomputeiha-0:
```
# pcs resource disable overcloud-novacomputeiha-0
```
Disable the Compute node STONITH devices:
1. Identify the Compute node STONITH devices:
```
# pcs stonith status
```
2. Disable each Compute node STONITH device:
```
# pcs stonith disable <STONITH_DEVICE>
```
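
If you have several Compute nodes, you can loop over the names that `pcs resource status` and `pcs stonith status` report. The following sketch keeps the order of the procedure above: it disables the Pacemaker Remote resources first and the STONITH devices second. The function names are hypothetical, and you supply the resource and device names yourself because the sketch does not parse the `pcs` output.

```shell
#!/bin/sh
# Sketch: disable Pacemaker Remote resources and STONITH devices by name.
# Run as root on a node that runs Pacemaker. Function names are hypothetical.

disable_remote_resources() {
    # Usage: disable_remote_resources <resource> [<resource> ...]
    local resource
    for resource in "$@"; do
        pcs resource disable "$resource" || return 1
    done
}

disable_stonith_devices() {
    # Usage: disable_stonith_devices <stonith_device> [<stonith_device> ...]
    local device
    for device in "$@"; do
        pcs stonith disable "$device" || return 1
    done
}
```

For example, `disable_remote_resources overcloud-novacomputeiha-0 overcloud-novacomputeiha-1` followed by `disable_stonith_devices <STONITH_DEVICE> ...`. Both functions stop at the first failure so that you can inspect the cluster before you continue.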

3.2.3. Shutting down Compute nodes

As a part of shutting down the Red Hat OpenStack Platform environment, log in to and shut down each Compute node.

Prerequisites

Shut down all instances on the Compute nodes

Procedure

Log in as the root user to a Compute node.
Shut down the node:
```
# shutdown -h now
```
Perform these steps for each Compute node until you shut down all Compute nodes.

3.2.4. Stopping services on Controller nodes

As a part of shutting down the Red Hat OpenStack Platform environment, stop services on the Controller nodes before shutting down the nodes. This includes Pacemaker and systemd services.

Prerequisites

An overcloud with active Pacemaker services

Procedure

Log in as the root user to a Controller node.
Stop the Pacemaker cluster:
```
# pcs cluster stop --all
```
This command stops the cluster on all nodes.
Wait until the Pacemaker services stop and check that the services stopped.
1. Check the Pacemaker status:
```
# pcs status
```
2. Check that no Pacemaker services are running in Podman:
```
# podman ps --filter "name=.*-bundle.*"
```
Stop the Red Hat OpenStack Platform services:
```
# systemctl stop 'tripleo_*'
```
Wait until the services stop and check that services are no longer running in Podman:
```
# podman ps
```
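
The two "wait until" steps above can be scripted as a polling loop. The following sketch polls `podman ps -q`, optionally restricted by the same name filter that the procedure uses, and returns when no matching container is running or when a timeout expires. The function name, the default timeout, and the default polling interval are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: wait until no (matching) containers are running on this node.
# Function name and default timings are hypothetical.

wait_for_containers_to_stop() {
    # Usage: wait_for_containers_to_stop [name_filter] [timeout_s] [interval_s]
    local filter="${1:-}"
    local timeout="${2:-300}"
    local interval="${3:-5}"
    local waited=0
    local running

    while :; do
        if [ -n "$filter" ]; then
            running=$(podman ps -q --filter "name=$filter")
        else
            running=$(podman ps -q)
        fi
        # An empty ID list means nothing is running any more.
        [ -z "$running" ] && return 0

        if [ "$waited" -ge "$timeout" ]; then
            echo "containers still running after ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

For example, run `wait_for_containers_to_stop '.*-bundle.*'` after you stop the Pacemaker cluster, and `wait_for_containers_to_stop` with no arguments after you stop the `tripleo_*` services. A non-zero return value indicates that containers were still running when the timeout expired.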

3.2.5. Shutting down Ceph Storage nodes

As a part of shutting down the Red Hat OpenStack Platform environment, disable Ceph Storage services, then log in to and shut down each Ceph Storage node.

Prerequisites

A healthy Ceph Storage cluster
Ceph MON services are running on standalone Ceph MON nodes or on Controller nodes

Procedure

Log in as the root user to a node that runs Ceph MON services, such as a Controller node or a standalone Ceph MON node.
Check the health of the cluster. In the following example, the podman command runs a status check within a Ceph MON container on a Controller node:
```
# sudo podman exec -it ceph-mon-controller-0 ceph status
```
Ensure that the status is HEALTH_OK.

Set the noout, norecover, norebalance, nobackfill, nodown, and pause flags for the cluster. In the following example, the podman commands set these flags through a Ceph MON container on a Controller node:

# sudo podman exec -it ceph-mon-controller-0 ceph osd set noout
# sudo podman exec -it ceph-mon-controller-0 ceph osd set norecover
# sudo podman exec -it ceph-mon-controller-0 ceph osd set norebalance
# sudo podman exec -it ceph-mon-controller-0 ceph osd set nobackfill
# sudo podman exec -it ceph-mon-controller-0 ceph osd set nodown
# sudo podman exec -it ceph-mon-controller-0 ceph osd set pause

Shut down each Ceph Storage node:
1. Log in as the root user to a Ceph Storage node.
2. Shut down the node:
```
# shutdown -h now
```
3. Perform these steps for each Ceph Storage node until you shut down all Ceph Storage nodes.
Shut down any standalone Ceph MON nodes:
1. Log in as the root user to a standalone Ceph MON node.
2. Shut down the node:
```
# shutdown -h now
```
3. Perform these steps for each standalone Ceph MON node until you shut down all standalone Ceph MON nodes.
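
The six flag commands earlier in this procedure differ only in the flag name, so you can express them as a loop. The following sketch takes `set` or `unset` as its argument. It assumes the Ceph MON container name from the example, ceph-mon-controller-0, and reads an override from `CEPH_MON_CONTAINER`, which is a hypothetical variable introduced here for illustration. The sketch omits the `-it` options because the commands do not need a terminal.

```shell
#!/bin/sh
# Sketch: set or unset the maintenance flags on the Ceph cluster.
# CEPH_MON_CONTAINER is a hypothetical override for the container name.

CEPH_FLAGS="noout norecover norebalance nobackfill nodown pause"

ceph_flags() {
    # Usage: ceph_flags set|unset
    local action="$1"
    local container="${CEPH_MON_CONTAINER:-ceph-mon-controller-0}"
    local flag

    case "$action" in
        set|unset) ;;
        *) echo "usage: ceph_flags set|unset" >&2; return 2 ;;
    esac

    # Apply the action to each flag in the order listed in the procedure.
    for flag in $CEPH_FLAGS; do
        podman exec "$container" ceph osd "$action" "$flag" || return 1
    done
}
```

For example, run `ceph_flags set` on a node that runs Ceph MON services before you shut down the Ceph Storage nodes. The same function with `unset` reverses the flags when you start the cluster again.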

Additional resources

"What is the procedure to shutdown and bring up the entire ceph cluster?"

3.2.6. Shutting down Controller nodes

As a part of shutting down the Red Hat OpenStack Platform environment, log in to and shut down each Controller node.

Prerequisites

Stop the Pacemaker cluster
Stop all Red Hat OpenStack Platform services on the Controller nodes

Procedure

Log in as the root user to a Controller node.
Shut down the node:
```
# shutdown -h now
```
Perform these steps for each Controller node until you shut down all Controller nodes.

3.2.7. Shutting down the undercloud

As a part of shutting down the Red Hat OpenStack Platform environment, log in to the undercloud node and shut down the undercloud.

Prerequisites

A running undercloud

Procedure

Log in to the undercloud as the stack user.
Shut down the undercloud:
```
$ sudo shutdown -h now
```

3.3. Performing system maintenance

After you completely shut down the undercloud and overcloud, perform any maintenance on the systems in your environment and then start up the undercloud and overcloud.

3.4. Undercloud and overcloud startup order

To start the Red Hat OpenStack Platform environment, you must start the undercloud and overcloud in the following order:

Start the undercloud
Start Controller nodes
Start Ceph Storage nodes
Start Compute nodes
Start Instance HA services on overcloud Compute nodes
Start instances on overcloud Compute nodes

3.4.1. Starting the undercloud

As a part of starting the Red Hat OpenStack Platform environment, power on the undercloud node, log in to the undercloud, and check the undercloud services.

Prerequisites

The undercloud is powered down.

Procedure

Power on the undercloud and wait until the undercloud boots.

Verification

Log in to the undercloud host as the stack user.
Source the stackrc undercloud credentials file:
```
$ source ~/stackrc
```
Check the services on the undercloud:
```
$ systemctl list-units 'tripleo_*'
```

Create and validate a static inventory file named inventory.yaml:

$ tripleo-ansible-inventory --static-yaml-inventory inventory.yaml
$ openstack tripleo validator run --group pre-introspection \
 -i inventory.yaml

Check that all services and containers are active and healthy:

$ openstack tripleo validator run --validation service-status \
 --limit undercloud -i inventory.yaml
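
In addition to the validations, you can check quickly for failed units. The following sketch filters the `systemctl list-units 'tripleo_*'` output from the verification steps to units in the failed state. The function names are hypothetical. The sketch uses the `--plain` option so that the unit name is always the first column of the output.

```shell
#!/bin/sh
# Sketch: report tripleo_* systemd units that are in the failed state.
# Function names are hypothetical.

failed_tripleo_units() {
    # Print the names of failed tripleo_* units, one per line.
    systemctl list-units 'tripleo_*' --state=failed --no-legend --plain |
        awk '{print $1}'
}

check_undercloud_units() {
    local failed
    failed=$(failed_tripleo_units)
    if [ -n "$failed" ]; then
        echo "failed units:" >&2
        echo "$failed" >&2
        return 1
    fi
    echo "no failed tripleo_* units"
}
```

Run `check_undercloud_units` after the undercloud boots. A non-zero return value means that at least one unit failed, and you can inspect that unit with `systemctl status <unit>`.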

Additional resources

Using the validation framework

3.4.2. Starting Controller nodes

As a part of starting the Red Hat OpenStack Platform environment, power on each Controller node and check the non-Pacemaker services on the node.

Prerequisites

Powered down Controller nodes

Procedure

Power on each Controller node.

Verification

Log in to each Controller node as the root user.
Check the services on the Controller node:
```
# systemctl -t service
```
Only non-Pacemaker based services are running.
Wait until the Pacemaker services start and check that the services started:
```
# pcs status
```
Note
If your environment uses Instance HA, the Pacemaker resources do not start until you start the Compute nodes or perform a manual unfence operation with the pcs stonith confirm <compute_node> command. You must run this command for each Compute node that uses Instance HA.

3.4.3. Starting Ceph Storage nodes

As a part of starting the Red Hat OpenStack Platform environment, power on the Ceph MON and Ceph Storage nodes and enable Ceph Storage services.

Prerequisites

A powered down Ceph Storage cluster
Ceph MON services are enabled on powered down standalone Ceph MON nodes or on powered on Controller nodes

Procedure

If your environment has standalone Ceph MON nodes, power on each Ceph MON node.
Power on each Ceph Storage node.
Log in as the root user to a node that runs Ceph MON services, such as a Controller node or a standalone Ceph MON node.
Check the status of the cluster nodes. In the following example, the podman command runs a status check within a Ceph MON container on a Controller node:
```
# sudo podman exec -it ceph-mon-controller-0 ceph status
```
Ensure that each node is powered on and connected.

Unset the noout, norecover, norebalance, nobackfill, nodown, and pause flags for the cluster. In the following example, the podman commands unset these flags through a Ceph MON container on a Controller node:

# sudo podman exec -it ceph-mon-controller-0 ceph osd unset noout
# sudo podman exec -it ceph-mon-controller-0 ceph osd unset norecover
# sudo podman exec -it ceph-mon-controller-0 ceph osd unset norebalance
# sudo podman exec -it ceph-mon-controller-0 ceph osd unset nobackfill
# sudo podman exec -it ceph-mon-controller-0 ceph osd unset nodown
# sudo podman exec -it ceph-mon-controller-0 ceph osd unset pause

Verification

Check the health of the cluster. In the following example, the podman command runs a status check within a Ceph MON container on a Controller node:
```
# sudo podman exec -it ceph-mon-controller-0 ceph status
```
Ensure the status is HEALTH_OK.
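
The cluster might need some time to return to HEALTH_OK after you unset the flags. The following sketch polls `ceph health` through the Ceph MON container until the output starts with HEALTH_OK or a timeout expires. The function name, the default timings, and the `CEPH_MON_CONTAINER` override are assumptions for illustration. The container name defaults to the one used in the example above.

```shell
#!/bin/sh
# Sketch: wait for the Ceph cluster to report HEALTH_OK.
# Function name, timings, and CEPH_MON_CONTAINER are hypothetical.

wait_for_ceph_health_ok() {
    # Usage: wait_for_ceph_health_ok [timeout_s] [interval_s]
    local timeout="${1:-600}"
    local interval="${2:-10}"
    local container="${CEPH_MON_CONTAINER:-ceph-mon-controller-0}"
    local waited=0
    local status

    while :; do
        status=$(podman exec "$container" ceph health 2>/dev/null)
        case "$status" in
            HEALTH_OK*) echo "$status"; return 0 ;;
        esac

        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out; last status: ${status:-unknown}" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

For example, `wait_for_ceph_health_ok 900 15` waits up to 15 minutes and checks every 15 seconds. If the function times out, run `ceph status` in the container to see which health checks are still failing.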

Additional resources

"What is the procedure to shutdown and bring up the entire ceph cluster?"

3.4.4. Starting Compute nodes

As a part of starting the Red Hat OpenStack Platform environment, power on each Compute node and check the services on the node.

Prerequisites

Powered down Compute nodes

Procedure

Power on each Compute node.

Verification

Log in to each Compute node as the root user.
Check the services on the Compute node:
```
# systemctl -t service
```

3.4.5. Starting Instance HA services on overcloud Compute nodes

As a part of starting the Red Hat OpenStack Platform environment, start all Instance HA services on the Compute nodes.

Prerequisites

An overcloud with running Compute nodes
Instance HA is enabled on Compute nodes

Procedure

Log in as the root user to an overcloud node that runs Pacemaker.
Enable the STONITH device for a Compute node:
1. Identify the Compute node STONITH device:
```
# pcs stonith status
```
2. Clear any STONITH errors for the Compute node:
```
# pcs stonith confirm <COMPUTE_NODE>
```
  This command returns the node to a clean STONITH state.
3. Enable the Compute node STONITH device:
```
# pcs stonith enable <STONITH_DEVICE>
```
4. Perform these steps for each Compute node with STONITH.
Enable the Pacemaker Remote resource on each Compute node:
1. Identify the Pacemaker Remote resources on Compute nodes:
```
# pcs resource status
```
  These resources use the ocf::pacemaker:remote agent and their names usually follow the Compute node hostname format, such as overcloud-novacomputeiha-0.
2. Enable each Pacemaker Remote resource. The following example shows how to enable the resource for overcloud-novacomputeiha-0:
```
# pcs resource enable overcloud-novacomputeiha-0
```
3. Perform these steps for each Compute node with Pacemaker Remote management.
Wait until the Pacemaker services start and check that the services started:
```
# pcs status
```
If any Pacemaker resources fail to start during the startup process, reset the status and the fail count of the resource:
```
# pcs resource cleanup
```
Note
Some services might require more time to start, such as fence_compute and fence_kdump.
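
With several Compute nodes, the steps in this procedure can be run in a loop. The following sketch takes one `<compute_node>:<stonith_device>` pair per node. For each pair it confirms the node and enables its STONITH device, and it then enables the Pacemaker Remote resources. It assumes that each Pacemaker Remote resource has the same name as its Compute node, which is the usual case but not guaranteed. The function name and the pair syntax are hypothetical. `pcs stonith confirm` normally asks for interactive confirmation, so run the function from a terminal.

```shell
#!/bin/sh
# Sketch: re-enable STONITH devices and Pacemaker Remote resources.
# Assumes each remote resource is named after its Compute node.
# Function name and argument format are hypothetical.

enable_instance_ha() {
    # Usage: enable_instance_ha <compute_node>:<stonith_device> ...
    local pair node device

    # Return each node to a clean STONITH state, then enable its device.
    for pair in "$@"; do
        node="${pair%%:*}"
        device="${pair#*:}"
        pcs stonith confirm "$node" || return 1
        pcs stonith enable "$device" || return 1
    done

    # Enable the Pacemaker Remote resource for each Compute node.
    for pair in "$@"; do
        node="${pair%%:*}"
        pcs resource enable "$node" || return 1
    done

    # Show the cluster state so that you can check which resources started.
    pcs status
}
```

For example, `enable_instance_ha overcloud-novacomputeiha-0:<STONITH_DEVICE>`. The sketch does not run `pcs resource cleanup` because the procedure applies that command only when a resource fails to start.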

3.4.6. Starting instances on overcloud Compute nodes

As a part of starting the Red Hat OpenStack Platform environment, start the instances on Compute nodes.

Prerequisites

An active overcloud with active nodes

Procedure

Log in to the undercloud as the stack user.
Source the credentials file for your overcloud:
```
$ source ~/overcloudrc
```
View the instances in the overcloud:
```
$ openstack server list --all-projects
```
Start an instance in the overcloud:
```
$ openstack server start <INSTANCE>
```
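
Repeat the start command for each instance that you want to run. If you kept a list of the instances that were running before the shutdown, you can start them in a loop. The following sketch reads one instance ID per line from a file. The function name and the default file path are hypothetical, and the sketch assumes that you created the file before the shutdown, for example from the output of `openstack server list --all-projects --status ACTIVE -f value -c ID`.

```shell
#!/bin/sh
# Sketch: start the instances whose IDs are listed in a file, one per line.
# Assumes overcloudrc is already sourced. The file path is hypothetical.

start_saved_instances() {
    local state_file="${1:-$HOME/running-instances.txt}"
    local id

    [ -r "$state_file" ] || { echo "cannot read $state_file" >&2; return 1; }

    # Read the IDs on file descriptor 3 so the client cannot consume the list.
    while read -r id <&3; do
        [ -n "$id" ] || continue
        openstack server start "$id" || echo "could not start $id" >&2
    done 3< "$state_file"
}
```

For example, `start_saved_instances ~/running-instances.txt`. Starting only the recorded instances avoids powering on instances that were intentionally stopped before the maintenance.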
