Chapter 7. Post-Deployment

The following subsections describe several post-deployment operations for managing the Ceph cluster.

7.1. Accessing the Overcloud

The director generates a script to configure and help authenticate interactions with your Overcloud from the director host. The director saves this file (overcloudrc) in your stack user’s home directory. Run the following command to use this file:

$ source ~/overcloudrc

This loads the necessary environment variables to interact with your Overcloud from the director host’s CLI. To return to interacting with the director’s host, run the following command:

$ source ~/stackrc

7.2. Monitoring Ceph Storage Nodes

After completing the Overcloud creation, we recommend that you check the status of the Ceph Storage Cluster to ensure it is working properly. To do this, log into a Controller node as the heat-admin user:

$ nova list
$ ssh heat-admin@192.168.0.25

Check the health of the cluster:

$ sudo ceph health

If the cluster has no issues, the command reports back HEALTH_OK. This means the cluster is safe to use.

Check the status of the Ceph Monitor quorum:

$ sudo ceph quorum_status

This shows the monitors participating in the quorum and which one is the leader.

Check if all Ceph OSDs are running:

$ ceph osd stat

For more information on monitoring Ceph Storage clusters, see Monitoring in the Red Hat Ceph Storage Administration Guide.

7.3. Rebooting the Environment

A situation might occur where you need to reboot the environment. For example, when you might need to modify the physical servers, or you might need to recover from a power outage. In this situation, it is important to make sure your Ceph Storage nodes boot correctly.

Make sure to boot the nodes in the following order:

  • Boot all Ceph Monitor nodes first - This ensures the Ceph Monitor service is active in your high availability cluster. By default, the Ceph Monitor service is installed on the Controller node. If the Ceph Monitor is separate from the Controller in a custom role, make sure this custom Ceph Monitor role is active.
  • Boot all Ceph Storage nodes - This ensures the Ceph OSD cluster can connect to the active Ceph Monitor cluster on the Controller nodes.

Use the following process to reboot the Ceph Storage nodes:

  1. Log into a Ceph MON or Controller node and disable Ceph Storage cluster rebalancing temporarily:

    $ sudo ceph osd set noout
    $ sudo ceph osd set norebalance
  2. Select the first Ceph Storage node to reboot and log into it.
  3. Reboot the node:

    $ sudo reboot
  4. Wait until the node boots.
  5. Log into the node and check the cluster status:

    $ sudo ceph -s

    Check that the pgmap reports all pgs as normal (active+clean).

  6. Log out of the node, reboot the next node, and check its status. Repeat this process until you have rebooted all Ceph storage nodes.
  7. When complete, log into a Ceph MON or Controller node and enable cluster rebalancing again:

    $ sudo ceph osd unset noout
    $ sudo ceph osd unset norebalance
  8. Perform a final status check to verify the cluster reports HEALTH_OK:

    $ sudo ceph status

If a situation occurs where all Overcloud nodes boot at the same time, the Ceph OSD services might not start correctly on the Ceph Storage nodes. In this situation, reboot the Ceph Storage OSDs so they can connect to the Ceph Monitor service.

Verify a HEALTH_OK status of the Ceph Storage node cluster with the following command:

$ sudo ceph status

7.4. Scaling Up the Ceph Cluster

You can scale up the number of Ceph Storage nodes in your overcloud by re-running the deployment with the number of Ceph Storage nodes you need.

Before doing so, ensure that you have enough nodes for the updated deployment. These nodes must be registered with the director and tagged accordingly.

Registering New Ceph Storage Nodes

To register new Ceph storage nodes with the director, follow these steps:

  1. Log into the director host as the stack user and initialize your director configuration:

    $ source ~/stackrc
  2. Define the hardware and power management details for the new nodes in a new node definition template; for example, instackenv-scale.json.
  3. Import this file to the OpenStack director:

    $ openstack overcloud node import ~/instackenv-scale.json

    Importing the node definition template registers each node defined there to the director.

  4. Assign the kernel and ramdisk images to all nodes:

    $ openstack overcloud node configure
Note

For more information about registering new nodes, see Section 2.2, “Registering Nodes”.

Manually Tagging New Nodes

After registering each node, you will need to inspect the hardware and tag the node into a specific profile. Profile tags match your nodes to flavors, and in turn the flavors are assigned to a deployment role.

To inspect and tag new nodes, follow these steps:

  1. Trigger hardware introspection to retrieve the hardware attributes of each node:

    $ openstack overcloud node introspect --all-manageable --provide
    • The --all-manageable option introspects only nodes in a managed state. In this example, it is all of them.
    • The --provide option resets all nodes to an active state after introspection.

      Important

      Make sure this process runs to completion. This process usually takes 15 minutes for bare metal nodes.

  2. Retrieve a list of your nodes to identify their UUIDs:

    $ openstack baremetal node list
  3. Add a profile option to the properties/capabilities parameter for each node to manually tag a node to a specific profile.

    For example, the following commands tag three additional nodes with the ceph-storage profile:

    $ ironic node-update 551d81f5-4df2-4e0f-93da-6c5de0b868f7 add properties/capabilities='profile:ceph-storage,boot_option:local'
    $ ironic node-update 5e735154-bd6b-42dd-9cc2-b6195c4196d7 add properties/capabilities='profile:ceph-storage,boot_option:local'
    $ ironic node-update 1a2b090c-299d-4c20-a25d-57dd21a7085b add properties/capabilities='profile:ceph-storage,boot_option:local'
Tip

If the nodes you just tagged and registered use multiple disks, you can set the director to use a specific root disk on each node. See Section 2.4, “Defining the Root Disk for Ceph Storage Nodes” for instructions on how to do so.

Re-deploying the Overcloud with Additional Ceph Storage Nodes

After registering and tagging the new nodes, you can now scale up the number of Ceph Storage nodes by re-deploying the overcloud. When you do, set the CephStorageCount parameter in the parameter_defaults of your environment file (in this case, ~/templates/storage-config.yaml). In Section 6.1, “Assigning Nodes and Flavors to Roles”, the overcloud is configured to deploy with 3 Ceph Storage nodes. To scale it up to 6 nodes instead, use:

parameter_defaults:
  ControllerCount: 3
  OvercloudControlFlavor: control
  ComputeCount: 3
  OvercloudComputeFlavor: compute
  CephStorageCount: 6
  OvercloudCephStorageFlavor: ceph-storage
  CephMonCount: 3
  OvercloudCephMonFlavor: ceph-mon

Upon re-deployment with this setting, the overcloud should now have 6 Ceph Storage nodes instead of 3.

7.5. Scaling Down and Replacing Ceph Storage Nodes

In some cases, you may need to scale down your Ceph cluster, or even replace a Ceph Storage node (for example, if a Ceph Storage node is faulty). In either situation, you need to disable and rebalance any Ceph Storage node you are removing from the Overcloud to ensure no data loss. This procedure explains the process for replacing a Ceph Storage node.

Note

This procedure uses steps from the Red Hat Ceph Storage Administration Guide to manually remove Ceph Storage nodes. For more in-depth information about manual removal of Ceph Storage nodes, see Adding and Removing OSD Nodes from the Red Hat Ceph Storage Administration Guide.

Log into either a Controller node or a Ceph Storage node as the heat-admin user. The director’s stack user has an SSH key to access the heat-admin user.

List the OSD tree and find the OSDs for your node. For example, your node to remove might contain the following OSDs:

-2 0.09998     host overcloud-cephstorage-0
0 0.04999         osd.0                         up  1.00000          1.00000
1 0.04999         osd.1                         up  1.00000          1.00000

Disable the OSDs on the Ceph Storage node. In this case, the OSD IDs are 0 and 1.

[heat-admin@overcloud-controller-0 ~]$ sudo ceph osd out 0
[heat-admin@overcloud-controller-0 ~]$ sudo ceph osd out 1

The Ceph Storage cluster begins rebalancing. Wait for this process to complete. You can follow the status using the following command:

[heat-admin@overcloud-controller-0 ~]$ sudo ceph -w

Once the Ceph cluster completes rebalancing, log into the Ceph Storage node you are removing (in this case, overcloud-cephstorage-0) as the heat-admin user and stop the node.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl disable ceph-osd@0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl disable ceph-osd@1

While logged into overcloud-cephstorage-0, remove it from the CRUSH map so that it no longer receives data.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd crush remove osd.0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd crush remove osd.1

Remove the OSD authentication key.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph auth del osd.0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph auth del osd.1

Remove the OSD from the cluster.

[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd rm 0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd rm 1

Leave the node and return to the director host as the stack user.

[heat-admin@overcloud-cephstorage-0 ~]$ exit
[stack@director ~]$

Disable the Ceph Storage node so the director does not reprovision it.

[stack@director ~]$ ironic node-list
[stack@director ~]$ ironic node-set-maintenance UUID true

Removing a Ceph Storage node requires an update to the overcloud stack in the director using the local template files. First identify the UUID of the Overcloud stack:

$ heat stack-list

Identify the UUIDs of the Ceph Storage node to delete:

$ nova list

Run the following command to delete the node from the stack and update the plan accordingly:

$ openstack overcloud node delete --stack STACK_UUID --templates -e ENVIRONMENT_FILE NODE_UUID
Important

If you passed any extra environment files when you created the overcloud, pass them again here using the -e option to avoid making undesired changes to the overcloud. For more information, see Modifying the Overcloud Environment (from Director Installation and Usage).

Wait until the stack completes its update. Monitor the stack update using the heat stack-list --show-nested command.

Add new nodes to the director’s node pool and deploy them as Ceph Storage nodes. Use the CephStorageCount parameter in the parameter_defaults of your environment file (in this case, ~/templates/storage-config.yaml) to define the total number of Ceph Storage nodes in the Overcloud. For example:

parameter_defaults:
  ControllerCount: 3
  OvercloudControlFlavor: control
  ComputeCount: 3
  OvercloudComputeFlavor: compute
  CephStorageCount: 3
  OvercloudCephStorageFlavor: ceph-storage
  CephMonCount: 3
  OvercloudCephMonFlavor: ceph-mon

See Section 6.1, “Assigning Nodes and Flavors to Roles” for details on how to define the number of nodes per role.

Upon updating your environment file, re-deploy the overcloud as normal:

$ openstack overcloud deploy --templates -e ENVIRONMENT_FILES

The director provisions the new node and updates the entire stack with the new node’s details.

Log into a Controller node as the heat-admin user and check the status of the Ceph Storage node. For example:

[heat-admin@overcloud-controller-0 ~]$ sudo ceph status

Confirm that the value in the osdmap section matches the number of desired nodes in your cluster. The Ceph Storage node you removed has now been replaced with a new node.

7.6. Adding and Removing OSD Disks from Ceph Storage Nodes

In situations when an OSD disk fails and requires a replacement, use the standard instructions from the Red Hat Ceph Storage Administration Guide: