Chapter 9. Scaling the Ceph Cluster
9.1. Scaling Up the Ceph Cluster
You can scale up the number of Ceph Storage nodes in your overcloud by re-running the deployment with the number of Ceph Storage nodes you need.
Before doing so, ensure that you have enough nodes for the updated deployment. These nodes must be registered with the director and tagged accordingly.
Registering New Ceph Storage Nodes
To register new Ceph storage nodes with the director, follow these steps:
Log into the director host as the stack user and initialize your director configuration:
$ source ~/stackrc
Define the hardware and power management details for the new nodes in a new node definition template; for example, instackenv-scale.json. Import this file to the OpenStack director:
$ openstack overcloud node import ~/instackenv-scale.json
Importing the node definition template registers each node defined there to the director.
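The exact contents of the template depend on your hardware, but a minimal sketch of a single node entry might look like the following; the MAC address, IPMI credentials, and power management address are placeholders:
{
  "nodes": [
    {
      "mac": ["bb:bb:bb:bb:bb:bb"],
      "cpu": "4",
      "memory": "6144",
      "disk": "40",
      "arch": "x86_64",
      "pm_type": "pxe_ipmitool",
      "pm_user": "admin",
      "pm_password": "PASSWORD",
      "pm_addr": "192.0.2.205"
    }
  ]
}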
Assign the kernel and ramdisk images to all nodes:
$ openstack overcloud node configure
For more information about registering new nodes, see Section 2.2, “Registering Nodes”.
Manually Tagging New Nodes
After registering each node, you will need to inspect the hardware and tag the node into a specific profile. Profile tags match your nodes to flavors, and in turn the flavors are assigned to a deployment role.
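As background, the mapping from a profile tag to a flavor is typically carried as flavor properties set during the initial deployment; a sketch, assuming the ceph-storage flavor already exists:
$ openstack flavor set --property "capabilities:profile"="ceph-storage" --property "capabilities:boot_option"="local" ceph-storage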
To inspect and tag new nodes, follow these steps:
Trigger hardware introspection to retrieve the hardware attributes of each node:
$ openstack overcloud node introspect --all-manageable --provide
The --all-manageable option introspects only nodes in a managed state; in this example, that is all of them. The --provide option resets all nodes to an available state after introspection.
Important
Make sure this process runs to completion. It usually takes 15 minutes for bare metal nodes.
Retrieve a list of your nodes to identify their UUIDs:
$ openstack baremetal node list
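The output resembles the following trimmed sketch, reusing the UUIDs from the example below; your UUIDs, names, and states will differ:
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| 551d81f5-4df2-4e0f-93da-6c5de0b868f7 | None | None          | power off   | available          | False       |
| 5e735154-bd6b-42dd-9cc2-b6195c4196d7 | None | None          | power off   | available          | False       |
| 1a2b090c-299d-4c20-a25d-57dd21a7085b | None | None          | power off   | available          | False       |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+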
Add a profile option to the properties/capabilities parameter for each node to manually tag it to a specific profile. For example, the following commands tag three additional nodes with the ceph-storage profile:
$ ironic node-update 551d81f5-4df2-4e0f-93da-6c5de0b868f7 add properties/capabilities='profile:ceph-storage,boot_option:local'
$ ironic node-update 5e735154-bd6b-42dd-9cc2-b6195c4196d7 add properties/capabilities='profile:ceph-storage,boot_option:local'
$ ironic node-update 1a2b090c-299d-4c20-a25d-57dd21a7085b add properties/capabilities='profile:ceph-storage,boot_option:local'
If the nodes you just tagged and registered use multiple disks, you can set the director to use a specific root disk on each node. See Section 2.4, “Defining the Root Disk for Ceph Storage Nodes” for instructions on how to do so.
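As a brief illustration of the mechanism that section describes, a root disk hint can be set on a node's properties; the UUID here is taken from the example above and the serial number is a placeholder:
$ ironic node-update 551d81f5-4df2-4e0f-93da-6c5de0b868f7 add properties/root_device='{"serial": "DISK_SERIAL"}'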
Re-deploying the Overcloud with Additional Ceph Storage Nodes
After registering and tagging the new nodes, you can now scale up the number of Ceph Storage nodes by re-deploying the overcloud. When you do, set the CephStorageCount parameter in the parameter_defaults of your environment file (in this case, ~/templates/storage-config.yaml). In Section 6.1, “Assigning Nodes and Flavors to Roles”, the overcloud is configured to deploy with 3 Ceph Storage nodes. To scale it up to 6 nodes instead, use:
parameter_defaults:
  ControllerCount: 3
  OvercloudControlFlavor: control
  ComputeCount: 3
  OvercloudComputeFlavor: compute
  CephStorageCount: 6
  OvercloudCephStorageFlavor: ceph-storage
  CephMonCount: 3
  OvercloudCephMonFlavor: ceph-mon
Upon re-deployment with this setting, the overcloud should now have 6 Ceph Storage nodes instead of 3.
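The re-deployment itself uses the same deploy command as the initial creation; a sketch, assuming ~/templates/storage-config.yaml is your only custom environment file (include every environment file you originally passed):
$ openstack overcloud deploy --templates -e ~/templates/storage-config.yaml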
9.2. Scaling Down and Replacing Ceph Storage Nodes
In some cases, you may need to scale down your Ceph cluster, or even replace a Ceph Storage node (for example, if a Ceph Storage node is faulty). In either situation, you need to disable and rebalance any Ceph Storage node you are removing from the Overcloud to ensure no data loss. This procedure explains the process for replacing a Ceph Storage node.
This procedure uses steps from the Red Hat Ceph Storage Administration Guide to manually remove Ceph Storage nodes. For more in-depth information about manual removal of Ceph Storage nodes, see Adding and Removing OSD Nodes from the Red Hat Ceph Storage Administration Guide.
Log into either a Controller node or a Ceph Storage node as the heat-admin user. The director’s stack user has an SSH key to access the heat-admin user.
List the OSD tree (for example, with sudo ceph osd tree) and find the OSDs for your node. The node you want to remove might contain the following OSDs:
-2 0.09998     host overcloud-cephstorage-0
 0 0.04999         osd.0                        up  1.00000          1.00000
 1 0.04999         osd.1                        up  1.00000          1.00000
Disable the OSDs on the Ceph Storage node. In this case, the OSD IDs are 0 and 1.
[heat-admin@overcloud-controller-0 ~]$ sudo ceph osd out 0
[heat-admin@overcloud-controller-0 ~]$ sudo ceph osd out 1
The Ceph Storage cluster begins rebalancing. Wait for this process to complete. You can follow the status using the following command:
[heat-admin@overcloud-controller-0 ~]$ sudo ceph -w
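If you prefer a point-in-time summary over the streaming output, the following check from the same node also works; wait until all placement groups report active+clean:
[heat-admin@overcloud-controller-0 ~]$ sudo ceph -s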
Once the Ceph cluster completes rebalancing, log into the Ceph Storage node you are removing (in this case, overcloud-cephstorage-0) as the heat-admin user, then disable and stop the OSD services on that node.
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl disable ceph-osd@0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl disable ceph-osd@1
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl stop ceph-osd@0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl stop ceph-osd@1
While logged into overcloud-cephstorage-0, remove its OSDs from the CRUSH map so that they no longer receive data.
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd crush remove osd.0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd crush remove osd.1
Remove the OSD authentication keys.
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph auth del osd.0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph auth del osd.1
Remove the OSDs from the cluster.
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd rm 0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd rm 1
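At this point the OSDs should no longer appear in the OSD tree; you can verify this with the same listing used earlier:
[heat-admin@overcloud-cephstorage-0 ~]$ sudo ceph osd tree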
Leave the node and return to the director host as the stack user.
[heat-admin@overcloud-cephstorage-0 ~]$ exit
[stack@director ~]$
Disable the Ceph Storage node so the director does not reprovision it.
[stack@director ~]$ ironic node-list
[stack@director ~]$ ironic node-set-maintenance UUID true
Removing a Ceph Storage node requires an update to the overcloud stack in the director using the local template files. First identify the UUID of the Overcloud stack:
$ heat stack-list
Identify the UUID of the Ceph Storage node to delete:
$ nova list
Run the following command to delete the node from the stack and update the plan accordingly:
$ openstack overcloud node delete --stack STACK_UUID --templates -e ENVIRONMENT_FILE NODE_UUID
If you passed any extra environment files when you created the overcloud, pass them again here using the -e option to avoid making undesired changes to the overcloud. For more information, see Modifying the Overcloud Environment (from Director Installation and Usage).
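For example, assuming the stack is named overcloud and ~/templates/storage-config.yaml is your only custom environment file, the command might look like the following; NODE_UUID is the value from the nova list output:
$ openstack overcloud node delete --stack overcloud --templates -e ~/templates/storage-config.yaml NODE_UUID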
Wait until the stack completes its update. Monitor the stack update using the heat stack-list --show-nested command.
Add new nodes to the director’s node pool and deploy them as Ceph Storage nodes. Use the CephStorageCount parameter in the parameter_defaults of your environment file (in this case, ~/templates/storage-config.yaml) to define the total number of Ceph Storage nodes in the Overcloud. For example:
parameter_defaults:
ControllerCount: 3
OvercloudControlFlavor: control
ComputeCount: 3
OvercloudComputeFlavor: compute
CephStorageCount: 3
OvercloudCephStorageFlavor: ceph-storage
CephMonCount: 3
OvercloudCephMonFlavor: ceph-monSee Section 6.1, “Assigning Nodes and Flavors to Roles” for details on how to define the number of nodes per role.
Upon updating your environment file, re-deploy the overcloud as normal:
$ openstack overcloud deploy --templates -e ENVIRONMENT_FILES
The director provisions the new node and updates the entire stack with the new node’s details.
Log into a Controller node as the heat-admin user and check the status of the Ceph Storage node. For example:
[heat-admin@overcloud-controller-0 ~]$ sudo ceph status
Confirm that the number of OSDs reported in the osdmap section matches the expected number for your cluster. The Ceph Storage node you removed has now been replaced with a new node.
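As a rough sketch of what to look for (the epoch and exact layout vary by Ceph version), with three Ceph Storage nodes each running two OSDs the line might resemble:
    osdmap e95: 6 osds: 6 up, 6 in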
9.3. Adding and Removing OSD Disks from Ceph Storage Nodes
If an OSD disk fails and requires replacement, use the standard instructions from the Red Hat Ceph Storage Administration Guide.
