Chapter 10. Scaling the Ceph Cluster
10.1. Scaling Up the Ceph Cluster
You can scale up the number of Ceph Storage nodes in your overcloud by re-running the deployment with the number of Ceph Storage nodes you need.
Before doing so, ensure that you have enough nodes for the updated deployment. These nodes must be registered with the director and tagged accordingly.
Registering New Ceph Storage Nodes
To register new Ceph storage nodes with the director, follow these steps:
Log in to the director host as the stack user and initialize your director configuration:
$ source ~/stackrc
Define the hardware and power management details for the new nodes in a new node definition template; for example, instackenv-scale.json. Import this file to the OpenStack director:
$ openstack overcloud node import ~/instackenv-scale.json
Importing the node definition template registers each node defined there to the director.
Assign the kernel and ramdisk images to all nodes:
$ openstack overcloud node configure
For more information about registering new nodes, see Section 2.2, “Registering nodes”.
Manually Tagging New Nodes
After registering each node, you will need to inspect the hardware and tag the node into a specific profile. Profile tags match your nodes to flavors, and in turn the flavors are assigned to a deployment role.
To inspect and tag new nodes, follow these steps:
Trigger hardware introspection to retrieve the hardware attributes of each node:
$ openstack overcloud node introspect --all-manageable --provide
The --all-manageable option introspects only nodes in a manageable state. In this example, it is all of them. The --provide option resets all nodes to an available state after introspection.
Important: Make sure this process runs to completion. This process usually takes 15 minutes for bare metal nodes.
Retrieve a list of your nodes to identify their UUIDs:
$ openstack baremetal node list
Add a profile option to the properties/capabilities parameter for each node to manually tag a node to a specific profile. For example, the following commands tag three additional nodes with the ceph-storage profile:
$ ironic node-update 551d81f5-4df2-4e0f-93da-6c5de0b868f7 add properties/capabilities='profile:ceph-storage,boot_option:local'
$ ironic node-update 5e735154-bd6b-42dd-9cc2-b6195c4196d7 add properties/capabilities='profile:ceph-storage,boot_option:local'
$ ironic node-update 1a2b090c-299d-4c20-a25d-57dd21a7085b add properties/capabilities='profile:ceph-storage,boot_option:local'
If the nodes you just tagged and registered use multiple disks, you can set the director to use a specific root disk on each node. See Section 2.5, “Defining the root disk” for instructions on how to do so.
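The tagging commands above differ only in the node UUID, so they can be generated in a loop. The following is a minimal sketch rather than part of the documented procedure: the function name print_tag_commands is hypothetical, and it only prints the ironic node-update commands shown above so that they can be reviewed before anything is run.

```shell
# Hypothetical helper (not part of the director tooling): prints one tagging
# command per node UUID instead of running it, so the output can be reviewed.
print_tag_commands() {
  local profile="$1"   # profile name, for example ceph-storage
  shift
  local uuid
  for uuid in "$@"; do
    printf "ironic node-update %s add properties/capabilities='profile:%s,boot_option:local'\n" "$uuid" "$profile"
  done
}
```

For example, `print_tag_commands ceph-storage <UUID1> <UUID2>` prints two commands. Printing rather than executing is a deliberate choice here, because a wrong UUID would tag the wrong node.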
Re-deploying the Overcloud with Additional Ceph Storage Nodes
After registering and tagging the new nodes, you can now scale up the number of Ceph Storage nodes by re-deploying the overcloud. When you do, set the CephStorageCount parameter in the parameter_defaults of your environment file (in this case, ~/templates/storage-config.yaml). In Section 7.1, “Assigning nodes and flavors to roles”, the overcloud is configured to deploy with 3 Ceph Storage nodes. To scale it up to 6 nodes instead, use:
parameter_defaults:
  ControllerCount: 3
  OvercloudControlFlavor: control
  ComputeCount: 3
  OvercloudComputeFlavor: compute
  CephStorageCount: 6
  OvercloudCephStorageFlavor: ceph-storage
  CephMonCount: 3
  OvercloudCephMonFlavor: ceph-mon
Upon re-deployment with this setting, the overcloud should now have 6 Ceph Storage nodes instead of 3.
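Before re-deploying, it can help to confirm the count that the environment file actually sets. The following is a minimal sketch under stated assumptions: the function name get_count is hypothetical, and it assumes the simple one-parameter-per-line layout used in the example in this section. It is not a general YAML parser.

```shell
# Hypothetical helper: prints the value of a parameter such as
# CephStorageCount from a heat environment file.
# Assumes one "Name: value" pair per line, as in the example in this section.
get_count() {
  local param="$1"
  local file="$2"
  awk -v key="$param" '
    { line = $0; sub(/^[ \t]+/, "", line) }   # strip leading indentation
    index(line, key ":") == 1 {               # line starts with the key
      sub(/^[^:]*:[ \t]*/, "", line)          # drop the key and spaces
      print line
      exit
    }
  ' "$file"
}
```

For example, `get_count CephStorageCount ~/templates/storage-config.yaml` prints the configured number of Ceph Storage nodes, or nothing if the parameter is not set in that file.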
10.2. Scaling down and replacing Ceph Storage nodes
In some cases, you might need to scale down your Ceph cluster, or even replace a Ceph Storage node, for example, if a Ceph Storage node is faulty. In either situation, you must disable and rebalance any Ceph Storage node that you want to remove from the overcloud to avoid data loss.
This procedure uses steps from the Red Hat Ceph Storage Administration Guide to manually remove Ceph Storage nodes. For more in-depth information about manual removal of Ceph Storage nodes, see Administering Ceph clusters that run in Containers and Removing a Ceph OSD using the command-line interface.
Procedure
Log in to a Controller node as the heat-admin user. The director stack user has an SSH key to access the heat-admin user.
List the OSD tree and find the OSDs for your node. For example, the node you want to remove might contain the following OSDs:
-2 0.09998     host overcloud-cephstorage-0
0 0.04999         osd.0                         up  1.00000          1.00000
1 0.04999         osd.1                         up  1.00000          1.00000
Disable the OSDs on the Ceph Storage node. In this case, the OSD IDs are 0 and 1.
[heat-admin@overcloud-controller-0 ~]$ sudo docker exec ceph-mon-<HOSTNAME> ceph osd out 0
[heat-admin@overcloud-controller-0 ~]$ sudo docker exec ceph-mon-<HOSTNAME> ceph osd out 1
The Ceph Storage cluster begins rebalancing. Wait for this process to complete. Follow the status by using the following command:
[heat-admin@overcloud-controller-0 ~]$ sudo docker exec ceph-mon-<HOSTNAME> ceph -w
After the Ceph cluster completes rebalancing, log in to the Ceph Storage node you are removing, in this case, overcloud-cephstorage-0, as the heat-admin user, and stop and disable the OSD services on the node.
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl stop ceph-osd@0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl stop ceph-osd@1
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl disable ceph-osd@0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl disable ceph-osd@1
Stop the OSDs.
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl stop ceph-osd@0
[heat-admin@overcloud-cephstorage-0 ~]$ sudo systemctl stop ceph-osd@1
While logged in to the Controller node, remove the OSDs from the CRUSH map so that they no longer receive data.
[heat-admin@overcloud-controller-0 ~]$ sudo docker exec ceph-mon-<HOSTNAME> ceph osd crush remove osd.0
[heat-admin@overcloud-controller-0 ~]$ sudo docker exec ceph-mon-<HOSTNAME> ceph osd crush remove osd.1
Remove the OSD authentication key.
[heat-admin@overcloud-controller-0 ~]$ sudo docker exec ceph-mon-<HOSTNAME> ceph auth del osd.0
[heat-admin@overcloud-controller-0 ~]$ sudo docker exec ceph-mon-<HOSTNAME> ceph auth del osd.1
Remove the OSD from the cluster.
[heat-admin@overcloud-controller-0 ~]$ sudo docker exec ceph-mon-<HOSTNAME> ceph osd rm 0
[heat-admin@overcloud-controller-0 ~]$ sudo docker exec ceph-mon-<HOSTNAME> ceph osd rm 1
Remove the Storage node from the CRUSH map:
[heat-admin@overcloud-controller-0 ~]$ sudo docker exec ceph-mon-<HOSTNAME> ceph osd crush rm <NODE>
[heat-admin@overcloud-controller-0 ~]$ sudo ceph osd crush remove <NODE>
You can confirm the <NODE> name as defined in the CRUSH map by searching the CRUSH tree:
[heat-admin@overcloud-controller-0 ~]$ sudo docker exec ceph-mon-<HOSTNAME> ceph osd crush tree | grep overcloud-osd-compute-3 -A 4
        "name": "overcloud-osd-compute-3",
        "type": "host",
        "type_id": 1,
        "items": []
    },
[heat-admin@overcloud-controller-0 ~]$
In the CRUSH tree, ensure that the items list is empty. If the list is not empty, revisit the earlier step that removes the OSDs from the CRUSH map.
Leave the node and return to the undercloud as the stack user.
[heat-admin@overcloud-controller-0 ~]$ exit
[stack@director ~]$
Disable the Ceph Storage node so the director does not reprovision it.
[stack@director ~]$ openstack baremetal node list
[stack@director ~]$ openstack baremetal node maintenance set UUID
Removing a Ceph Storage node requires an update to the overcloud stack in director with the local template files. First identify the UUID of the overcloud stack:
$ openstack stack list
Identify the UUIDs of the Ceph Storage node you want to delete:
$ openstack server list
Delete the node from the stack and update the plan accordingly:
Important: If you passed any extra environment files when you created the overcloud, pass them again here by using the -e option to avoid making undesired changes to the overcloud. For more information, see Modifying the Overcloud Environment in the Director Installation and Usage guide.
$ openstack overcloud node delete \
  --stack <stack-name> \
  --templates \
  -e <other-environment-files> \
  <node_UUID>
Wait until the stack completes its update. Use the heat stack-list --show-nested command to monitor the stack update.
Add new nodes to the director node pool and deploy them as Ceph Storage nodes. Use the CephStorageCount parameter in parameter_defaults of your environment file, in this case, ~/templates/storage-config.yaml, to define the total number of Ceph Storage nodes in the overcloud.
parameter_defaults:
  ControllerCount: 3
  OvercloudControlFlavor: control
  ComputeCount: 3
  OvercloudComputeFlavor: compute
  CephStorageCount: 3
  OvercloudCephStorageFlavor: ceph-storage
  CephMonCount: 3
  OvercloudCephMonFlavor: ceph-mon
Note: For more information about how to define the number of nodes per role, see Section 7.1, “Assigning nodes and flavors to roles”.
After you update your environment file, re-deploy the overcloud:
$ openstack overcloud deploy --templates -e <ENVIRONMENT_FILES>
Director provisions the new node and updates the entire stack with the details of the new node.
Log in to a Controller node as the heat-admin user and check the status of the Ceph Storage node:
[heat-admin@overcloud-controller-0 ~]$ sudo ceph status
Confirm that the value in the osdmap section matches the number of nodes that you want in your cluster. The Ceph Storage node that you removed is replaced with a new node.
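The procedure above starts by reading the OSD tree to find which OSD IDs belong to the node being removed. The following is a minimal sketch of that lookup, not part of the documented procedure: the function name osds_for_host is hypothetical, and it assumes OSD tree text in which each host bucket line contains the word host followed by the host name, with that host's osd.N entries on the lines after it.

```shell
# Hypothetical helper: reads OSD tree text on stdin and prints the IDs of the
# OSDs listed under the given host bucket, one per line.
osds_for_host() {
  awk -v target="$1" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "host" && i < NF) { current = $(i + 1) }   # entering a host bucket
        if (current == target && $i ~ /^osd\.[0-9]+$/) {
          id = $i
          sub(/^osd\./, "", id)                               # keep only the number
          print id
        }
      }
    }
  '
}
```

One possible use is to pipe the output of `sudo docker exec ceph-mon-<HOSTNAME> ceph osd tree` into `osds_for_host overcloud-cephstorage-0`, then compare the printed IDs with the tree by eye before disabling any OSD.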
10.3. Adding an OSD to a Ceph Storage node
This procedure demonstrates how to add an OSD to a node. For more information about Ceph OSDs, see Ceph OSDs in the Red Hat Ceph Storage Operations Guide.
Procedure
Notice the following heat template deploys Ceph Storage with three OSD devices:
parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
    osd_scenario: lvm
    osd_objectstore: bluestore
To add an OSD, update the node disk layout as described in Section 5.2, “Mapping the Ceph Storage node disk layout”. In this example, add /dev/sde to the template:
parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
      - /dev/sde
    osd_scenario: lvm
    osd_objectstore: bluestore
Run openstack overcloud deploy to update the overcloud.
This example assumes that all hosts with OSDs have a new device called /dev/sde. If you do not want all nodes to have the new device, update the heat template as shown and see Section 5.2.5, “Mapping the Disk Layout to Non-Homogeneous Ceph Storage Nodes” for information about how to define hosts with a differing devices list.
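Because the two templates in this section differ only in the devices list, the fragment can be generated from a list of device paths. The following is a minimal sketch, not part of the documented procedure: the function name render_disks_config is hypothetical, and the lvm and bluestore values are hard-coded from the example in this section.

```shell
# Hypothetical helper: prints a parameter_defaults fragment for the given
# device paths, using the lvm/bluestore settings from the example above.
render_disks_config() {
  local dev
  printf 'parameter_defaults:\n  CephAnsibleDisksConfig:\n    devices:\n'
  for dev in "$@"; do
    printf '      - %s\n' "$dev"
  done
  printf '    osd_scenario: lvm\n    osd_objectstore: bluestore\n'
}
```

For example, `render_disks_config /dev/sdb /dev/sdc /dev/sdd /dev/sde` prints the four-device template. The output is only text; it still has to be saved to the environment file that is passed to the deployment.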
10.4. Removing an OSD from a Ceph Storage node
This procedure demonstrates how to remove an OSD from a node. It assumes the following about the environment:
- A server (ceph-storage0) has an OSD (ceph-osd@4) running on /dev/sde.
- The Ceph monitor service (ceph-mon) is running on controller0.
- There are enough available OSDs to ensure the storage cluster is not at its near-full ratio.
For more information about Ceph OSDs, see Ceph OSDs in the Red Hat Ceph Storage Operations Guide.
Procedure
SSH into ceph-storage0 and log in as root.
Disable and stop the OSD service:
[root@ceph-storage0 ~]# systemctl disable ceph-osd@4
[root@ceph-storage0 ~]# systemctl stop ceph-osd@4
Disconnect from ceph-storage0.
SSH into controller0 and log in as root.
Identify the name of the Ceph monitor container:
[root@controller0 ~]# docker ps | grep ceph-mon
ceph-mon-controller0
[root@controller0 ~]#
Use the Ceph monitor container to mark the undesired OSD as out:
[root@controller0 ~]# docker exec ceph-mon-controller0 ceph osd out 4
Note: This command causes Ceph to rebalance the storage cluster and copy data to other OSDs in the cluster. The cluster temporarily leaves the active+clean state until rebalancing is complete.
Run the following command and wait for the storage cluster state to become active+clean:
[root@controller0 ~]# docker exec ceph-mon-controller0 ceph -w
Remove the OSD from the CRUSH map so that it no longer receives data:
[root@controller0 ~]# docker exec ceph-mon-controller0 ceph osd crush remove osd.4
Remove the OSD authentication key:
[root@controller0 ~]# docker exec ceph-mon-controller0 ceph auth del osd.4
Remove the OSD:
[root@controller0 ~]# docker exec ceph-mon-controller0 ceph osd rm 4
Disconnect from controller0.
SSH into the undercloud as the stack user and locate the heat environment file in which you defined the CephAnsibleDisksConfig parameter.
Notice the heat template contains four OSDs:
parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
      - /dev/sde
    osd_scenario: lvm
    osd_objectstore: bluestore
Modify the template to remove /dev/sde.
parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
    osd_scenario: lvm
    osd_objectstore: bluestore
Run openstack overcloud deploy to update the overcloud.
Note: This example assumes that you removed the /dev/sde device from all hosts with OSDs. If you do not remove the same device from all nodes, update the heat template as shown and see Section 5.2.5, “Mapping the Disk Layout to Non-Homogeneous Ceph Storage Nodes” for information about how to define hosts with a differing devices list.
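The monitor-side commands in this procedure follow a fixed order for a given OSD ID: mark the OSD out, remove it from the CRUSH map, delete its authentication key, then remove it from the cluster. The following is a minimal sketch that prints those commands for review, not part of the documented procedure. The function name print_osd_removal is hypothetical, and the sketch neither runs the commands nor waits for rebalancing.

```shell
# Hypothetical helper: prints, in order, the monitor-side commands from the
# procedure above for one OSD. It does not run them; after the "osd out"
# command, wait for the cluster to return to active+clean before continuing.
print_osd_removal() {
  local mon="$1"   # monitor container name, for example ceph-mon-controller0
  local id="$2"    # numeric OSD ID, for example 4
  case "$id" in
    ''|*[!0-9]*)
      echo "OSD ID must be a number: $id" >&2
      return 1
      ;;
  esac
  printf 'docker exec %s ceph osd out %s\n' "$mon" "$id"
  printf 'docker exec %s ceph osd crush remove osd.%s\n' "$mon" "$id"
  printf 'docker exec %s ceph auth del osd.%s\n' "$mon" "$id"
  printf 'docker exec %s ceph osd rm %s\n' "$mon" "$id"
}
```

For example, `print_osd_removal ceph-mon-controller0 4` prints the four commands used in this section. Rejecting a non-numeric ID is a small guard against passing a value such as osd.4, which would otherwise produce malformed commands.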