Chapter 4. Removing a Node From the Overcloud

The Red Hat OpenStack Platform director (RHOSP-d) does not support the automatic removal of a Red Hat Ceph Storage (RHCS) node. This chapter describes how to remove a node from the overcloud manually.

4.1. Prerequisites

  • Verify that the remaining nodes will have enough CPU and RAM to service the existing workloads.
  • Migrate the compute workloads off the node being removed.
  • Verify that the storage cluster has enough reserve storage capacity to maintain a status of HEALTH_OK.

4.2. Removing the Ceph OSD services from the storage cluster

This procedure removes a node's Ceph OSD services from the storage cluster.

Prerequisite

  • A healthy Ceph storage cluster.

Procedure

Unless otherwise stated, run the following steps as the root user on one of the Controller/Monitor nodes.

  1. Verify the health status of the Ceph storage cluster:

    [root@controller ~]# ceph health

    The health status must be HEALTH_OK before continuing on with this procedure.

    Warning

    If the ceph health command reports that the storage cluster is near full, removing a Ceph OSD could cause the cluster to reach or exceed its full ratio limit, which can result in data loss. If the storage cluster is near full, contact Red Hat Support before proceeding.

  2. Determine the number of Ceph OSDs for removal:

    [root@controller ~]# ceph osd tree

    Example Output

    ID WEIGHT   TYPE NAME                        UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -1 52.37256 root default
    -2 13.09314     host overcloud-osd-compute-3
     0  1.09109         osd.0                         up  1.00000          1.00000
     4  1.09109         osd.4                         up  1.00000          1.00000
     8  1.09109         osd.8                         up  1.00000          1.00000
    12  1.09109         osd.12                        up  1.00000          1.00000
    16  1.09109         osd.16                        up  1.00000          1.00000
    20  1.09109         osd.20                        up  1.00000          1.00000
    24  1.09109         osd.24                        up  1.00000          1.00000
    28  1.09109         osd.28                        up  1.00000          1.00000
    32  1.09109         osd.32                        up  1.00000          1.00000
    36  1.09109         osd.36                        up  1.00000          1.00000
    40  1.09109         osd.40                        up  1.00000          1.00000
    44  1.09109         osd.44                        up  1.00000          1.00000
    ...

    To view the total number of OSDs up and in:

    [root@controller ~]# ceph osd stat

    Example Output

    osdmap e173: 48 osds: 48 up, 48 in
                flags sortbitwise
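
The OSD IDs belonging to one host can be pulled from the ceph osd tree output with a short filter. The following is a minimal sketch: the sample text is a trimmed copy of the example output above, and in practice you would pipe the live command output instead.

```shell
# Trimmed sample of the `ceph osd tree` output shown above.
tree_sample=' -2 13.09314     host overcloud-osd-compute-3
  0  1.09109         osd.0                         up  1.00000          1.00000
  4  1.09109         osd.4                         up  1.00000          1.00000'

# Print the ID column (first field) of every osd.* line.
osd_ids=$(printf '%s\n' "$tree_sample" | awk '/ osd\./ {print $1}')
printf '%s\n' "$osd_ids"
```

On a live cluster, replace the sample variable with ceph osd tree output and keep only the lines under the host being removed.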

  3. Monitor the Ceph storage cluster from a new terminal session:

    [root@controller ~]# ceph -w

    In this terminal session, you can watch as the OSD is removed from the storage cluster. Go back to the original terminal session for the next step.

  4. Mark the OSD out:

    ceph osd out $OSD_NUM
    Replace…​
    • $OSD_NUM with the number portion of the OSD name.

      Example

      [root@controller ~]# ceph osd out 0
      marked out osd.0.

      Set all OSDs on the node to out.

      Note

      If scripting this step to handle multiple OSDs sequentially, then set a sleep command of at least 10 seconds in between the running of each ceph osd out command.
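
Such a script can be sketched as follows. This is a dry run that only prints each command; remove the echo to run it against the cluster. The OSD IDs are the example IDs from the tree output above, and the interval argument lets the dry run skip the pause.

```shell
# Mark a list of OSDs out, sleeping between commands.
# Usage: mark_out_osds "<ids>" <interval-seconds>
mark_out_osds() {
  ids=$1
  interval=$2
  for id in $ids; do
    echo "ceph osd out $id"   # drop 'echo' to execute for real
    sleep "$interval"
  done
}

# Dry run with the example OSD IDs; use an interval of at least 10 in production.
mark_out_osds "0 4 8 12" 0
```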

  5. Wait for all the placement groups to become active+clean and for the storage cluster to return to a HEALTH_OK state. You can watch the placement group migration from the terminal session opened in step 3. This rebalancing of data can take some time to complete.
  6. Verify the health status of the Ceph storage cluster:

    [root@controller ~]# ceph health
  7. From the Compute/OSD node, and as the root user, disable and stop all OSD daemons:

    [root@osdcompute ~]# systemctl disable ceph-osd.target
    [root@osdcompute ~]# systemctl stop ceph-osd.target
  8. Remove the OSD from the CRUSH map:

    ceph osd crush remove osd.$OSD_NUM
    Replace…​
    • $OSD_NUM with the number portion of the OSD name.

      Example

      [root@controller ~]# ceph osd crush remove osd.0
      removed item id 0 name 'osd.0' from crush map

      Note

      Removing an OSD from the CRUSH map causes CRUSH to recompute which OSDs get the placement groups and to rebalance the data accordingly.

  9. Remove the OSD authentication key:

    ceph auth del osd.$OSD_NUM
    Replace…​
    • $OSD_NUM with the number portion of the OSD name.

      Example

      [root@controller ~]# ceph auth del osd.0
      updated

  10. Remove the OSD:

    ceph osd rm $OSD_NUM
    Replace…​
    • $OSD_NUM with the number portion of the OSD name.

      Example

      [root@controller ~]# ceph osd rm 0
      removed osd.0
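
Steps 8 through 10 must be repeated for every OSD on the node, so they can be combined into one loop. This dry-run sketch only prints the three commands for each OSD (remove the echo to execute them), using the example OSD IDs from above:

```shell
# For each OSD ID: remove it from the CRUSH map, delete its
# authentication key, and remove the OSD itself.
purge_osds() {
  for id in "$@"; do
    echo "ceph osd crush remove osd.$id"
    echo "ceph auth del osd.$id"
    echo "ceph osd rm $id"
  done
}

# Example IDs from the tree output above; drop 'echo' above to run for real.
purge_osds 0 4 8 12
```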

4.3. Removing the nova compute services from the overcloud

This procedure removes a node's Nova compute services from the overcloud and powers off the hardware.

Prerequisite

  • Migrate any running instances to another compute node in the overcloud.

Procedure

Run the following steps on the Red Hat OpenStack Platform director (RHOSP-d) node as the stack user.

  1. Verify the status of the compute node:

    [stack@director ~]$ nova service-list
  2. Disable the compute service:

    nova service-disable $HOST_NAME nova-compute
    Replace…​
    • $HOST_NAME with the compute node’s host name.

      Example

      [stack@director ~]$ nova service-disable overcloud-osd-compute-3.localdomain nova-compute
      +-------------------------------------+--------------+----------+
      | Host                                | Binary       | Status   |
      +-------------------------------------+--------------+----------+
      | overcloud-osd-compute-3.localdomain | nova-compute | disabled |
      +-------------------------------------+--------------+----------+

  3. Collect the Nova ID of the compute node:

    [stack@director ~]$ openstack server list

    Write down the Nova UUID, which is in the first column of the command output.

  4. Collect the OpenStack Platform name:

    [stack@director ~]$ heat stack-list

    Write down the stack_name, which is in the second column of the command output.

  5. Delete the compute node by UUID from the overcloud:

    openstack overcloud node delete --stack OSP_NAME NOVA_UUID
    Replace…​
    • OSP_NAME with the stack_name from step 4.
    • NOVA_UUID with the Nova UUID from step 3.

      Example

      [stack@director ~]$ openstack overcloud node delete --stack overcloud 6b2a2e71-f9c8-4d5b-aaf8-dada97c90821
      deleting nodes [u'6b2a2e71-f9c8-4d5b-aaf8-dada97c90821'] from stack overcloud
      Started Mistral Workflow. Execution ID: 396f123d-df5b-4f37-b137-83d33969b52b

  6. Verify that the compute node was removed from the overcloud:

    [stack@director ~]$ openstack server list

    If the compute node was successfully removed, then it will not be listed in the above command output.

    [stack@director ~]$ nova service-list

    The removed Nova compute node’s status will be disabled and down.

  7. Verify that Ironic has powered off the node:

    [stack@director ~]$ openstack baremetal node list

    The compute node’s power state and availability will be power off and available, respectively. Write down the Nova compute service ID, which is in the first column of the nova service-list output from step 6.

  8. Remove the compute node’s nova-compute service from the Nova scheduler:

    nova service-delete COMPUTE_SERVICE_ID
    Replace…​
    • COMPUTE_SERVICE_ID with the Nova compute service ID from the previous step.

      Example

      [stack@director ~]$ nova service-delete 145
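
The command sequence in this section can be summarized as a dry-run sketch. The host name, stack name, UUID, and service ID below are the example values from the steps above, not values for your cloud; the script only collects and prints each command, so replace the echo lines with the real commands to run it:

```shell
# Example values from the procedure above (substitute your own).
HOST_NAME="overcloud-osd-compute-3.localdomain"
OSP_NAME="overcloud"
NOVA_UUID="6b2a2e71-f9c8-4d5b-aaf8-dada97c90821"
COMPUTE_SERVICE_ID="145"

# Dry run: collect the commands in order, then print them.
cmds=$(
  echo "nova service-disable $HOST_NAME nova-compute"
  echo "openstack overcloud node delete --stack $OSP_NAME $NOVA_UUID"
  echo "nova service-delete $COMPUTE_SERVICE_ID"
)
printf '%s\n' "$cmds"
```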

4.4. Additional Resources

  • The Red Hat Ceph Storage Administration Guide.