Chapter 14. Rebooting the environment

You might need to reboot the environment, for example, when you modify physical servers or recover from a power outage. In these situations, make sure that your Ceph Storage nodes boot correctly.

You must boot the nodes in the following order:

  1. Boot all Ceph Monitor nodes first. This ensures that the Ceph Monitor service is active in your high availability Ceph cluster. By default, the Ceph Monitor service is installed on the Controller nodes. If the Ceph Monitor is separate from the Controller in a custom role, make sure that this custom Ceph Monitor role is active.
  2. Boot all Ceph Storage nodes. This ensures that the Ceph OSD cluster can connect to the active Ceph Monitor cluster on the Controller nodes.
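Before you move from step 1 to step 2 of the boot order above, you can confirm that the Ceph Monitors have formed a quorum. The following is a minimal sketch, not part of the official procedure. It assumes that you run it on a Controller or Ceph Monitor node where `sudo cephadm shell` works, and the function names `ceph_cmd` and `wait_for_mon_quorum` are hypothetical. It relies on the fact that `ceph quorum_status` only succeeds when the monitors can answer as a quorum, and it uses the `ceph` option `--connect-timeout` so that each attempt does not hang indefinitely.

```shell
#!/usr/bin/env bash
# Minimal sketch: wait until the Ceph Monitors answer a quorum query
# before you power on the Ceph Storage nodes.
# Assumption: run on a node where "sudo cephadm shell" can reach the monitors.

# Hypothetical wrapper around the cephadm shell. For multistack or DCN
# deployments, add the -c/-k options for your cluster before the "--".
ceph_cmd() {
  sudo cephadm shell -- ceph "$@"
}

# wait_for_mon_quorum ATTEMPTS INTERVAL_SECONDS
# Returns 0 as soon as "ceph quorum_status" succeeds, 1 if it never does.
wait_for_mon_quorum() {
  local attempts="${1:-30}" interval="${2:-10}" i
  for ((i = 1; i <= attempts; i++)); do
    # --connect-timeout stops a single attempt from hanging without quorum.
    if ceph_cmd --connect-timeout 10 quorum_status >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final attempt.
    (( i < attempts )) && sleep "$interval"
  done
  return 1
}
```

For example, run `wait_for_mon_quorum 30 10 && echo "Monitors are in quorum"` on a Controller node, and boot the Ceph Storage nodes only after it succeeds.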

14.1. Rebooting a Ceph Storage (OSD) cluster

Complete the following steps to reboot a cluster of Ceph Storage (OSD) nodes.

Prerequisites

  • On a Ceph Monitor or Controller node that is running the ceph-mon service, check that the Red Hat Ceph Storage cluster status is healthy and that the placement group (PG) status is active+clean:

    $ sudo cephadm shell -- ceph status

    If the Ceph cluster is healthy, it returns a status of HEALTH_OK.

    If the Ceph cluster status is unhealthy, it returns a status of HEALTH_WARN or HEALTH_ERR. For troubleshooting guidance, see the Red Hat Ceph Storage 5 Troubleshooting Guide.
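If you script this prerequisite check, a minimal sketch could test the output of `ceph health`, which starts with HEALTH_OK, HEALTH_WARN, or HEALTH_ERR. The helper names `ceph_cmd` and `cluster_is_healthy` are hypothetical, and the sketch assumes that it runs on a Ceph Monitor or Controller node where `sudo cephadm shell` works.

```shell
#!/usr/bin/env bash
# Minimal sketch of the prerequisite health check.
# Assumption: run on a Ceph Monitor or Controller node with cephadm.

# Hypothetical wrapper around the cephadm shell.
ceph_cmd() {
  sudo cephadm shell -- ceph "$@"
}

# Returns 0 only when the cluster reports HEALTH_OK.
# "ceph health" prints HEALTH_OK, HEALTH_WARN, or HEALTH_ERR plus details.
cluster_is_healthy() {
  local health
  health="$(ceph_cmd health)" || return 1
  [[ "$health" == HEALTH_OK* ]]
}
```

For example, `cluster_is_healthy || echo "Resolve cluster health before rebooting"` stops you from starting the reboot while the cluster reports HEALTH_WARN or HEALTH_ERR.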

Procedure

  1. Log in to a Ceph Monitor or Controller node that is running the ceph-mon service, and disable Ceph Storage cluster rebalancing temporarily:

    $ sudo cephadm shell -- ceph osd set noout
    $ sudo cephadm shell -- ceph osd set norebalance

    Note

    If you have a multistack or distributed compute node (DCN) architecture, you must specify the Ceph cluster name, through its configuration and keyring files, when you set the noout and norebalance flags. For example: sudo cephadm shell -c /etc/ceph/<cluster>.conf -k /etc/ceph/<cluster>.client.keyring -- ceph osd set noout.

  2. Select the first Ceph Storage node that you want to reboot and log in to the node.
  3. Reboot the node:

    $ sudo reboot
  4. Wait until the node boots.
  5. Log in to the node and check the Ceph cluster status:

    $ sudo cephadm shell -- ceph status

    Check that the pgmap reports all placement groups (PGs) as normal (active+clean).

  6. Log out of the node, reboot the next node, and check its status. Repeat this process until you have rebooted all Ceph Storage nodes.
  7. When complete, log in to a Ceph Monitor or Controller node that is running the ceph-mon service and enable Ceph cluster rebalancing:

    $ sudo cephadm shell -- ceph osd unset noout
    $ sudo cephadm shell -- ceph osd unset norebalance

    Note

    If you have a multistack or distributed compute node (DCN) architecture, you must specify the Ceph cluster name, through its configuration and keyring files, when you unset the noout and norebalance flags. For example: sudo cephadm shell -c /etc/ceph/<cluster>.conf -k /etc/ceph/<cluster>.client.keyring -- ceph osd unset noout.

  8. Perform a final status check to verify that the cluster reports HEALTH_OK:

    $ sudo cephadm shell -- ceph status
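The rolling reboot in the procedure above can be sketched as a script that you run from a Ceph Monitor or Controller node. This is a minimal, hypothetical sketch rather than a supported tool. It assumes passwordless SSH from that node to each Ceph Storage node, and it assumes that `ceph pg stat` prints a summary of the form `<total> pgs: <n> active+clean; ...`; verify that format on your release. The function names are hypothetical. If any step fails, the script stops and leaves noout and norebalance set so that you can investigate.

```shell
#!/usr/bin/env bash
# Minimal sketch of the rolling reboot of Ceph Storage nodes.
# Assumptions: passwordless SSH to each node; "ceph pg stat" prints
# "<total> pgs: <n> active+clean; ..." (check this on your release).

# Hypothetical wrapper around the cephadm shell.
ceph_cmd() {
  sudo cephadm shell -- ceph "$@"
}

# set_rebalance_flags set|unset: toggles noout and norebalance.
set_rebalance_flags() {
  local flag
  for flag in noout norebalance; do
    ceph_cmd osd "$1" "$flag" || return 1
  done
}

# Returns 0 only when the active+clean count equals the PG total.
all_pgs_active_clean() {
  local stat total clean
  stat="$(ceph_cmd pg stat)" || return 1
  total="$(awk '{print $1}' <<<"$stat")"
  clean="$(grep -oE '[0-9]+ active\+clean[;,]' <<<"$stat" | awk '{print $1}')"
  [[ -n "$total" && "$total" == "$clean" ]]
}

# wait_for_active_clean ATTEMPTS INTERVAL_SECONDS
wait_for_active_clean() {
  local i
  for ((i = 1; i <= ${1:-60}; i++)); do
    all_pgs_active_clean && return 0
    sleep "${2:-10}"
  done
  return 1
}

# rolling_reboot NODE...: reboots one node at a time.
rolling_reboot() {
  local node
  set_rebalance_flags set || return 1
  for node in "$@"; do
    # The SSH session drops when the node goes down, so ignore its status.
    ssh "$node" sudo reboot || true
    # Give the node time to go down before polling the PG state.
    sleep 60
    if ! wait_for_active_clean 60 10; then
      echo "PGs are not active+clean after rebooting $node" >&2
      return 1   # flags stay set on purpose
    fi
  done
  set_rebalance_flags unset
}
```

For example, `rolling_reboot ceph-storage-0 ceph-storage-1` (hypothetical host names) reboots the nodes one at a time. The fixed 60-second pause is a simplification: if a node takes longer to shut down, the PG check could pass before the node goes offline, so watch the node yourself as well. After the script finishes, still perform the final HEALTH_OK check in step 8.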

14.2. Rebooting Ceph Storage OSDs to enable connectivity to the Ceph Monitor service

If all overcloud nodes boot at the same time, the Ceph OSD services might not start correctly on the Ceph Storage nodes. In this situation, reboot the Ceph Storage OSDs so that they can connect to the Ceph Monitor service.

Procedure

  • Verify that the Ceph Storage cluster reports a HEALTH_OK status:

    $ sudo cephadm shell -- ceph status
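To find out whether any OSDs failed to start after a simultaneous boot, you can compare the total and "up" OSD counts. The following minimal sketch assumes that `ceph osd stat` prints a line such as `12 osds: 12 up (since 3m), 12 in (since 1h); epoch: e345`; verify this format on your release. The helper names `ceph_cmd` and `osds_all_up` are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: detect OSDs that did not start after a simultaneous boot.
# Assumption: "ceph osd stat" prints
#   "<total> osds: <up> up (since ...), <in> in (since ...); epoch: e<N>"

# Hypothetical wrapper around the cephadm shell.
ceph_cmd() {
  sudo cephadm shell -- ceph "$@"
}

# Prints how many OSDs are not up; returns 0 only if all OSDs are up.
osds_all_up() {
  local stat total up
  stat="$(ceph_cmd osd stat)" || return 1
  total="$(awk '{print $1}' <<<"$stat")"
  up="$(awk '{print $3}' <<<"$stat")"
  # Bail out if the output does not match the assumed format.
  [[ "$total" =~ ^[0-9]+$ && "$up" =~ ^[0-9]+$ ]] || return 1
  echo "$((total - up)) OSD(s) down"
  [[ "$total" == "$up" ]]
}
```

If `osds_all_up` reports OSDs down, reboot the affected Ceph Storage nodes so that their OSDs can connect to the Ceph Monitor service, then check the cluster status again.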