Chapter 2. Understanding process management for Ceph

As a storage administrator, you can manipulate the various Ceph daemons by type or instance, on bare metal or in containers. Manipulating these daemons allows you to start, stop, and restart all of the Ceph services as needed.

2.1. Prerequisites

  • Installation of the Red Hat Ceph Storage software.

2.2. Ceph process management

In Red Hat Ceph Storage, all process management is done through systemd. To start, stop, or restart a Ceph daemon, you must specify the daemon type or the daemon instance.

Additional Resources

  • For more information about using systemd, see the chapter Managing services with systemd in the Red Hat Enterprise Linux System Administrator’s Guide.

2.3. Starting, stopping, and restarting all Ceph daemons

Start, stop, and restart all Ceph daemons as the root user from the admin node.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Root-level access to the node.

Procedure

  1. Starting all Ceph daemons:

    [root@admin ~]# systemctl start ceph.target
  2. Stopping all Ceph daemons:

    [root@admin ~]# systemctl stop ceph.target
  3. Restarting all Ceph daemons:

    [root@admin ~]# systemctl restart ceph.target

2.4. Starting, stopping, and restarting the Ceph daemons by type

To start, stop, or restart all Ceph daemons of a particular type, follow these procedures on the node running the Ceph daemons.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Root-level access to the node.

Procedure

  • On Ceph Monitor nodes:

    Starting:

    [root@mon ~]# systemctl start ceph-mon.target

    Stopping:

    [root@mon ~]# systemctl stop ceph-mon.target

    Restarting:

    [root@mon ~]# systemctl restart ceph-mon.target

  • On Ceph Manager nodes:

    Starting:

    [root@mgr ~]# systemctl start ceph-mgr.target

    Stopping:

    [root@mgr ~]# systemctl stop ceph-mgr.target

    Restarting:

    [root@mgr ~]# systemctl restart ceph-mgr.target

  • On Ceph OSD nodes:

    Starting:

    [root@osd ~]# systemctl start ceph-osd.target

    Stopping:

    [root@osd ~]# systemctl stop ceph-osd.target

    Restarting:

    [root@osd ~]# systemctl restart ceph-osd.target

  • On Ceph Object Gateway nodes:

    Starting:

    [root@rgw ~]# systemctl start ceph-radosgw.target

    Stopping:

    [root@rgw ~]# systemctl stop ceph-radosgw.target

    Restarting:

    [root@rgw ~]# systemctl restart ceph-radosgw.target
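The four daemon types above share one unit-naming pattern, ceph-TYPE.target. As a sketch, the following loop prints the restart command for each type; it only prints the commands, which you then run as root on a node that hosts that daemon type:

```shell
# Print the restart command for each Ceph daemon type covered above.
# These are the stock target names; the loop does not run the commands.
for target in ceph-mon ceph-mgr ceph-osd ceph-radosgw; do
  echo "systemctl restart ${target}.target"
done
```

Substituting start or stop for restart gives the other two command sets.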

2.5. Starting, stopping, and restarting the Ceph daemons by instance

To start, stop, or restart a Ceph daemon by instance, follow these procedures on the node running the Ceph daemons.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Root-level access to the node.

Procedure

  • On a Ceph Monitor node:

    Starting:

    [root@mon ~]# systemctl start ceph-mon@MONITOR_HOST_NAME

    Stopping:

    [root@mon ~]# systemctl stop ceph-mon@MONITOR_HOST_NAME

    Restarting:

    [root@mon ~]# systemctl restart ceph-mon@MONITOR_HOST_NAME

    Replace

    • MONITOR_HOST_NAME with the name of the Ceph Monitor node.
  • On a Ceph Manager node:

    Starting:

    [root@mgr ~]# systemctl start ceph-mgr@MANAGER_HOST_NAME

    Stopping:

    [root@mgr ~]# systemctl stop ceph-mgr@MANAGER_HOST_NAME

    Restarting:

    [root@mgr ~]# systemctl restart ceph-mgr@MANAGER_HOST_NAME

    Replace

    • MANAGER_HOST_NAME with the name of the Ceph Manager node.
  • On a Ceph OSD node:

    Starting:

    [root@osd ~]# systemctl start ceph-osd@OSD_NUMBER

    Stopping:

    [root@osd ~]# systemctl stop ceph-osd@OSD_NUMBER

    Restarting:

    [root@osd ~]# systemctl restart ceph-osd@OSD_NUMBER

    Replace

    • OSD_NUMBER with the ID number of the Ceph OSD.

      For example, when looking at the ceph osd tree command output, osd.0 has an ID of 0.

  • On a Ceph Object Gateway node:

    Starting:

    [root@rgw ~]# systemctl start ceph-radosgw@rgw.OBJ_GATEWAY_HOST_NAME

    Stopping:

    [root@rgw ~]# systemctl stop ceph-radosgw@rgw.OBJ_GATEWAY_HOST_NAME

    Restarting:

    [root@rgw ~]# systemctl restart ceph-radosgw@rgw.OBJ_GATEWAY_HOST_NAME

    Replace

    • OBJ_GATEWAY_HOST_NAME with the name of the Ceph Object Gateway node.
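The OSD_NUMBER above is the numeric ID shown in the first column of the ceph osd tree output. As a sketch, assuming a hypothetical output excerpt, the IDs can be pulled out with awk:

```shell
# Hypothetical excerpt of `ceph osd tree` output; the first column is the
# OSD ID, so osd.0 has the ID 0.
tree=' 0   hdd 0.04880     osd.0   up
 3   hdd 0.04880     osd.3   up'

# Print the ID of every osd.N row, suitable for use in ceph-osd@OSD_NUMBER:
echo "$tree" | awk '$4 ~ /^osd\./ {print $1}'
```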

2.6. Starting, stopping, and restarting Ceph daemons that run in containers

Use the systemctl command to start, stop, or restart Ceph daemons that run in containers.

Prerequisites

  • Installation of the Red Hat Ceph Storage software.
  • Root-level access to the node.

Procedure

  1. To start, stop, or restart a Ceph daemon running in a container, run a systemctl command as root in the following format:

    systemctl ACTION ceph-DAEMON@ID
    Replace
    • ACTION with the action to perform: start, stop, or restart.
    • DAEMON with the daemon type: osd, mon, mds, or radosgw.
    • ID with either:

      • The short host name where the ceph-mon, ceph-mds, or ceph-rgw daemons are running.
      • The ID of the ceph-osd daemon if it was deployed.

    For example, to restart a ceph-osd daemon with the ID osd01:

    [root@osd ~]# systemctl restart ceph-osd@osd01

    To start a ceph-mon daemon that runs on the ceph-monitor01 host:

    [root@mon ~]# systemctl start ceph-mon@ceph-monitor01

    To stop a ceph-rgw daemon that runs on the ceph-rgw01 host:

    [root@rgw ~]# systemctl stop ceph-radosgw@ceph-rgw01
  2. Verify that the action was completed successfully.

    systemctl status ceph-DAEMON@ID

    For example:

    [root@mon ~]# systemctl status ceph-mon@ceph-monitor01
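The ACTION/DAEMON/ID format above amounts to composing a systemd unit name. A minimal sketch, using a hypothetical ceph_unit helper (not a Ceph or systemd tool), shows how the pieces combine:

```shell
# Hypothetical helper: compose the systemd unit name from a daemon type
# and an ID; the action (start, stop, or restart) is then applied to it.
ceph_unit() { printf 'ceph-%s@%s\n' "$1" "$2"; }

ceph_unit osd osd01            # ceph-osd@osd01
ceph_unit mon ceph-monitor01   # ceph-mon@ceph-monitor01
ceph_unit radosgw ceph-rgw01   # ceph-radosgw@ceph-rgw01
```

Note that the Object Gateway unit uses radosgw, matching the ceph-radosgw@ example above.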


2.7. Viewing the logs of Ceph daemons that run in containers

Use the journald daemon from the container host to view the logs of a Ceph daemon from a container.

Prerequisites

  • Installation of the Red Hat Ceph Storage software.
  • Root-level access to the node.

Procedure

  1. To view the entire Ceph log, run a journalctl command as root in the following format:

    journalctl -u ceph-DAEMON@ID
    Replace
    • DAEMON with the Ceph daemon type: osd, mon, or radosgw.
    • ID with either:

      • The short host name where the ceph-mon, ceph-mds, or ceph-rgw daemons are running.
      • The ID of the ceph-osd daemon if it was deployed.

    For example, to view the entire log for the ceph-osd daemon with the ID osd01:

    [root@osd ~]# journalctl -u ceph-osd@osd01
  2. To show only recent journal entries, and to follow new entries as they are written, use the -f option.

    journalctl -fu ceph-DAEMON@ID

    For example, to view only recent journal entries for the ceph-mon daemon that runs on the ceph-monitor01 host:

    [root@mon ~]# journalctl -fu ceph-mon@ceph-monitor01
Note

You can also use the sosreport utility to view the journald logs. For more details about SOS reports, see the What is an sosreport and how to create one in Red Hat Enterprise Linux? solution on the Red Hat Customer Portal.

Additional Resources

  • The journalctl(1) manual page.

2.8. Enabling logging to a file for containerized Ceph daemons

By default, containerized Ceph daemons do not log to files. You can use centralized configuration management to enable containerized Ceph daemons to log to files.

Prerequisites

  • Installation of the Red Hat Ceph Storage software.
  • Root-level access to the node where the containerized daemon runs.

Procedure

  1. Navigate to the /var/log/ceph directory:

    Example

    [root@host01 ~]# cd /var/log/ceph

  2. Note any existing log files.

    Syntax

    ls -l /var/log/ceph/

    Example

    [root@host01 ceph]# ls -l /var/log/ceph/
    total 396
    -rw-r--r--. 1 ceph ceph 107230 Feb  5 14:42 ceph-osd.0.log
    -rw-r--r--. 1 ceph ceph 107230 Feb  5 14:42 ceph-osd.3.log
    -rw-r--r--. 1 root root 181641 Feb  5 14:42 ceph-volume.log

    In the example, logging to files is already enabled for osd.0 and osd.3.

  3. Fetch the container name of the daemon for which you want to enable logging:

    Red Hat Enterprise Linux 7

    [root@host01 ceph]# docker ps -a

    Red Hat Enterprise Linux 8

    [root@host01 ceph]# podman ps -a

  4. Use centralized configuration management to enable logging to a file for a Ceph daemon.

    Red Hat Enterprise Linux 7

    docker exec CONTAINER_NAME ceph config set DAEMON_NAME log_to_file true

    Red Hat Enterprise Linux 8

    podman exec CONTAINER_NAME ceph config set DAEMON_NAME log_to_file true

    The DAEMON_NAME is derived from the CONTAINER_NAME: remove the ceph- prefix, and replace the hyphen between the daemon type and the daemon ID with a period.

    Red Hat Enterprise Linux 7

    [root@host01 ceph]# docker exec ceph-mon-host01 ceph config set mon.host01 log_to_file true

    Red Hat Enterprise Linux 8

    [root@host01 ceph]# podman exec ceph-mon-host01 ceph config set mon.host01 log_to_file true

  5. Optional: To enable logging to a file for the cluster log, use the mon_cluster_log_to_file option:

    Red Hat Enterprise Linux 7

    docker exec CONTAINER_NAME ceph config set DAEMON_NAME mon_cluster_log_to_file true

    Red Hat Enterprise Linux 8

    podman exec CONTAINER_NAME ceph config set DAEMON_NAME mon_cluster_log_to_file true

    Red Hat Enterprise Linux 7

    [root@host01 ceph]# docker exec ceph-mon-host01 ceph config set mon.host01 mon_cluster_log_to_file true

    Red Hat Enterprise Linux 8

    [root@host01 ceph]# podman exec ceph-mon-host01 ceph config set mon.host01 mon_cluster_log_to_file true

  6. Validate the updated configuration:

    Red Hat Enterprise Linux 7

    docker exec CONTAINER_NAME ceph config show-with-defaults DAEMON_NAME | grep log_to_file

    Red Hat Enterprise Linux 8

    podman exec CONTAINER_NAME ceph config show-with-defaults DAEMON_NAME | grep log_to_file

    Example

    [root@host01 ceph]# podman exec ceph-mon-host01 ceph config show-with-defaults mon.host01 | grep log_to_file
    log_to_file                true    mon    default[false]
    mon_cluster_log_to_file    true    mon    default[false]

  7. Optional: Restart the Ceph daemon:

    Syntax

    systemctl restart ceph-DAEMON@DAEMON_ID

    Example

    [root@host01 ceph]# systemctl restart ceph-mon@host01

  8. Validate that the new log files exist:

    Syntax

    ls -l /var/log/ceph/

    Example

    [root@host01 ceph]# ls -l /var/log/ceph/
    total 408
    -rw-------. 1 ceph ceph    202 Feb  5 16:06 ceph.audit.log
    -rw-------. 1 ceph ceph   3182 Feb  5 16:06 ceph.log
    -rw-r--r--. 1 ceph ceph   2049 Feb  5 16:06 ceph-mon.host01.log
    -rw-r--r--. 1 ceph ceph 107230 Feb  5 14:42 ceph-osd.0.log
    -rw-r--r--. 1 ceph ceph 107230 Feb  5 14:42 ceph-osd.3.log
    -rw-r--r--. 1 root root 181641 Feb  5 14:42 ceph-volume.log

    New log files were created: ceph-mon.host01.log for the Monitor daemon, and ceph.log and ceph.audit.log for the cluster logs.
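The DAEMON_NAME derivation described in step 4 (remove the ceph- prefix, then turn the hyphen between the daemon type and the daemon ID into a period) can be sketched as a one-line transformation:

```shell
# Derive the Ceph daemon name from a container name such as ceph-mon-host01:
# strip the leading "ceph-", then replace the first remaining hyphen with
# a period.
container_name="ceph-mon-host01"
daemon_name=$(echo "$container_name" | sed -E 's/^ceph-//; s/-/./')
echo "$daemon_name"   # mon.host01
```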

Additional Resources

  • For more information, see Configuring logging section in the Red Hat Ceph Storage Troubleshooting Guide.

2.9. Gathering log files of Ceph daemons

To gather log files of Ceph daemons, run the gather-ceph-logs.yml Ansible playbook. Currently, Red Hat Ceph Storage supports gathering logs for non-containerized deployments only.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Admin-level access to the Ansible node.

Procedure

  1. Navigate to the /usr/share/ceph-ansible directory:

    [ansible@admin ~]$ cd /usr/share/ceph-ansible
  2. Run the playbook:

    [ansible@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/gather-ceph-logs.yml -i hosts
  3. Wait for the logs to be collected on the Ansible administration node.


2.10. Powering down and rebooting Red Hat Ceph Storage cluster

Follow the procedure below to power down and reboot the Ceph cluster.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Root-level access to the nodes.

Procedure

Powering down the Red Hat Ceph Storage cluster

  1. Stop the clients from using the RBD images and the RADOS Gateway on this cluster, as well as any other clients.
  2. The cluster must be in a healthy state (HEALTH_OK and all PGs active+clean) before proceeding. Run ceph status on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.
  3. If you use the Ceph File System (CephFS), bring the CephFS cluster down by reducing the number of ranks to 1, setting the cluster_down flag, and then failing the last rank.

    Example:

    [root@osd ~]# ceph fs set FS_NAME max_mds 1
    [root@osd ~]# ceph mds deactivate FS_NAME:1 # rank 2 of 2
    [root@osd ~]# ceph status # wait for rank 1 to finish stopping
    [root@osd ~]# ceph fs set FS_NAME cluster_down true
    [root@osd ~]# ceph mds fail FS_NAME:0

    Setting the cluster_down flag prevents standbys from taking over the failed rank.

  4. Set the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following commands on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:

    [root@mon ~]# ceph osd set noout
    [root@mon ~]# ceph osd set norecover
    [root@mon ~]# ceph osd set norebalance
    [root@mon ~]# ceph osd set nobackfill
    [root@mon ~]# ceph osd set nodown
    [root@mon ~]# ceph osd set pause
  5. Shut down the OSD nodes one by one:

    [root@osd ~]# systemctl stop ceph-osd.target
  6. Shut down the monitor nodes one by one:

    [root@mon ~]# systemctl stop ceph-mon.target

Rebooting the Red Hat Ceph Storage cluster

  1. Power on the administration node.
  2. Power on the monitor nodes:

    [root@mon ~]# systemctl start ceph-mon.target
  3. Power on the OSD nodes:

    [root@osd ~]# systemctl start ceph-osd.target
  4. Wait for all the nodes to come up. Verify that all the services are up and that there is network connectivity between the nodes.
  5. Unset the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following commands on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:

    [root@mon ~]# ceph osd unset noout
    [root@mon ~]# ceph osd unset norecover
    [root@mon ~]# ceph osd unset norebalance
    [root@mon ~]# ceph osd unset nobackfill
    [root@mon ~]# ceph osd unset nodown
    [root@mon ~]# ceph osd unset pause
  6. If you use the Ceph File System (CephFS), the CephFS cluster must be brought back up by setting the cluster_down flag to false:

    [root@admin ~]# ceph fs set FS_NAME cluster_down false
  7. Verify that the cluster is in a healthy state (HEALTH_OK and all PGs active+clean). Run ceph status on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.
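The flag commands in the power-down and power-up steps differ only in set versus unset. A sketch that prints both sequences in order (the loop only prints the commands; run them as root on a node with the client keyrings):

```shell
# Print the cluster flag commands: the "set" sequence for powering down,
# followed by the "unset" sequence for powering back up.
for action in set unset; do
  for flag in noout norecover norebalance nobackfill nodown pause; do
    echo "ceph osd $action $flag"
  done
done
```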
