Chapter 1. Managing the storage cluster size

As a storage administrator, you can manage the storage cluster size by adding or removing Ceph Monitors or OSDs as storage capacity expands or shrinks.

Note

If you are bootstrapping a storage cluster for the first time, see the Red Hat Ceph Storage 3 Installation Guide for Red Hat Enterprise Linux or Ubuntu.

1.1. Prerequisites

  • A running Red Hat Ceph Storage cluster.

1.2. Ceph Monitors

Ceph monitors are light-weight processes that maintain a master copy of the cluster map. All Ceph clients contact a Ceph monitor and retrieve the current copy of the cluster map, enabling clients to bind to a pool and read and write data.

Ceph monitors use a variation of the Paxos protocol to establish consensus about maps and other critical information across the cluster. Due to the nature of Paxos, Ceph requires a majority of monitors to be running in order to establish a quorum, and thus consensus.

Important

Red Hat requires at least three monitors on separate hosts to receive support for a production cluster.

Red Hat recommends deploying an odd number of monitors, because an odd number is more resilient to failures than an even number. For example, to maintain a quorum, a two-monitor deployment cannot tolerate any failures; with three monitors, one failure; with four monitors, one failure; with five monitors, two failures. In short, Ceph needs a majority of the monitors to be running and able to communicate with each other: two out of three, three out of four, and so on.
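
To check which monitors currently form the quorum, you can query the monitors directly; a minimal check, assuming an admin keyring is available on the node:

    [root@monitor ~]# ceph quorum_status --format json-pretty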

For an initial deployment of a multi-node Ceph storage cluster, Red Hat requires three monitors, increasing the number two at a time if a valid need for more than three monitors exists.

Since monitors are light-weight, it is possible to run them on the same host as OpenStack nodes. However, Red Hat recommends running monitors on separate hosts.

Important

Collocating monitors and OSDs on the same node can impair performance and is not supported.

When you remove monitors from a storage cluster, consider that Ceph monitors use the Paxos protocol to establish a consensus about the master storage cluster map. You must have a sufficient number of monitors to establish a quorum.

1.2.1. Preparing a new Ceph Monitor node

When adding a new Ceph Monitor to a storage cluster, deploy it on a separate node. The node hardware must be uniform across all monitor nodes in the storage cluster.

Prerequisites

Procedure

  1. Add the new node to the server rack.
  2. Connect the new node to the network.
  3. Install either Red Hat Enterprise Linux 7 or Ubuntu 16.04 on the new node.
  4. Install NTP and configure a reliable time source:

    Red Hat Enterprise Linux

    [root@monitor ~]# yum install ntp
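
    On Ubuntu, a minimal equivalent, assuming the distribution ntp package:

    [user@monitor ~]$ sudo apt-get install ntp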
  5. If using a firewall, open TCP port 6789:

    Red Hat Enterprise Linux

    [root@monitor ~]# firewall-cmd --zone=public --add-port=6789/tcp
    [root@monitor ~]# firewall-cmd --zone=public --add-port=6789/tcp --permanent

    Ubuntu

    iptables -I INPUT 1 -i $NIC_NAME -p tcp -s $IP_ADDR/$NETMASK_PREFIX --dport 6789 -j ACCEPT

    Ubuntu example

    [user@monitor ~]$ sudo iptables -I INPUT 1 -i enp6s0 -p tcp -s 192.168.0.11/24 --dport 6789 -j ACCEPT
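
    The iptables rule shown above does not persist across reboots. A hedged way to make it persistent, assuming the iptables-persistent package is acceptable in your environment:

    [user@monitor ~]$ sudo apt-get install iptables-persistent
    [user@monitor ~]$ sudo netfilter-persistent save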

1.2.2. Adding a Ceph Monitor using Ansible

Red Hat recommends adding two monitors at a time to maintain an odd number of monitors. For example, if you have three monitors in the storage cluster, Red Hat recommends expanding to five monitors.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Having root access to the new nodes.

Procedure

  1. Add the new Ceph Monitor nodes to the /etc/ansible/hosts Ansible inventory file, under a [mons] section:

    Example

    [mons]
    monitor01
    monitor02
    monitor03
    $NEW_MONITOR_NODE_NAME
    $NEW_MONITOR_NODE_NAME

  2. Verify that Ansible can contact the Ceph nodes:

    # ansible all -m ping
  3. Change directory to the Ansible configuration directory:

    # cd /usr/share/ceph-ansible
  4. Run the Ansible playbook:

    $ ansible-playbook site.yml

    If adding new monitors to a containerized deployment of Ceph, run the site-docker.yml playbook:

    $ ansible-playbook site-docker.yml
  5. After the Ansible playbook finishes, the new monitor nodes will be part of the storage cluster.
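
    To confirm that the new monitors joined the quorum, you can check the monitor status from any monitor node; a minimal verification sketch:

    [root@monitor ~]# ceph mon stat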

1.2.3. Adding a Ceph monitor using the command-line interface

Red Hat recommends adding two monitors at a time to maintain an odd number of monitors. For example, if you have three monitors in the storage cluster, Red Hat recommends expanding to five monitors.

Important

Red Hat recommends only running one Ceph monitor daemon per node.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Having root access to the new nodes.

Procedure

  1. Add the Red Hat Ceph Storage 3 monitor repository.

    Red Hat Enterprise Linux

    [root@monitor ~]# subscription-manager repos --enable=rhel-7-server-rhceph-3-mon-rpms

    Ubuntu

    [user@monitor ~]$ sudo bash -c 'umask 0077; echo deb https://$CUSTOMER_NAME:$CUSTOMER_PASSWORD@rhcs.download.redhat.com/3-updates/Tools $(lsb_release -sc) main | tee /etc/apt/sources.list.d/Tools.list'
    [user@monitor ~]$ sudo bash -c 'wget -O - https://www.redhat.com/security/fd431d51.txt | apt-key add -'

  2. Install the ceph-mon package on the new Ceph Monitor nodes:

    Red Hat Enterprise Linux

    [root@monitor ~]# yum install ceph-mon

    Ubuntu

    [user@monitor ~]$ sudo apt-get install ceph-mon

  3. To ensure the storage cluster identifies the monitor on start or restart, add the monitor’s IP address to the Ceph configuration file.

    Add the new monitors to the mon_host setting in the [mon] or [global] section of the Ceph configuration file on an existing monitor node in the storage cluster. The mon_host setting is a list of DNS-resolvable host names or IP addresses, separated by "," or ";" or " ". Optionally, you can also create a specific section in the Ceph configuration file for the new monitor nodes:

    Syntax

    [mon]
    mon host = $MONITOR_IP:$PORT $MONITOR_IP:$PORT ... $NEW_MONITOR_IP:$PORT

    or

    [mon.$MONITOR_ID]
    host = $MONITOR_ID
    mon addr = $MONITOR_IP

    To make the monitors part of the initial quorum group, you must also add the host name to the mon_initial_members parameter in the [global] section of the Ceph configuration file.

    Example

    [global]
    mon initial members = node1 node2 node3 node4 node5
    ...
    [mon]
    mon host = 192.168.0.1:6789 192.168.0.2:6789 192.168.0.3:6789 192.168.0.4:6789 192.168.0.5:6789
    ...
    [mon.node4]
    host = node4
    mon addr = 192.168.0.4
    
    [mon.node5]
    host = node5
    mon addr = 192.168.0.5

    Important

    Production storage clusters REQUIRE at least three monitors set in mon_initial_members and mon_host to ensure high availability. If a storage cluster with only one initial monitor adds two more monitors, but does not add them to mon_initial_members and mon_host, the failure of the initial monitor will cause the storage cluster to lock up. If the monitors you are adding are replacing monitors that are part of mon_initial_members and mon_host, the new monitors must be added to mon_initial_members and mon_host too.

  4. Copy the updated Ceph configuration file to the Ceph nodes and Ceph clients:

    Syntax

    scp /etc/ceph/$CLUSTER_NAME.conf $TARGET_NODE_NAME:/etc/ceph

    Example

    [root@monitor ~]# scp /etc/ceph/ceph.conf node4:/etc/ceph

  5. Create the default monitor directory on the new nodes:

    Syntax

    mkdir /var/lib/ceph/mon/$CLUSTER_NAME-$MONITOR_ID

    Example

    [root@monitor ~]# mkdir /var/lib/ceph/mon/ceph-node4

  6. Create a temporary directory to keep the files needed during this process. This directory should be different from the monitor’s default directory created in the previous step, and can be removed after all the steps are completed:

    Syntax

    mkdir $TEMP_DIRECTORY

    Example

    [root@monitor ~]# mkdir /tmp/ceph

  7. Copy the admin key from a running monitor node to the new monitor node so that you can run ceph commands:

    Syntax

    scp /etc/ceph/$CLUSTER_NAME.client.admin.keyring $TARGET_NODE_NAME:/etc/ceph

    Example

    [root@monitor ~]# scp /etc/ceph/ceph.client.admin.keyring node4:/etc/ceph

  8. Retrieve the monitor keyring:

    Syntax

    ceph auth get mon. -o /$TEMP_DIRECTORY/$KEY_FILE_NAME

    Example

    [root@monitor ~]# ceph auth get mon. -o /tmp/ceph/ceph_keyring.out

  9. Retrieve the monitor map:

    Syntax

    ceph mon getmap -o /$TEMP_DIRECTORY/$MONITOR_MAP_FILE

    Example

    [root@monitor ~]# ceph mon getmap -o /tmp/ceph/ceph_mon_map.out

  10. Prepare the monitor’s data directory created in step 5. You must specify the path to the monitor map so that you can retrieve the information about a quorum of monitors and their fsid. You must also specify a path to the monitor keyring:

    Syntax

    ceph-mon -i $MONITOR_ID --mkfs --monmap /$TEMP_DIRECTORY/$MONITOR_MAP_FILE --keyring /$TEMP_DIRECTORY/$KEY_FILE_NAME

    Example

    [root@monitor ~]# ceph-mon -i node4 --mkfs --monmap /tmp/ceph/ceph_mon_map.out --keyring /tmp/ceph/ceph_keyring.out

  11. For storage clusters with custom names, add the following line to the /etc/sysconfig/ceph file on Red Hat Enterprise Linux, or to the /etc/default/ceph file on Ubuntu:

    Red Hat Enterprise Linux

    [root@monitor ~]# echo "CLUSTER=<custom_cluster_name>" >> /etc/sysconfig/ceph

    Ubuntu

    [user@monitor ~]$ sudo bash -c 'echo "CLUSTER=<custom_cluster_name>" >> /etc/default/ceph'

  12. Update the owner and group permissions:

    Syntax

    chown -R $OWNER:$GROUP $DIRECTORY_PATH

    Example

    [root@monitor ~]# chown -R ceph:ceph /var/lib/ceph/mon
    [root@monitor ~]# chown -R ceph:ceph /var/log/ceph
    [root@monitor ~]# chown -R ceph:ceph /var/run/ceph
    [root@monitor ~]# chown -R ceph:ceph /etc/ceph

  13. Enable and start the ceph-mon process on the new monitor nodes:

    Syntax

    systemctl enable ceph-mon.target
    systemctl enable ceph-mon@$MONITOR_ID
    systemctl start ceph-mon@$MONITOR_ID

    Example

    [root@monitor ~]# systemctl enable ceph-mon.target
    [root@monitor ~]# systemctl enable ceph-mon@node4
    [root@monitor ~]# systemctl start ceph-mon@node4
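
After the daemons start, you can verify that the new monitors have joined the quorum; a minimal check from any monitor node:

    [root@monitor ~]# ceph -s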

Additional Resources

1.2.4. Removing a Ceph Monitor using Ansible

To remove a Ceph Monitor with Ansible, use the shrink-mon.yml playbook.

Prerequisites

  • An Ansible administration node.
  • A running Red Hat Ceph Storage cluster deployed by Ansible.

Procedure

  1. Change to the /usr/share/ceph-ansible/ directory.

    [user@admin ~]$ cd /usr/share/ceph-ansible
  2. Copy the shrink-mon.yml playbook from the infrastructure-playbooks directory to the current directory.

    [root@admin ceph-ansible]# cp infrastructure-playbooks/shrink-mon.yml .
  3. Use the playbook.

    [user@admin ceph-ansible]$ ansible-playbook shrink-mon.yml -e mon_to_kill=<hostname> -u <ansible-user>

    Replace:

    • <hostname> with the short host name of the Monitor node. To remove more Monitors, separate their host names with a comma.
    • <ansible-user> with the name of the Ansible user

    For example, to remove a Monitor that is located on a node with monitor1 host name:

    [user@admin ceph-ansible]$ ansible-playbook shrink-mon.yml -e mon_to_kill=monitor1 -u user
  4. Remove the Monitor entry from all Ceph configuration files in the cluster.
  5. Ensure that the Monitor has been successfully removed.

    [root@monitor ~]# ceph -s

Additional Resources

1.2.5. Removing a Ceph Monitor using the command-line interface

Removing a Ceph Monitor involves removing a ceph-mon daemon from the storage cluster and updating the storage cluster map.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Having root access to the monitor node.

Procedure

  1. Stop the monitor service:

    Syntax

    systemctl stop ceph-mon@$MONITOR_ID

    Example

    [root@monitor ~]# systemctl stop ceph-mon@node3

  2. Remove the monitor from the storage cluster:

    Syntax

    ceph mon remove $MONITOR_ID

    Example

    [root@monitor ~]# ceph mon remove node3

  3. Remove the monitor entry from the Ceph configuration file, by default /etc/ceph/ceph.conf.
  4. Redistribute the Ceph configuration file to all remaining Ceph nodes in the storage cluster:

    Syntax

    scp /etc/ceph/$CLUSTER_NAME.conf $USER_NAME@$TARGET_NODE_NAME:/etc/ceph/

    Example

    [root@monitor ~]# scp /etc/ceph/ceph.conf root@node1:/etc/ceph/

  5. Optionally, you can archive the monitor data:

    Syntax

    mv /var/lib/ceph/mon/$CLUSTER_NAME-$MONITOR_ID /var/lib/ceph/mon/removed-$CLUSTER_NAME-$MONITOR_ID

    Example

    [root@monitor ~]# mv /var/lib/ceph/mon/ceph-node3 /var/lib/ceph/mon/removed-ceph-node3

  6. Optionally, you can delete the monitor data:

    Syntax

    rm -r /var/lib/ceph/mon/$CLUSTER_NAME-$MONITOR_ID

    Example

    [root@monitor ~]# rm -r /var/lib/ceph/mon/ceph-node3

1.2.6. Removing a Ceph Monitor from an unhealthy storage cluster

This procedure removes a ceph-mon daemon from an unhealthy storage cluster, that is, a storage cluster that has placement groups persistently not in the active+clean state.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Having root access to the monitor node.
  • At least one running Ceph Monitor node.

Procedure

  1. Identify a surviving monitor and log in to that node:

    [root@monitor ~]# ceph mon dump
    [root@monitor ~]# ssh $MONITOR_HOST_NAME
  2. Stop the ceph-mon daemon and extract a copy of the monmap file:

    Syntax

    systemctl stop ceph-mon@$MONITOR_ID
    ceph-mon -i $MONITOR_ID --extract-monmap $TEMPORARY_PATH

    Example

    [root@monitor ~]# systemctl stop ceph-mon@node1
    [root@monitor ~]# ceph-mon -i node1 --extract-monmap /tmp/monmap

  3. Remove the non-surviving monitor(s):

    Syntax

    monmaptool $TEMPORARY_PATH --rm $MONITOR_ID

    Example

    [root@monitor ~]# monmaptool /tmp/monmap --rm node2

  4. Inject the monitor map, with the monitor(s) removed, into the surviving monitor:

    Syntax

    ceph-mon -i $MONITOR_ID --inject-monmap $TEMPORARY_PATH

    Example

    [root@monitor ~]# ceph-mon -i node1 --inject-monmap /tmp/monmap
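
  5. Start the surviving monitor again so that it boots with the updated monitor map. This restart is a sketch of the final step, assuming the daemon was stopped in step 2:

    Syntax

    systemctl start ceph-mon@$MONITOR_ID

    Example

    [root@monitor ~]# systemctl start ceph-mon@node1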

1.3. Ceph OSDs

When a Red Hat Ceph Storage cluster is up and running, you can add OSDs to the storage cluster at runtime.

A Ceph OSD generally consists of one ceph-osd daemon for one storage drive and its associated journal within a node. If a node has multiple storage drives, then map one ceph-osd daemon for each drive.

Red Hat recommends checking the capacity of a cluster regularly to see if it is reaching the upper end of its storage capacity. As a storage cluster reaches its near full ratio, add one or more OSDs to expand the storage cluster’s capacity.
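
To check how close the storage cluster is to the near full ratio, you can review the overall and per-OSD utilization; a quick sketch:

    [root@monitor ~]# ceph df
    [root@monitor ~]# ceph osd df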

When you want to reduce the size of a Red Hat Ceph Storage cluster or replace hardware, you can also remove an OSD at runtime. If the node has multiple storage drives, you might need to remove one ceph-osd daemon for each drive. Generally, it is a good idea to check the capacity of the storage cluster to see whether you are reaching the upper end of its capacity. Ensure that when you remove an OSD, the storage cluster is not at its near full ratio.

Important

Do not let a storage cluster reach the full ratio before adding an OSD. OSD failures that occur after the storage cluster reaches the near full ratio can cause the storage cluster to exceed the full ratio. Ceph blocks write access to protect the data until you resolve the storage capacity issues. Do not remove OSDs without considering the impact on the full ratio first.

1.3.1. Ceph OSD node configuration

Ceph OSDs and their supporting hardware should be configured similarly, as part of the storage strategy for the pool(s) that will use the OSDs. Ceph prefers uniform hardware across pools for a consistent performance profile. For best performance, consider a CRUSH hierarchy with drives of the same type and size. See the Storage Strategies guide for more details.

If you add drives of dissimilar size, then you will need to adjust their weights accordingly. When you add the OSD to the CRUSH map, consider the weight for the new OSD. Hard drive capacity grows approximately 40% per year, so newer OSD nodes might have larger hard drives than older nodes in the storage cluster, that is, they might have a greater weight.
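
For illustration, the CRUSH weight of an OSD can be adjusted after it is added; the OSD ID and weight below are hypothetical:

    [root@monitor ~]# ceph osd crush reweight osd.4 3.6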

Before doing a new installation, review the Requirements for Installing Red Hat Ceph Storage chapter in the Installation Guide for Red Hat Enterprise Linux or Ubuntu.

1.3.2. Adding a Ceph OSD using Ansible with the same disk topology

For Ceph OSDs with the same disk topology, Ansible will add the same number of OSDs as other OSD nodes using the same device paths specified in the devices: section of the /usr/share/ceph-ansible/group_vars/osds file.

Note

The new Ceph OSD node(s) will have the same configuration as the rest of the OSDs.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Review the Requirements for Installing Red Hat Ceph Storage chapter in the Installation Guide for Red Hat Enterprise Linux or Ubuntu.
  • Having root access to the new nodes.
  • The same number of OSD data drives as other OSD nodes in the storage cluster.

Procedure

  1. Add the Ceph OSD node(s) to the /etc/ansible/hosts file, under the [osds] section:

    Example

    [osds]
    ...
    osd06
    $NEW_OSD_NODE_NAME

  2. Verify that Ansible can reach the Ceph nodes:

    [user@admin ~]$ ansible all -m ping
  3. Navigate to the Ansible configuration directory:

    [user@admin ~]$ cd /usr/share/ceph-ansible
  4. Run the Ansible playbook:

    [user@admin ~]$ ansible-playbook site.yml
  5. For containerized deployments of Ceph, run the following playbook:

    [user@admin ~]$ ansible-playbook site-docker.yml
    Note

    When adding an OSD, if the playbook fails with PGs were not reported as active+clean, configure the following variables in the all.yml file to adjust the retries and delay:

    # OSD handler checks
    handler_health_osd_check_retries: 40
    handler_health_osd_check_delay: 30

1.3.3. Adding a Ceph OSD using Ansible with different disk topologies

For Ceph OSDs with different disk topologies, there are two approaches for adding the new OSD node(s) to an existing storage cluster.

Prerequisites

Procedure

  1. First Approach

    1. Add the new Ceph OSD node(s) to the /etc/ansible/hosts file, under the [osds] section:

      Example

      [osds]
      ...
      osd06
      $NEW_OSD_NODE_NAME

    2. Create a new file for each new Ceph OSD node added to the storage cluster, under the /etc/ansible/host_vars/ directory:

      Syntax

      touch /etc/ansible/host_vars/$NEW_OSD_NODE_NAME

      Example

      [root@admin ~]# touch /etc/ansible/host_vars/osd07

    3. Edit the new file, and add the devices: and dedicated_devices: sections to the file. Under each of these sections, add a hyphen (-), a space, and then the full path to the block device name for this OSD node:

      Example

      devices:
        - /dev/sdc
        - /dev/sdd
        - /dev/sde
        - /dev/sdf
      
      dedicated_devices:
        - /dev/sda
        - /dev/sda
        - /dev/sdb
        - /dev/sdb

    4. Verify that Ansible can reach all the Ceph nodes:

      [user@admin ~]$ ansible all -m ping
    5. Change directory to the Ansible configuration directory:

      [user@admin ~]$ cd /usr/share/ceph-ansible
    6. Run the Ansible playbook:

      [user@admin ceph-ansible]$ ansible-playbook site.yml
  2. Second Approach

    1. Add the new OSD node name to the /etc/ansible/hosts file, and use the devices and dedicated_devices options, specifying the different disk topology:

      Example

      [osds]
      ...
      osd07 devices="['/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf']" dedicated_devices="['/dev/sda', '/dev/sda', '/dev/sdb', '/dev/sdb']"

    2. Verify that Ansible can reach all the Ceph nodes:

      [user@admin ~]$ ansible all -m ping
    3. Change directory to the Ansible configuration directory:

      [user@admin ~]$ cd /usr/share/ceph-ansible
    4. Run the Ansible playbook:

      [user@admin ceph-ansible]$ ansible-playbook site.yml

1.3.4. Adding a Ceph OSD using the command-line interface

Here is the high-level workflow for manually adding an OSD to a Red Hat Ceph Storage cluster:

  1. Install the ceph-osd package and create a new OSD instance
  2. Prepare and mount the OSD data and journal drives
  3. Add the new OSD node to the CRUSH map
  4. Update the owner and group permissions
  5. Enable and start the ceph-osd daemon
Important

The ceph-disk command is deprecated. The ceph-volume command is now the preferred method for deploying OSDs from the command-line interface. Currently, the ceph-volume command only supports the lvm plugin. Red Hat will provide examples throughout this guide using both commands as a reference, allowing time for storage administrators to convert any custom scripts that rely on ceph-disk to ceph-volume instead.

See the Red Hat Ceph Storage Administration Guide, for more information on using the ceph-volume command.

Note

For custom storage cluster names, use the --cluster $CLUSTER_NAME option with the ceph and ceph-osd commands.

Prerequisites

Procedure

  1. Enable the Red Hat Ceph Storage 3 OSD software repository.

    Red Hat Enterprise Linux

    [root@osd ~]# subscription-manager repos --enable=rhel-7-server-rhceph-3-osd-rpms

    Ubuntu

    [user@osd ~]$ sudo bash -c 'umask 0077; echo deb https://customername:customerpasswd@rhcs.download.redhat.com/3-updates/Tools $(lsb_release -sc) main | tee /etc/apt/sources.list.d/Tools.list'
    [user@osd ~]$ sudo bash -c 'wget -O - https://www.redhat.com/security/fd431d51.txt | apt-key add -'

  2. Create the /etc/ceph/ directory:

    # mkdir /etc/ceph
  3. On the new OSD node, copy the Ceph administration keyring and configuration files from one of the Ceph Monitor nodes:

    Syntax

    scp $USER_NAME@$MONITOR_HOST_NAME:/etc/ceph/$CLUSTER_NAME.client.admin.keyring /etc/ceph
    scp $USER_NAME@$MONITOR_HOST_NAME:/etc/ceph/$CLUSTER_NAME.conf /etc/ceph

    Example

    [root@osd ~]# scp root@node1:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
    [root@osd ~]# scp root@node1:/etc/ceph/ceph.conf /etc/ceph/
  4. Install the ceph-osd package on the new Ceph OSD node:

    Red Hat Enterprise Linux

    [root@osd ~]# yum install ceph-osd

    Ubuntu

    [user@osd ~]$ sudo apt-get install ceph-osd
  5. Decide if you want to collocate a journal or use a dedicated journal for the new OSDs.

    Note

    The --filestore option is required.

    1. For OSDs with a collocated journal:

      Syntax

      [root@osd ~]# ceph-disk --setuser ceph --setgroup ceph prepare  --filestore /dev/$DEVICE_NAME

      Examples

      [root@osd ~]# ceph-disk --setuser ceph --setgroup ceph prepare  --filestore /dev/sda

    2. For OSDs with a dedicated journal:

      Syntax

      [root@osd ~]# ceph-disk --setuser ceph --setgroup ceph prepare  --filestore /dev/$DEVICE_NAME /dev/$JOURNAL_DEVICE_NAME

      or

      [root@osd ~]# ceph-volume lvm prepare  --filestore --data /dev/$DEVICE_NAME --journal /dev/$JOURNAL_DEVICE_NAME

      Examples

      [root@osd ~]# ceph-disk --setuser ceph --setgroup ceph prepare  --filestore /dev/sda /dev/sdb

      [root@osd ~]# ceph-volume lvm prepare  --filestore --data /dev/vg00/lvol1 --journal /dev/sdb
  6. Activate the new OSD:

    Syntax

    [root@osd ~]# ceph-disk activate /dev/$DEVICE_NAME

    or

    [root@osd ~]# ceph-volume lvm activate --filestore $OSD_ID $OSD_FSID

    Example

    [root@osd ~]# ceph-disk activate /dev/sda

    [root@osd ~]# ceph-volume lvm activate --filestore 0 6cc43680-4f6e-4feb-92ff-9c7ba204120e
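
    If you do not know the $OSD_ID and $OSD_FSID values for the ceph-volume command, you can list the prepared volumes first; a lookup sketch:

    [root@osd ~]# ceph-volume lvm list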
  7. Add the OSD to the CRUSH map:

    Syntax

    ceph osd crush add $OSD_ID $WEIGHT [$BUCKET_TYPE=$BUCKET_NAME ...]

    Example

    [root@osd ~]# ceph osd crush add 4 1 host=node4
    Note

    If you specify more than one bucket, the command places the OSD into the most specific bucket out of those you specified, and it moves the bucket underneath any other buckets you specified.

    Note

    You can also edit the CRUSH map manually. See the Editing a CRUSH map section in the Storage Strategies guide for Red Hat Ceph Storage 3.

    Important

    If you specify only the root bucket, then the OSD attaches directly to the root, but the CRUSH rules expect OSDs to be inside of the host bucket.

  8. Update the owner and group permissions for the newly created directories:

    Syntax

    chown -R $OWNER:$GROUP $PATH_TO_DIRECTORY

    Example

    [root@osd ~]# chown -R ceph:ceph /var/lib/ceph/osd
    [root@osd ~]# chown -R ceph:ceph /var/log/ceph
    [root@osd ~]# chown -R ceph:ceph /var/run/ceph
    [root@osd ~]# chown -R ceph:ceph /etc/ceph

  9. If you use a storage cluster with a custom name, then add the following line to the appropriate file:

    Red Hat Enterprise Linux

    [root@osd ~]# echo "CLUSTER=$CLUSTER_NAME" >> /etc/sysconfig/ceph

    Ubuntu

    [user@osd ~]$ sudo bash -c 'echo "CLUSTER=$CLUSTER_NAME" >> /etc/default/ceph'

    Replace $CLUSTER_NAME with the custom cluster name.

  10. To ensure that the new OSD is up and ready to receive data, enable and start the OSD service:

    Syntax

    systemctl enable ceph-osd@$OSD_ID
    systemctl start ceph-osd@$OSD_ID

    Example

    [root@osd ~]# systemctl enable ceph-osd@4
    [root@osd ~]# systemctl start ceph-osd@4
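
Once the service is running, you can confirm that the new OSD is up and in the cluster; a minimal verification:

    [root@monitor ~]# ceph osd tree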

1.3.5. Removing a Ceph OSD using Ansible

At times, you might need to scale down the capacity of a Red Hat Ceph Storage cluster. To remove an OSD from a Red Hat Ceph Storage cluster using Ansible, run the shrink-osd.yml playbook.

Important

Removing an OSD from the storage cluster will destroy all the data contained on that OSD.

Prerequisites

  • A running Red Hat Ceph Storage deployed by Ansible.
  • A running Ansible administration node.

Procedure

  1. Change to the /usr/share/ceph-ansible/ directory.

    [user@admin ~]$ cd /usr/share/ceph-ansible
  2. Copy the shrink-osd.yml playbook from the infrastructure-playbooks directory to the current directory.

    [root@admin ceph-ansible]# cp infrastructure-playbooks/shrink-osd.yml .
  3. Run the Ansible playbook:

    ansible-playbook shrink-osd.yml -e osd_to_kill=$ID -u $ANSIBLE_USER

    Replace:

    • $ID with the ID of the OSD. To remove more OSDs, separate the OSD IDs with a comma.
    • $ANSIBLE_USER with the name of the Ansible user

    Example

    [user@admin ceph-ansible]$ ansible-playbook shrink-osd.yml -e osd_to_kill=1 -u user

  4. Verify that the OSD has been successfully removed:

    [root@monitor ~]# ceph osd tree

Additional Resources

1.3.6. Removing a Ceph OSD using the command-line interface

Removing an OSD from a storage cluster involves updating the cluster map, removing its authentication key, removing the OSD from the OSD map, and removing the OSD from the ceph.conf file. If the node has multiple drives, you might need to remove an OSD for each drive by repeating this procedure.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Enough available OSDs so that the storage cluster is not at its near full ratio.
  • Having root access to the OSD node.

Procedure

  1. Disable and stop the OSD service:

    Syntax

    systemctl disable ceph-osd@$OSD_ID
    systemctl stop ceph-osd@$OSD_ID

    Example

    [root@osd ~]# systemctl disable ceph-osd@4
    [root@osd ~]# systemctl stop ceph-osd@4

    Once the OSD is stopped, it is down.

  2. Mark the OSD as out of the storage cluster:

    Syntax

    ceph osd out $OSD_ID

    Example

    [root@osd ~]# ceph osd out 4

    Important

    Once the OSD is out, Ceph will start rebalancing and copying data to other OSDs in the storage cluster. Red Hat recommends waiting until the storage cluster becomes active+clean before proceeding to the next step. To observe the data migration, run the following command:

    [root@monitor ~]# ceph -w
  3. Remove the OSD from the CRUSH map so that it no longer receives data.

    Syntax

    ceph osd crush remove $OSD_NAME

    Example

    [root@osd ~]# ceph osd crush remove osd.4

    Note

    You can also decompile the CRUSH map, remove the OSD from the device list, remove the device as an item in the host bucket or remove the host bucket. If it is in the CRUSH map and you intend to remove the host, recompile the map and set it. See the Storage Strategies Guide for details.
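
    A minimal sketch of that manual edit cycle, using hypothetical file names:

    [root@monitor ~]# ceph osd getcrushmap -o crush.bin
    [root@monitor ~]# crushtool -d crush.bin -o crush.txt

    Edit crush.txt to remove the OSD entry and, if appropriate, the host bucket, then recompile and set the updated map:

    [root@monitor ~]# crushtool -c crush.txt -o crush.new.bin
    [root@monitor ~]# ceph osd setcrushmap -i crush.new.bin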

  4. Remove the OSD authentication key:

    Syntax

    ceph auth del osd.$OSD_ID

    Example

    [root@osd ~]# ceph auth del osd.4

  5. Remove the OSD:

    Syntax

    ceph osd rm $OSD_ID

    Example

    [root@osd ~]# ceph osd rm 4

  6. Edit the storage cluster’s configuration file, by default /etc/ceph/ceph.conf, and remove the OSD entry, if it exists:

    Example

    [osd.4]
    host = $HOST_NAME

  7. Remove the reference to the OSD in the /etc/fstab file, if the OSD was added manually.
  8. Copy the updated configuration file to the /etc/ceph/ directory of all other nodes in the storage cluster.

    Syntax

    scp /etc/ceph/$CLUSTER_NAME.conf $USER_NAME@$HOST_NAME:/etc/ceph/

    Example

    [root@osd ~]# scp /etc/ceph/ceph.conf root@node4:/etc/ceph/

1.3.7. Observing the data migration

When you add an OSD to or remove an OSD from the CRUSH map, Ceph begins rebalancing the data by migrating placement groups to the new or existing OSD(s).

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Recently added or removed an OSD.

Procedure

  1. To observe the data migration:

    [root@monitor ~]# ceph -w
  2. Watch as the placement group states change from active+clean to active, some degraded objects, and finally back to active+clean when the migration completes.
  3. To exit the utility, press Ctrl + C.

1.4. Recalculating the placement groups

Placement groups (PGs) define the spread of any pool’s data across the available OSDs. A placement group is built upon the chosen redundancy algorithm. For 3-way replication, the redundancy is defined to use three different OSDs. For erasure-coded pools, the number of OSDs to use is defined by the number of chunks.

When defining a pool, the number of placement groups determines the granularity with which the data is spread across all available OSDs. The higher the number, the more evenly capacity and load can be balanced. However, because placement groups must also be handled when data is reconstructed, the number needs to be chosen carefully up front. To support the calculation, the PG calculator tool is available; see Additional Resources.

During the lifetime of a storage cluster, a pool may grow beyond its initially anticipated limits. As the number of drives grows, a recalculation is recommended. The number of placement groups per OSD should be around 100. When more OSDs are added to the storage cluster, the number of PGs per OSD decreases over time. Starting with 120 drives in the storage cluster and setting the pg_num of the pool to 4000 results in 100 PGs per OSD, given a replication factor of three. Over time, if the number of OSDs grows tenfold, the number of PGs per OSD goes down to only ten. Because a small number of PGs per OSD tends to distribute capacity unevenly, consider adjusting the PGs per pool.
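
A hedged sketch of the arithmetic behind this example, using the common target of roughly 100 PGs per OSD:

    total PGs for a pool ≈ (number of OSDs × 100) / replication factor
    (120 OSDs × 100) / 3 = 4000 PGs, so (4000 PGs × 3 replicas) / 120 OSDs = 100 PGs per OSD
    with 1200 OSDs: (4000 PGs × 3 replicas) / 1200 OSDs = 10 PGs per OSD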

Adjusting the number of placement groups can be done online. Recalculating is not only a recalculation of the PG numbers, but also involves data relocation, which can be a lengthy process. However, data availability is maintained at all times.

Very high numbers of PGs per OSD should be avoided, because the reconstruction of all PGs on a failed OSD starts at once. Performing reconstruction in a timely manner requires a high number of IOPS, which might not be available. This leads to deep I/O queues and high latency, rendering the storage cluster unusable or resulting in long healing times.

Additional Resources

  • See the PG calculator for calculating the values by a given use case.
  • See the Erasure Code Pools chapter in the Red Hat Ceph Storage Strategies Guide for more information.

1.5. Additional Resources