Chapter 6. Troubleshooting Ceph MDSs

As a storage administrator, you can troubleshoot the most common issues that can occur when using the Ceph Metadata Server (MDS). The following are some of the common errors that you might encounter:

  • An MDS node failure that requires deploying a new MDS node.
  • An MDS node issue that requires redeploying the MDS on an existing node.

6.1. Redeploying a Ceph MDS

Ceph Metadata Server (MDS) daemons are necessary for deploying a Ceph File System. If an MDS node in your cluster fails, you can redeploy a Ceph Metadata Server by removing an MDS server and adding a new or existing server. You can use the command-line interface or an Ansible playbook to add or remove an MDS server.

6.1.1. Prerequisites

  • A running Red Hat Ceph Storage cluster.

6.1.2. Removing a Ceph MDS using Ansible

To remove a Ceph Metadata Server (MDS) using Ansible, use the shrink-mds playbook.

Note

If there is no replacement MDS to take over once the MDS is removed, the file system becomes unavailable to clients. If that is not desirable, consider adding another MDS before removing the one that you want to take offline.
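
For example, you can confirm that a standby daemon is available to take over before you remove an MDS:

    [ansible@admin ceph-ansible]$ ceph mds stat
    cephfs:1 {0=node01=up:active} 1 up:standby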

Prerequisites

  • At least one MDS node.
  • A running Red Hat Ceph Storage cluster deployed by Ansible.
  • Root or sudo access to an Ansible administration node.

Procedure

  1. Log in to the Ansible administration node.
  2. Change to the /usr/share/ceph-ansible directory:

    Example

    [ansible@admin ~]$ cd /usr/share/ceph-ansible

  3. Run the Ansible shrink-mds.yml playbook, and when prompted, type yes to confirm shrinking the cluster:

    Syntax

    ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=ID -i hosts

    Replace ID with the ID of the MDS node you want to remove. You can remove only one Ceph MDS each time the playbook runs.

    Example

    [ansible@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=node02 -i hosts

  4. As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and remove the MDS node under the [mdss] section:

    Syntax

    [mdss]
    MDS_NODE_NAME
    MDS_NODE_NAME

    Example

    [mdss]
    node01
    node03

    In this example, node02 was removed from the [mdss] list.
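
    Optionally, confirm the edit by printing the [mdss] section of the inventory file. This check uses grep -A 2 to show the matching line and the two lines that follow it:

    Example

    [ansible@admin ceph-ansible]$ grep -A 2 '\[mdss\]' /usr/share/ceph-ansible/hosts
    [mdss]
    node01
    node03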

Verification

  • Check the status of the MDS daemons and confirm that the removed MDS no longer appears in the output:

    Syntax

    ceph fs dump

    Example

    [ansible@admin ceph-ansible]$ ceph fs dump
    
    [mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]
    
    Standby daemons:
    [mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]

6.1.3. Removing a Ceph MDS using the command-line interface

You can manually remove a Ceph Metadata Server (MDS) using the command-line interface.

Note

If there is no replacement MDS to take over once the current MDS is removed, the file system will become unavailable to clients. If that is not desirable, consider adding an MDS before removing the existing MDS.

Prerequisites

  • The ceph-common package is installed.
  • A running Red Hat Ceph Storage cluster.
  • Root or sudo access to the MDS nodes.

Procedure

  1. Log in to the Ceph MDS node that you want to remove the MDS daemon from.
  2. Stop the Ceph MDS service:

    Syntax

    sudo systemctl stop ceph-mds@HOST_NAME

    Replace HOST_NAME with the short name of the host where the daemon is running.

    Example

    [admin@node02 ~]$ sudo systemctl stop ceph-mds@node02
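
    Optionally, confirm that the daemon has stopped. The systemctl is-active command reports inactive for a stopped unit:

    Example

    [admin@node02 ~]$ sudo systemctl is-active ceph-mds@node02
    inactive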

  3. Disable the MDS service if you are not redeploying the MDS to this node:

    Syntax

    sudo systemctl disable ceph-mds@HOST_NAME

    Replace HOST_NAME with the short name of the host to disable the daemon.

    Example

    [admin@node02 ~]$ sudo systemctl disable ceph-mds@node02

  4. Remove the /var/lib/ceph/mds/ceph-MDS_ID directory on the MDS node:

    Syntax

    sudo rm -fr /var/lib/ceph/mds/ceph-MDS_ID

    Replace MDS_ID with the ID of the MDS node that you want to remove the MDS daemon from.

    Example

    [admin@node02 ~]$ sudo rm -fr /var/lib/ceph/mds/ceph-node02

Verification

  • Check the status of the MDS daemons and confirm that the removed MDS no longer appears in the output:

    Syntax

    ceph fs dump

    Example

    [ansible@admin ceph-ansible]$ ceph fs dump
    
    [mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]
    
    Standby daemons:
    [mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]

6.1.4. Adding a Ceph MDS using Ansible

Use the Ansible playbook to add a Ceph Metadata Server (MDS).

Prerequisites

  • A running Red Hat Ceph Storage cluster deployed by Ansible.
  • Root or sudo access to an Ansible administration node.
  • New or existing servers that can be provisioned as MDS nodes.

Procedure

  1. Log in to the Ansible administration node.
  2. Change to the /usr/share/ceph-ansible directory:

    Example

    [ansible@admin ~]$ cd /usr/share/ceph-ansible

  3. As root or with sudo access, open and edit the /usr/share/ceph-ansible/hosts inventory file and add the MDS node under the [mdss] section:

    Syntax

    [mdss]
    MDS_NODE_NAME
    NEW_MDS_NODE_NAME

    Replace NEW_MDS_NODE_NAME with the host name of the node where you want to install the MDS server.

    Alternatively, you can colocate the MDS daemon with the OSD daemon on one node by adding the same node under the [osds] and [mdss] sections.

    Example

    [mdss]
    node01
    node03

    In this example, node03 is the new MDS node.
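
    Optionally, verify that Ansible can reach all hosts in the [mdss] group before running the playbook. This ad-hoc ping module check uses the same inventory and SSH access as the playbook itself; the output shown here is abbreviated:

    Example

    [ansible@admin ceph-ansible]$ ansible -i hosts mdss -m ping
    node03 | SUCCESS => {
        "changed": false,
        "ping": "pong"
    }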

  4. As the ansible user, run the Ansible playbook to provision the MDS node:

    • Bare-metal deployments:

      [ansible@admin ceph-ansible]$ ansible-playbook site.yml --limit mdss -i hosts
    • Container deployments:

      [ansible@admin ceph-ansible]$ ansible-playbook site-container.yml --limit mdss -i hosts

      After the Ansible playbook has finished running, the new Ceph MDS node appears in the storage cluster.

Verification

  • Check the status of the MDS daemons and confirm that the new MDS node appears in the output:

    Syntax

    ceph fs dump

    Example

    [ansible@admin ceph-ansible]$ ceph fs dump
    
    [mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]
    
    Standby daemons:
    [mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]

  • Alternatively, you can use the ceph mds stat command to check if the MDS is in an active state:

    Syntax

    ceph mds stat

    Example

    [ansible@admin ceph-ansible]$ ceph mds stat
    cephfs:1 {0=node01=up:active} 1 up:standby
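
    In this output, cephfs:1 shows the file system name and the number of active MDS ranks, {0=node01=up:active} shows that rank 0 is held by node01, and 1 up:standby shows that one standby daemon is available.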

6.1.5. Adding a Ceph MDS using the command-line interface

You can manually add a Ceph Metadata Server (MDS) using the command-line interface.

Prerequisites

  • The ceph-common package is installed.
  • A running Red Hat Ceph Storage cluster.
  • Root or sudo access to the MDS nodes.
  • New or existing servers that can be provisioned as MDS nodes.

Procedure

  1. Add a new MDS node by logging in to the node and creating the MDS data directory:

    Syntax

    sudo mkdir /var/lib/ceph/mds/ceph-MDS_ID

    Replace MDS_ID with the ID of the MDS node that you want to add the MDS daemon to.

    Example

    [admin@node03 ~]$ sudo mkdir /var/lib/ceph/mds/ceph-node03

  2. If this is a new MDS node and you are using Cephx authentication, create the authentication key:

    Syntax

    sudo ceph auth get-or-create mds.MDS_ID mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' | sudo tee /var/lib/ceph/mds/ceph-MDS_ID/keyring

    Replace MDS_ID with the ID of the MDS node to deploy the MDS daemon on.

    Example

    [admin@node03 ~]$ sudo ceph auth get-or-create mds.node03 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' | sudo tee /var/lib/ceph/mds/ceph-node03/keyring

    Note

    Cephx authentication is enabled by default. See the Cephx authentication link in the Additional Resources section for more information.
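
    You can confirm that the key was created with the expected capabilities. In this output, the key value is a placeholder:

    Example

    [admin@node03 ~]$ sudo ceph auth get mds.node03
    [mds.node03]
        key = AQC3...==
        caps mds = "allow *"
        caps mgr = "profile mds"
        caps mon = "profile mds"
        caps osd = "allow *"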

  3. Start the MDS daemon:

    Syntax

    sudo systemctl start ceph-mds@HOST_NAME

    Replace HOST_NAME with the short name of the host to start the daemon.

    Example

    [admin@node03 ~]$ sudo systemctl start ceph-mds@node03

  4. Enable the MDS service:

    Syntax

    sudo systemctl enable ceph-mds@HOST_NAME

    Replace HOST_NAME with the short name of the host to enable the service.

    Example

    [admin@node03 ~]$ sudo systemctl enable ceph-mds@node03
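
    Optionally, confirm that the service is running. The systemctl is-active command reports active for a running unit:

    Example

    [admin@node03 ~]$ sudo systemctl is-active ceph-mds@node03
    active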

Verification

  • Check the status of the MDS daemons and confirm that the new MDS appears in the output:

    Syntax

    ceph fs dump

    Example

    [admin@mon ~]$ ceph fs dump
    
    [mds.node01 {0:115304} state up:active seq 5 addr [v2:172.25.250.10:6800/695510951,v1:172.25.250.10:6801/695510951]]
    
    Standby daemons:
    [mds.node03 {-1:144437} state up:standby seq 2 addr [v2:172.25.250.11:6800/172950087,v1:172.25.250.11:6801/172950087]]

  • Alternatively, you can use the ceph mds stat command to check if the MDS is in an active state:

    Syntax

    ceph mds stat

    Example

    [ansible@admin ceph-ansible]$ ceph mds stat
    cephfs:1 {0=node01=up:active} 1 up:standby

Additional Resources