Chapter 5. Ceph File System administration

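As a storage administrator, you can perform common Ceph File System (CephFS) administrative tasks, such as monitoring client metrics with cephfs-top, mapping directory trees to Metadata Server ranks, adding data pools, taking down or removing a file system, and evicting clients.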

Prerequisites

  • A running and healthy Red Hat Ceph Storage cluster.
  • Installation and configuration of the Ceph Metadata Server daemons (ceph-mds).
  • A created and mounted Ceph File System.

5.1. Using the cephfs-top utility

The Ceph File System (CephFS) provides a top-like utility to display metrics on Ceph File Systems in real time. The cephfs-top utility is a curses-based Python script that uses the Ceph Manager stats module to fetch and display client performance metrics.

Currently, the cephfs-top utility supports nearly 10,000 clients.

Note

Currently, not all performance statistics are available in the Red Hat Enterprise Linux 8 kernel. The cephfs-top utility is supported on Red Hat Enterprise Linux 8 and later, and uses one of the standard terminals in Red Hat Enterprise Linux.

Important

The minimum compatible Python version for the cephfs-top utility is 3.6.0.

Prerequisites

  • A healthy and running Red Hat Ceph Storage cluster.
  • Deployment of a Ceph File System.
  • Root-level access to a Ceph client node.
  • Installation of the cephfs-top package.

Procedure

  1. Enable the Red Hat Ceph Storage 6 tools repository, if it is not already enabled:

    Red Hat Enterprise Linux 8

    [root@client ~]# subscription-manager repos --enable=rhceph-6-tools-for-rhel-8-x86_64-rpms

    Red Hat Enterprise Linux 9

    [root@client ~]# subscription-manager repos --enable=rhceph-6-tools-for-rhel-9-x86_64-rpms

  2. Install the cephfs-top package:

    Example

    [root@client ~]# dnf install cephfs-top

  3. Enable the Ceph Manager stats module:

    Example

    [root@client ~]# ceph mgr module enable stats

  4. Create the client.fstop Ceph user:

    Example

    [root@client ~]# ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r' > /etc/ceph/ceph.client.fstop.keyring

    Note

    Optionally, use the --id argument to specify a Ceph user other than client.fstop.

  5. Start the cephfs-top utility:

    Example

    [root@client ~]# cephfs-top
    cephfs-top - Wed Nov 30 15:26:05 2022
    
    All Filesystem Info
    Total Client(s): 4 - 3 FUSE, 1 kclient, 0 libcephfs
    COMMANDS: m - select a filesystem | s - sort menu | l - limit number of clients | r - reset to default | q - quit
    
      client_id mount_root chit(%) dlease(%) ofiles oicaps oinodes rtio(MB) raio(MB) rsp(MB/s) wtio(MB) waio(MB) wsp(MB/s) rlatavg(ms) rlatsd(ms) wlatavg(ms) wlatsd(ms) mlatavg(ms) mlatsd(ms) mount_point@host/addr

    Filesystem: cephfs1 - 2 client(s)

      4500      /          100.0   100.0     0      751    0       0.0      0.0      0.0       578.13   0.03     0.0       N/A         N/A        N/A         N/A        N/A         N/A        N/A@example/192.168.1.4
      4501      /          100.0   0.0       0      1      0       0.0      0.0      0.0       0.0      0.0      0.0       0.0         0.0        0.0         0.0        0.41        0.0        /mnt/cephfs2@example/192.168.1.4

    Filesystem: cephfs2 - 2 client(s)

      4512      /          100.0   0.0       0      1      0       0.0      0.0      0.0       0.0      0.0      0.0       0.0         0.0        0.0         0.0        0.4         0.0        /mnt/cephfs3@example/192.168.1.4
      4518      /          100.0   0.0       0      1      0       0.0      0.0      0.0       0.0      0.0      0.0       0.0         0.0        0.0         0.0        0.52        0.0        /mnt/cephfs4@example/192.168.1.4
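The preparation steps above can be collected into a small script. This is a minimal sketch, not a supported tool: the DRY_RUN variable, the run helper, and the FSTOP_ID variable are hypothetical names introduced here, and the script only prints the commands unless DRY_RUN=0 is set. It writes the keyring with the ceph CLI -o option instead of a shell redirect, and finishes with cephfs-top --selftest as a quick check.

```shell
#!/bin/sh
# Sketch: prepare a client node for cephfs-top (steps 2-5 above).
: "${DRY_RUN:=1}"   # 1 (default): print commands; 0: execute them
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical variable; cephfs-top uses the "fstop" user unless --id is given.
FSTOP_ID="${FSTOP_ID:-fstop}"

setup_cephfs_top() {
  run dnf install -y cephfs-top
  run ceph mgr module enable stats
  # -o writes the keyring file directly, so no shell redirection is needed
  run ceph auth get-or-create "client.$FSTOP_ID" \
      mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r' \
      -o "/etc/ceph/ceph.client.$FSTOP_ID.keyring"
  # Sanity check of the stats module before starting the interactive display
  run cephfs-top --id "$FSTOP_ID" --selftest
}

setup_cephfs_top
```

Review the printed commands first, then rerun with DRY_RUN=0 as root to apply them.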

5.1.1. The cephfs-top utility interactive commands

Select a particular file system and view the metrics related to that file system with the cephfs-top utility interactive commands.

m
Description
Filesystem selection: Displays a menu of file systems for selection.
q
Description
Quit: Exits the utility if you are at the home screen with all file system information. If you are not at the home screen, it redirects you back to the home screen.
s
Description
Sort field selection: Designates the sort field. ‘cap_hit’ is the default.
l
Description
Client limit: Sets the limit on the number of clients to be displayed.
r
Description
Reset: Resets the sort field and limit value to the default.

You can scroll the metrics display by using the arrow keys, PgUp/PgDn, Home/End, and the mouse.

Example of entering and exiting the file system selection menu

Press m while cephfs-top is running to open the file system selection menu:

                       Filesystems
Press "q" to go back to home (all filesystem info) screen

                        cephfs01
                        cephfs02

Press q to return to the home screen:

cephfs-top - Thu Oct 20 07:29:35 2022
Total Client(s): 3 - 2 FUSE, 1 kclient, 0 libcephfs

5.1.2. The cephfs-top utility options

You can use the cephfs-top utility command with various options.

Example

[root@client ~]# cephfs-top --selftest
selftest ok

--cluster NAME_OF_THE_CLUSTER
Description
Connects to a cluster with a non-default name. The default cluster name is ceph.
--id USER
Description
Specifies the Ceph user (client ID) that connects to the Ceph cluster. The default is fstop.
--selftest
Description
Performs a self-test, which is a sanity check of the stats module.
--conffile PATH_TO_THE_CONFIGURATION_FILE
Description
With this option, you can provide a path to the Ceph cluster configuration file.
-d/--delay INTERVAL_IN_SECONDS
Description

The cephfs-top utility refreshes statistics every second by default. With this option, you can change the refresh interval.

Note

The interval must be greater than or equal to 1 second. Fractional seconds are honored.

--dump
Description
Dumps the metrics to stdout without creating a curses display.
--dumpfs FILESYSTEM_NAME
Description
Dumps the metrics of the given file system to stdout without creating a curses display.
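Because the -d interval must be at least 1 second, a wrapper script might validate the value before starting the utility. The valid_delay and start_fstop function names are hypothetical, and the sketch accepts plain decimal numbers only.

```shell
#!/bin/sh
# Sketch: reject refresh intervals below 1 second before calling cephfs-top -d.
valid_delay() {
  # awk exits 0 (success) only for a plain decimal number that is >= 1
  awk -v d="$1" 'BEGIN { exit !(d ~ /^[0-9]+(\.[0-9]+)?$/ && d + 0 >= 1) }'
}

# Hypothetical wrapper: start the interactive display with a validated delay.
start_fstop() {
  delay="${1:-1}"
  if ! valid_delay "$delay"; then
    echo "invalid delay: $delay (must be >= 1 second)" >&2
    return 1
  fi
  cephfs-top --id fstop -d "$delay"
}
```

For example, start_fstop 2.5 refreshes every 2.5 seconds, while start_fstop 0.5 is rejected before cephfs-top starts.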

5.2. Using the MDS autoscaler module

The MDS Autoscaler Module monitors the Ceph File System (CephFS) to ensure sufficient MDS daemons are available. It works by adjusting the placement specification for the Orchestrator backend of the MDS service.

The module monitors the following file system settings to inform placement count adjustments:

  • max_mds file system setting
  • standby_count_wanted file system setting

The Ceph Monitor daemons remain responsible for promoting or stopping MDS daemons according to these settings. The mds_autoscaler module only adjusts the number of MDS daemons that the orchestrator deploys.

Prerequisites

  • A healthy and running Red Hat Ceph Storage cluster.
  • Deployment of a Ceph File System.
  • Root-level access to a Ceph Monitor node.

Procedure

  • Enable the MDS autoscaler module:

    Example

    [ceph: root@host01 /]# ceph mgr module enable mds_autoscaler
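As a rough illustration of how the two monitored settings combine, the following sketch assumes that the autoscaler aims for max_mds active daemons plus standby_count_wanted standby daemons per file system. The mds_required function, the DRY_RUN variable, and the run helper are hypothetical; the ceph fs set commands are the usual way to change the two settings, and cephfs is an example file system name.

```shell
#!/bin/sh
# Sketch: expected MDS daemon count, assuming max_mds + standby_count_wanted.
: "${DRY_RUN:=1}"   # 1 (default): print commands; 0: execute them
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Expected number of MDS daemons for one file system.
mds_required() {
  max_mds="$1"
  standby_count_wanted="$2"
  echo $((max_mds + standby_count_wanted))
}

# Example: two active ranks plus one standby for the file system "cephfs".
run ceph fs set cephfs max_mds 2
run ceph fs set cephfs standby_count_wanted 1
echo "Expected MDS daemon count: $(mds_required 2 1)"
```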

5.3. Unmounting Ceph File Systems mounted as kernel clients

You can unmount a Ceph File System that is mounted as a kernel client.

Prerequisites

  • Root-level access to the client node where the file system is mounted.

Procedure

  • To unmount a Ceph File System mounted as a kernel client:

    Syntax

    umount MOUNT_POINT

    Example

    [root@client ~]# umount /mnt/cephfs
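To find which mount points to unmount, you can read /proc/mounts, where kernel CephFS mounts are listed with the file system type ceph. The following is a minimal sketch with hypothetical function names; it does not handle mount points that contain spaces, which /proc/mounts shows octal-escaped.

```shell
#!/bin/sh
# Sketch: list kernel-client CephFS mount points (type "ceph" in /proc/mounts).
# The optional argument points at a different mounts file, for example in tests.
kernel_cephfs_mounts() {
  awk '$3 == "ceph" { print $2 }' "${1:-/proc/mounts}"
}

# Unmount every kernel CephFS mount point found (requires root).
unmount_all_kernel_cephfs() {
  kernel_cephfs_mounts | while read -r mp; do
    umount "$mp"
  done
}
```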

Additional Resources

  • The umount(8) manual page

5.4. Unmounting Ceph File Systems mounted as FUSE clients

You can unmount a Ceph File System that is mounted as a File System in User Space (FUSE) client.

Prerequisites

  • Root-level access to the FUSE client node.

Procedure

  • To unmount a Ceph File System mounted in FUSE:

    Syntax

    fusermount -u MOUNT_POINT

    Example

    [root@client ~]# fusermount -u /mnt/cephfs
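If a node has both kernel and FUSE mounts, a small helper can choose between umount and fusermount -u based on the type recorded in /proc/mounts. This sketch assumes that ceph-fuse mounts are listed with the type fuse.ceph-fuse; the unmount_cmd function is hypothetical and prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: print the unmount command that matches a CephFS mount point's type.
# Assumes kernel mounts have type "ceph" and ceph-fuse mounts "fuse.ceph-fuse";
# the second argument overrides the mounts file (default /proc/mounts).
unmount_cmd() {
  mp="$1"
  fstype=$(awk -v mp="$mp" '$2 == mp { t = $3 } END { print t }' "${2:-/proc/mounts}")
  case "$fstype" in
    ceph)           echo "umount $mp" ;;
    fuse.ceph-fuse) echo "fusermount -u $mp" ;;
    *)              echo "not a CephFS mount: $mp" >&2; return 1 ;;
  esac
}
```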

Additional Resources

  • The ceph-fuse(8) manual page

5.5. Mapping directory trees to Metadata Server daemon ranks

You can map a directory and its subdirectories to a particular active Metadata Server (MDS) rank so that its metadata is only managed by the MDS daemon holding that rank. This approach enables you to evenly spread application load, or to limit the impact of users' metadata requests on the entire storage cluster.

Important

An internal balancer already dynamically spreads the application load. Therefore, only map directory trees to ranks for certain carefully chosen applications.

In addition, when a directory is mapped to a rank, the balancer cannot split it. Consequently, a large number of operations within the mapped directory can overload the rank and the MDS daemon that manages it.

Prerequisites

  • At least two active MDS daemons.
  • User access to the CephFS client node.
  • Verify that the attr package is installed on the CephFS client node with a mounted Ceph File System.

Procedure

  1. Add the p flag to the Ceph user’s capabilities:

    Syntax

    ceph fs authorize FILE_SYSTEM_NAME client.CLIENT_NAME /DIRECTORY CAPABILITY [/DIRECTORY CAPABILITY] ...

    Example

    [user@client ~]$ ceph fs authorize cephfs_a client.1 /temp rwp
    
    client.1
      key: AQBSdFhcGZFUDRAAcKhG9Cl2HPiDMMRv4DC43A==
      caps: [mds] allow r, allow rwp path=/temp
      caps: [mon] allow r
      caps: [osd] allow rw tag cephfs data=cephfs_a

  2. Set the ceph.dir.pin extended attribute on a directory:

    Syntax

    setfattr -n ceph.dir.pin -v RANK DIRECTORY

    Example

    [user@client ~]$ setfattr -n ceph.dir.pin -v 2 /temp

    This example assigns the /temp directory and all of its subdirectories to rank 2.
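When several directories should be spread across ranks, the setfattr commands can be generated instead of typed by hand. This sketch, with a hypothetical pin_round_robin function and example directory names, assigns ranks in round-robin order and only prints the commands; the number of ranks you pass should match the active ranks (0 to max_mds - 1). Keep the Important note above in mind: the balancer cannot split a pinned directory.

```shell
#!/bin/sh
# Sketch: print setfattr commands that pin directories to ranks round-robin.
# Usage: pin_round_robin NUM_RANKS DIRECTORY...
pin_round_robin() {
  num_ranks="$1"; shift
  rank=0
  for dir in "$@"; do
    echo "setfattr -n ceph.dir.pin -v $rank $dir"
    rank=$(( (rank + 1) % num_ranks ))
  done
}

# Hypothetical directories under a CephFS mount at /mnt/cephfs
pin_round_robin 2 /mnt/cephfs/projA /mnt/cephfs/projB /mnt/cephfs/projC
```

After reviewing the output, you could pipe it to sh on the client to apply the pins.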

5.6. Disassociating directory trees from Metadata Server daemon ranks

Disassociate a directory from a particular active Metadata Server (MDS) rank.

Prerequisites

  • User access to the Ceph File System (CephFS) client node.
  • Ensure that the attr package is installed on the client node with a mounted CephFS.

Procedure

  • Set the ceph.dir.pin extended attribute to -1 on a directory:

    Syntax

    setfattr -n ceph.dir.pin -v -1 DIRECTORY

    Example

    [user@client ~]$ setfattr -n ceph.dir.pin -v -1 /home/ceph-user

    Note

    Any separately mapped subdirectories of /home/ceph-user/ are not affected.
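Because separately pinned subdirectories keep their own setting, clearing every pin under a tree means setting -1 on each directory in it. The following sketch, with a hypothetical unpin_tree function, prints one setfattr command per directory that find reports; on large trees this can be a very long list, so review it before piping it to sh.

```shell
#!/bin/sh
# Sketch: print a command that clears the pin on every directory under $1,
# including $1 itself and any separately pinned subdirectories.
unpin_tree() {
  find "$1" -type d -exec echo setfattr -n ceph.dir.pin -v -1 {} \;
}
```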

5.7. Adding data pools

The Ceph File System (CephFS) supports adding more than one pool to be used for storing data. This can be useful for:

  • Storing log data on reduced redundancy pools.
  • Storing user home directories on an SSD or NVMe pool.
  • Basic data segregation.

Before using another data pool in the Ceph File System, you must add it as described in this section.

By default, CephFS stores file data in the initial data pool that was specified when the file system was created. To use a secondary data pool, you must also configure a part of the file system hierarchy to store file data in that pool, or optionally within a namespace of that pool, by using file and directory layouts.

Prerequisites

  • Root-level access to the Ceph Monitor node.

Procedure

  1. Create a new data pool:

    Syntax

    ceph osd pool create POOL_NAME

    Replace:

    • POOL_NAME with the name of the pool.

    Example

    [ceph: root@host01 /]# ceph osd pool create cephfs_data_ssd
    
    pool 'cephfs_data_ssd' created

  2. Add the newly created pool under the control of the Metadata Servers:

    Syntax

    ceph fs add_data_pool FS_NAME POOL_NAME

    Replace:

    • FS_NAME with the name of the file system.
    • POOL_NAME with the name of the pool.

    Example

    [ceph: root@host01 /]# ceph fs add_data_pool cephfs cephfs_data_ssd
    
    added data pool 6 to fsmap

  3. Verify that the pool was successfully added:

    Example

    [ceph: root@host01 /]# ceph fs ls
    
    name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data cephfs_data_ssd]

  4. Optional: Remove a data pool from the file system:

    Syntax

    ceph fs rm_data_pool FS_NAME POOL_NAME

    Example

    [ceph: root@host01 /]# ceph fs rm_data_pool cephfs cephfs_data_ssd
    
    removed data pool 6 from fsmap

    1. Verify that the pool was successfully removed:

      Example

      [ceph: root@host01 /]# ceph fs ls
      
      name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data]

  5. If you use cephx authentication, make sure that clients can access the new pool.
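The pool steps above, followed by a directory layout that sends new files to the added pool, might be scripted as follows. This is a sketch under stated assumptions: the file system is mounted at /mnt/cephfs on the node that runs setfattr, the example directory /mnt/cephfs/ssd exists, and the client has the p capability flag needed to set layouts. The fs_has_data_pool helper parses the text output of ceph fs ls, which is not a stable interface; DRY_RUN and run are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: add a data pool and direct one directory's new files to it.
: "${DRY_RUN:=1}"   # 1 (default): print commands; 0: execute them
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Succeeds if POOL ($2) is in the "data pools: [...]" list of ceph fs ls output ($1).
fs_has_data_pool() {
  printf '%s\n' "$1" | sed -n 's/.*data pools: \[\(.*\)].*/\1/p' \
    | tr ' ' '\n' | grep -qxF "$2"
}

run ceph osd pool create cephfs_data_ssd
run ceph fs add_data_pool cephfs cephfs_data_ssd
# Only files created after this point use the new pool; existing files keep
# their current layout.
run setfattr -n ceph.dir.layout.pool -v cephfs_data_ssd /mnt/cephfs/ssd
```

For example, fs_has_data_pool "$(ceph fs ls)" cephfs_data_ssd can serve as the verification in step 3.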

5.8. Taking down a Ceph File System cluster

You can take down a Ceph File System (CephFS) cluster by setting the down flag to true. Doing this gracefully shuts down the Metadata Server (MDS) daemons by flushing journals to the metadata pool, and stops all client I/O.

You can also take the CephFS cluster down quickly, for example, to test the deletion of a file system and bring the MDS daemons down when practicing a disaster recovery scenario. Doing this sets the joinable flag to false, which prevents the MDS standby daemons from activating the file system.

Prerequisites

  • Root-level access to a Ceph Monitor node.

Procedure

  1. To mark the CephFS cluster down:

    Syntax

    ceph fs set FS_NAME down true

    Example

    [ceph: root@host01 /]# ceph fs set cephfs down true

    1. To bring the CephFS cluster back up:

      Syntax

      ceph fs set FS_NAME down false

      Example

      [ceph: root@host01 /]# ceph fs set cephfs down false

    or

  1. To quickly take down a CephFS cluster:

    Syntax

    ceph fs fail FS_NAME

    Example

    [ceph: root@host01 /]# ceph fs fail cephfs

    Note

    To get the CephFS cluster back up, set cephfs to joinable:

    Syntax

    ceph fs set FS_NAME joinable true

    Example

    [ceph: root@host01 /]# ceph fs set cephfs joinable true
    
    cephfs marked joinable; MDS may join as newly active.
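Both ways of taking the file system down, and the matching ways of bringing it back, can be wrapped in a pair of functions. The fs_down and fs_up names, the graceful/fast mode argument, DRY_RUN, and run are hypothetical; the ceph commands are the ones shown above.

```shell
#!/bin/sh
# Sketch: graceful ("down" flag) or fast ("fail"/"joinable") down and up.
: "${DRY_RUN:=1}"   # 1 (default): print commands; 0: execute them
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

fs_down() {   # usage: fs_down FS_NAME graceful|fast
  case "$2" in
    graceful) run ceph fs set "$1" down true ;;
    fast)     run ceph fs fail "$1" ;;
    *)        echo "mode must be graceful or fast" >&2; return 2 ;;
  esac
}

fs_up() {     # usage: fs_up FS_NAME graceful|fast
  case "$2" in
    graceful) run ceph fs set "$1" down false ;;
    fast)     run ceph fs set "$1" joinable true ;;
    *)        echo "mode must be graceful or fast" >&2; return 2 ;;
  esac
}
```

Use the same mode for both calls, because each way of bringing the file system back only reverses its own way of taking it down.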

5.9. Removing a Ceph File System

You can remove a Ceph File System (CephFS). Before doing so, consider backing up all the data and verifying that all clients have unmounted the file system locally.

Warning

This operation is destructive and will make the data stored on the Ceph File System permanently inaccessible.

Prerequisites

  • Back up your data.
  • Root-level access to a Ceph Monitor node.

Procedure

  1. Mark the file system as down:

    Syntax

    ceph fs set FS_NAME down true

    Replace:
    • FS_NAME with the name of the Ceph File System you want to remove.

    Example

    [ceph: root@host01 /]# ceph fs set cephfs down true
    
    cephfs marked down.

  2. Display the status of the Ceph File System:

    Syntax

    ceph fs status

    Example

    [ceph: root@host01 /]# ceph fs status
    
    cephfs - 0 clients
    ======
    +--------------------+----------+-------+-------+
    |        POOL        |   TYPE   |  USED | AVAIL |
    +--------------------+----------+-------+-------+
    | cephfs.cephfs.meta | metadata | 31.5M | 52.6G |
    | cephfs.cephfs.data |   data   |    0  | 52.6G |
    +--------------------+----------+-------+-------+
                   STANDBY MDS
    cephfs.ceph-host01
    cephfs.ceph-host02
    cephfs.ceph-host03

  3. Remove the Ceph File System:

    Syntax

    ceph fs rm FS_NAME --yes-i-really-mean-it

    Replace:
    • FS_NAME with the name of the Ceph File System you want to remove.

    Example

    [ceph: root@host01 /]# ceph fs rm cephfs --yes-i-really-mean-it

  4. Verify that the file system has been successfully removed:

    Example

    [ceph: root@host01 /]# ceph fs ls

  5. Optional: Remove the data and metadata pools associated with the removed file system.
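Because removal is destructive, a script might refuse to proceed unless the file system name is typed twice, and treat pool deletion as an explicit extra argument. This is a sketch, not a tested tool: remove_fs, DRY_RUN, and run are hypothetical, deleting pools also requires the mon_allow_pool_delete option to be true, and in a real run you would confirm with ceph fs status that no MDS daemon is still active before ceph fs rm, which this sketch does not wait for.

```shell
#!/bin/sh
# Sketch: guarded file system removal, with optional pool deletion.
: "${DRY_RUN:=1}"   # 1 (default): print commands; 0: execute them
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

remove_fs() {   # usage: remove_fs FS_NAME CONFIRM_NAME [POOL ...]
  fs="$1"; confirm="$2"; shift 2
  if [ "$fs" != "$confirm" ]; then
    echo "confirmation does not match; nothing removed" >&2
    return 1
  fi
  run ceph fs set "$fs" down true
  # Check ceph fs status here: no MDS daemon should be active before removal.
  run ceph fs rm "$fs" --yes-i-really-mean-it
  for pool in "$@"; do
    # Pool deletion names the pool twice and needs mon_allow_pool_delete=true
    run ceph osd pool rm "$pool" "$pool" --yes-i-really-really-mean-it
  done
}
```

For example, remove_fs cephfs cephfs cephfs.cephfs.meta cephfs.cephfs.data prints the removal commands for the file system and both pools from the status example.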

Additional Resources

  • See the Delete a Pool section in the Red Hat Ceph Storage Storage Strategies Guide.

5.10. Using the ceph mds fail command

Use the ceph mds fail command to:

  • Mark an MDS daemon as failed. If the daemon was active and a suitable standby daemon is available, this command forces a failover to the standby daemon. Disabling the standby-replay configuration beforehand prevents new standby-replay daemons from being assigned.
  • Restart a running MDS daemon. If the daemon was active and a suitable standby daemon was available, the "failed" daemon becomes a standby daemon.

Prerequisites

  • Installation and configuration of the Ceph MDS daemons.

Procedure

  • To fail a daemon:

    Syntax

    ceph mds fail MDS_NAME

    Where MDS_NAME is the name of the standby-replay MDS daemon.

    Example

    [ceph: root@host01 /]# ceph mds fail example01

    Note

    You can find the Ceph MDS name from the ceph fs status command.

5.11. Client features

At times, you might want to set Ceph File System (CephFS) features that clients must support to use a Ceph File System. Clients without these features might disrupt other CephFS clients or behave in unexpected ways. You might also want to require new features to prevent older, and possibly buggy, clients from connecting to a Ceph File System.

Important

CephFS clients missing newly added features are evicted automatically.

You can list all the CephFS features by using the ceph fs feature ls command. You can add or remove requirements by using the ceph fs required_client_features command.

Syntax

ceph fs required_client_features FILE_SYSTEM_NAME add FEATURE_NAME
ceph fs required_client_features FILE_SYSTEM_NAME rm FEATURE_NAME

Feature Descriptions

reply_encoding
Description
The Ceph Metadata Server (MDS) encodes reply requests in extensible format, if the client supports this feature.
reclaim_client
Description
The Ceph MDS allows a new client to reclaim the state of another, possibly dead, client. NFS Ganesha uses this feature.
lazy_caps_wanted
Description
When a stale client resumes, the Ceph MDS only needs to re-issue the capabilities that are explicitly wanted, if the client supports this feature.
multi_reconnect
Description
After a Ceph MDS failover event, the client sends a reconnect message to the MDS to reestablish cache states. A client that supports this feature can split large reconnect messages into multiple messages.
deleg_ino
Description
A Ceph MDS delegates inode numbers to a client, if the client supports this feature. Delegating inode numbers is a prerequisite for a client to do async file creation.
metric_collect
Description
CephFS clients can send performance metrics to a Ceph MDS.
alternate_name
Description
CephFS clients can set and understand alternate names for directory entries. This feature allows for encrypted file names.
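Adding a requirement evicts connected clients that lack the feature, so a wrapper might at least reject misspelled feature names. This sketch hard-codes the feature names described above as an assumption; the ceph fs feature ls command gives the authoritative list for your cluster. The require_feature function, DRY_RUN, and run are hypothetical.

```shell
#!/bin/sh
# Sketch: add or remove a required client feature after checking its name.
: "${DRY_RUN:=1}"   # 1 (default): print commands; 0: execute them
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Assumed list, taken from the feature descriptions in this section
KNOWN_FEATURES="reply_encoding reclaim_client lazy_caps_wanted multi_reconnect deleg_ino metric_collect alternate_name"

require_feature() {   # usage: require_feature FS_NAME add|rm FEATURE_NAME
  case " $KNOWN_FEATURES " in
    *" $3 "*) ;;
    *) echo "unknown feature: $3" >&2; return 1 ;;
  esac
  case "$2" in
    add|rm) run ceph fs required_client_features "$1" "$2" "$3" ;;
    *)      echo "action must be add or rm" >&2; return 1 ;;
  esac
}
```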

5.12. Ceph File System client evictions

When a Ceph File System (CephFS) client is unresponsive or misbehaving, it might be necessary to forcibly terminate its access to the CephFS, that is, to evict it. Evicting a CephFS client prevents it from communicating further with Metadata Server (MDS) daemons and Ceph OSD daemons. If a CephFS client is buffering I/O to the CephFS at the time of eviction, any unflushed data is lost. The CephFS client eviction process applies to all client types: FUSE mounts, kernel mounts, NFS gateways, and any process using the libcephfs API library.

CephFS clients can be evicted automatically, if they fail to communicate promptly with the MDS daemon, or manually.

Automatic Client Eviction

These scenarios cause an automatic CephFS client eviction:

  • If a CephFS client has not communicated with the active MDS daemon for longer than the session_autoclose setting, which is 300 seconds by default.
  • If the mds_cap_revoke_eviction_timeout option is set, and a CephFS client has not responded to cap revoke messages for longer than the set number of seconds. The mds_cap_revoke_eviction_timeout option is disabled by default.
  • During MDS startup or failover, the MDS daemon goes through a reconnect phase, waiting for all CephFS clients to connect to the new MDS daemon. CephFS clients that fail to reconnect within this time window, 45 seconds by default or as set by the mds_reconnect_timeout option, are evicted.
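The thresholds above can be adjusted with the ceph CLI. The following sketch assumes that session_autoclose is set per file system with ceph fs set, and that the other two are MDS configuration options set with ceph config set. The values are examples only: 300 and 45 match the defaults described above, and 300 seconds for the cap revoke timeout is an arbitrary choice. The tune_eviction_timers function, DRY_RUN, and run are hypothetical.

```shell
#!/bin/sh
# Sketch: set the timers behind automatic client eviction.
: "${DRY_RUN:=1}"   # 1 (default): print commands; 0: execute them
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

tune_eviction_timers() {   # usage: tune_eviction_timers FS_NAME
  # Per-file-system idle limit before automatic eviction (default 300 seconds)
  run ceph fs set "$1" session_autoclose 300
  # Evict clients that ignore cap revoke messages (disabled by default)
  run ceph config set mds mds_cap_revoke_eviction_timeout 300
  # Reconnect window after MDS startup or failover (default 45 seconds)
  run ceph config set mds mds_reconnect_timeout 45
}
```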

5.13. Blocklist Ceph File System clients

Ceph File System (CephFS) client blocklisting is enabled by default. When you send an eviction command to a single Metadata Server (MDS) daemon, it propagates the blocklist to the other MDS daemons. Blocklisting prevents the evicted CephFS client from accessing any data objects, so the other CephFS clients and MDS daemons must be updated with the latest Ceph OSD map, which includes the blocklisted client entries.

An internal “osdmap epoch barrier” mechanism is used when updating the Ceph OSD map. The barrier verifies that CephFS clients receiving capabilities have a sufficiently recent Ceph OSD map before any capabilities are assigned that might allow access to the same RADOS objects. This avoids racing with canceled operations, such as those from ENOSPC conditions or from clients blocklisted by evictions.

If you are experiencing frequent CephFS client evictions due to slow nodes or an unreliable network, and you cannot fix the underlying issue, you can ask the MDS to be less strict. In this mode, the MDS responds to slow CephFS clients by dropping their MDS sessions, but permits the clients to reopen sessions and to continue talking to Ceph OSDs. To enable this mode, set the mds_session_blocklist_on_timeout and mds_session_blocklist_on_evict options to false.

Note

When blocklisting is disabled, evicting a CephFS client only affects the MDS daemon you send the command to. On a system with multiple active MDS daemons, you need to send an eviction command to each active daemon.
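The less strict mode, and the per-daemon evictions it then requires, might be scripted as follows. This is a sketch with hypothetical function names (relax_blocklisting, evict_on_ranks) and the hypothetical DRY_RUN and run helpers; the rank numbers and client ID in the usage note are examples, and the mds.RANK naming follows the ceph tell examples in the manual eviction procedure.

```shell
#!/bin/sh
# Sketch: disable blocklisting, then evict a client on each active MDS.
: "${DRY_RUN:=1}"   # 1 (default): print commands; 0: execute them
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

relax_blocklisting() {
  run ceph config set mds mds_session_blocklist_on_timeout false
  run ceph config set mds mds_session_blocklist_on_evict false
}

# Without blocklisting, each active MDS must receive its own evict command.
evict_on_ranks() {   # usage: evict_on_ranks CLIENT_ID RANK [RANK ...]
  id="$1"; shift
  for rank in "$@"; do
    run ceph tell "mds.$rank" client evict "id=$id"
  done
}
```

For example, evict_on_ranks 4305 0 1 prints one evict command for each of two active ranks.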

5.14. Manually evicting a Ceph File System client

You might want to manually evict a Ceph File System (CephFS) client if the client is misbehaving and you do not have access to the client node, or if a client dies and you do not want to wait for the client session to time out.

Prerequisites

  • Root-level access to the Ceph Monitor node.

Procedure

  1. Review the client list:

    Syntax

    ceph tell DAEMON_NAME client ls

    Example

    [ceph: root@host01 /]# ceph tell mds.0 client ls
    [
        {
            "id": 4305,
            "num_leases": 0,
            "num_caps": 3,
            "state": "open",
            "replay_requests": 0,
            "completed_requests": 0,
            "reconnecting": false,
            "inst": "client.4305 172.21.9.34:0/422650892",
            "client_metadata": {
                "ceph_sha1": "79f0367338897c8c6d9805eb8c9ad24af0dcd9c7",
                "ceph_version": "ceph version 16.2.8-65.el8cp (79f0367338897c8c6d9805eb8c9ad24af0dcd9c7)",
                "entity_id": "0",
                "hostname": "senta04",
                "mount_point": "/tmp/tmpcMpF1b/mnt.0",
                "pid": "29377",
                "root": "/"
            }
        }
    ]

  2. Evict the specified CephFS client:

    Syntax

    ceph tell DAEMON_NAME client evict id=ID_NUMBER

    Example

    [ceph: root@host01 /]# ceph tell mds.0 client evict id=4305
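If you need to evict several clients, the commands can be generated from the client ls output. This sketch extracts the id fields with grep and sed, which is fragile for JSON (jq is a more robust choice where it is installed). The client_ids and evict_cmds functions are hypothetical and print a command for every listed client, so delete any line for a client you do not intend to evict before running the rest.

```shell
#!/bin/sh
# Sketch: turn `ceph tell DAEMON_NAME client ls` JSON (on stdin) into evict commands.
client_ids() {
  # Matches only top-level "id": N fields, not "entity_id" or "pid"
  grep -o '"id": *[0-9][0-9]*' | sed 's/[^0-9]//g'
}

evict_cmds() {   # usage: evict_cmds DAEMON_NAME < client_ls.json
  client_ids | while read -r id; do
    echo "ceph tell $1 client evict id=$id"
  done
}
```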

5.15. Removing a Ceph File System client from the blocklist

In some situations, it can be useful to allow a previously blocklisted Ceph File System (CephFS) client to reconnect to the storage cluster.

Important

Removing a CephFS client from the blocklist puts data integrity at risk, and does not guarantee a fully healthy and functional CephFS client as a result. The best way to get a fully healthy CephFS client back after an eviction is to unmount the CephFS client and do a fresh mount. If other CephFS clients are accessing files that the blocklisted CephFS client was buffering I/O to, data corruption can result.

Prerequisites

  • Root-level access to the Ceph Monitor node.

Procedure

  1. Review the blocklist:

    Example

    [ceph: root@host01 /]# ceph osd blocklist ls
    
    listed 1 entries
    127.0.0.1:0/3710147553 2022-05-09 11:32:24.716146

  2. Remove the CephFS client from the blocklist:

    Syntax

    ceph osd blocklist rm CLIENT_NAME_OR_IP_ADDR

    Example

    [ceph: root@host01 /]# ceph osd blocklist rm 127.0.0.1:0/3710147553
    
    un-blocklisting 127.0.0.1:0/3710147553

  3. Optionally, you can have kernel-based CephFS clients automatically reconnect when they are removed from the blocklist. On the kernel-based CephFS client, set the following mount option, either when mounting manually or in the client's entry in the /etc/fstab file:

    recover_session=clean

  4. Optionally, you can have FUSE-based CephFS clients automatically reconnect when they are removed from the blocklist. On the FUSE client, set the following option to true, either when mounting manually or in the client's entry in the /etc/fstab file:

    client_reconnect_stale=true
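When a blocklist holds many entries, the removal commands for one client address can be generated from the ceph osd blocklist ls output. This sketch assumes the ADDRESS:PORT/NONCE TIMESTAMP line format shown in the example above; the unblock_cmds function is hypothetical and only prints commands for review. Keep the data-integrity warning above in mind before running them.

```shell
#!/bin/sh
# Sketch: print blocklist removal commands for entries from one IP address.
# Reads `ceph osd blocklist ls` output on stdin; matches "IP:" at the start
# of the first field, so 10.0.0.5 does not match 10.0.0.50 or 110.0.0.5.
unblock_cmds() {   # usage: unblock_cmds IP < blocklist_ls.txt
  awk -v ip="$1" 'index($1, ip ":") == 1 { print "ceph osd blocklist rm " $1 }'
}
```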
