Chapter 4. Bug fixes

This section describes bugs with significant user impact that were fixed in this release of Red Hat Ceph Storage. In addition, it includes descriptions of fixed known issues found in previous versions.

4.1. The Cephadm utility

The ceph-volume commands no longer block OSDs and devices, and run as expected

Previously, ceph-volume commands such as ceph-volume lvm list and ceph-volume inventory did not complete, thereby preventing the execution of other ceph-volume commands for creating OSDs, listing devices, and listing OSDs.

With this update, the default output of these commands is no longer added to the Cephadm log, so all ceph-volume commands run in a container launched by the cephadm binary complete as expected.
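
As a minimal check, the listing commands can be invoked through the cephadm binary, which launches them in a container on the host:

Example

    cephadm ceph-volume inventory      # list available devices
    cephadm ceph-volume lvm list       # list logical volumes associated with OSDs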

(BZ#1948717)

Searching for a Ceph OSD id claim matches a host’s fully-qualified domain name to a host name

Previously, when replacing a failed Ceph OSD, the name in the CRUSH map appeared only as a host name, but the search for the Ceph OSD id claim used the fully-qualified domain name (FQDN) instead. As a result, the Ceph OSD id claim was not found. With this release, the Ceph OSD id claim search functionality correctly matches a FQDN to a host name, and replacing the Ceph OSD works as expected.

(BZ#1954503)

The ceph orch ls command correctly displays the number of daemons running for a given service

Previously, the ceph orch ls --service-type SERVICE_TYPE command incorrectly reported 0 daemons running for a service that had running daemons, and users were unable to see how many daemons were running for a specific service. With this release, the ceph orch ls --service-type SERVICE_TYPE command now correctly displays how many daemons are running for that given service.
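
For example, to confirm the daemon count for the Ceph Monitor service (the service type shown is illustrative):

Example

    ceph orch ls --service-type mon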

(BZ#1964951)

Users are no longer able to remove the Ceph Manager service using cephadm

Previously, if a user ran a ceph orch rm mgr command, it would cause cephadm to remove all the Ceph Manager daemons in the storage cluster, making the storage cluster inaccessible.

With this release, attempting to remove the Ceph Manager, a Ceph Monitor, or a Ceph OSD service using the ceph orch rm SERVICE_NAME command displays a warning message stating that it is not safe to remove these services, and results in no actions taken.
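
For example, the following commands now print the warning instead of removing any daemons:

Example

    ceph orch rm mgr
    ceph orch rm mon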

(BZ#1976820)

The node-exporter and alert-manager container versions have been updated

Previously, the Red Hat Ceph Storage 5.0 node-exporter and alert-manager container versions defaulted to version 4.5, even though version 4.6 was available and in use in Red Hat Ceph Storage 4.2.

With this release, using the cephadm command to upgrade from Red Hat Ceph Storage 5.0 to Red Hat Ceph Storage 5.0z1 results in the node-exporter and alert-manager container versions being updated to version 4.6.
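
A minimal upgrade invocation, assuming IMAGE_NAME points to the Red Hat Ceph Storage 5.0z1 container image:

Syntax

    ceph orch upgrade start --image IMAGE_NAME
    ceph orch upgrade status    # monitor upgrade progress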

(BZ#1996090)

4.2. Ceph Dashboard

Secure cookie-based sessions are enabled for accessing the Red Hat Ceph Storage Dashboard

Previously, storing information in LocalStorage made the Red Hat Ceph Storage dashboard accessible to all sessions running in a browser, making the dashboard vulnerable to XSS attacks. With this release, LocalStorage is replaced with secure cookie-based sessions, so the session secret is available only to the current browser instance.

(BZ#1889435)

4.3. Ceph File System

The MDS daemon no longer crashes when receiving unsupported metrics

Previously, the MDS daemon could not handle new metrics sent by the kernel client, causing the MDS daemon to crash when it received any unsupported metrics.

With this release, the MDS discards any unsupported metrics and works as expected.

(BZ#2030451)

Deletion of data is allowed when the storage cluster is full

Previously, when the storage cluster was full, the Ceph Manager hung while checking pool permissions when reading the configuration file. The Ceph Metadata Server (MDS) did not allow write operations to occur when a Ceph OSD was full, resulting in an ENOSPACE error. When the storage cluster reached the full ratio, users could not delete data to free space using the Ceph Manager volume plugin.

With this release, the new FULL capability is introduced. With the FULL capability, the Ceph Manager bypasses the Ceph OSD full check. The client_check_pool_permission option is now disabled by default, whereas in previous releases it was enabled. With the Ceph Manager having the FULL capability, the MDS no longer blocks Ceph Manager calls. As a result, the Ceph Manager can free up space by deleting subvolumes and snapshots when a storage cluster is full.
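
For example, when the cluster is full, space can now be reclaimed through the Ceph Manager volume plugin; the volume, subvolume, and snapshot names below are placeholders:

Syntax

    ceph fs subvolume snapshot rm VOLUME_NAME SUBVOLUME_NAME SNAPSHOT_NAME
    ceph fs subvolume rm VOLUME_NAME SUBVOLUME_NAME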

(BZ#1910272)

Ceph monitors no longer crash when processing authentication requests from Ceph File System clients

Previously, if a client did not have permission to view a legacy file system, the Ceph monitors would crash when processing authentication requests from clients. This caused the Ceph monitors to become unavailable. With this release, the code update fixes the handling of legacy file system authentication requests and authentication requests work as expected.

(BZ#1976915)

The KeyError message no longer appears every few milliseconds in the Ceph Manager log

Previously, a KeyError was logged to the Ceph Manager log every few milliseconds. This was caused by an attempt to remove an element from the client_metadata[in_progress] dictionary with a non-existent key, resulting in a KeyError. As a result, locating other stack traces in the logs was difficult. This release fixes the code logic in the Ceph File System performance metrics, and KeyError messages no longer appear in the Ceph Manager log.

(BZ#1979520)

Deleting a subvolume clone is no longer allowed for certain clone states

Previously, if you tried to remove a subvolume clone with the force option when the clone was not in a COMPLETED or CANCELLED state, the clone was not removed from the index tracking the ongoing clones. This caused the corresponding cloner thread to retry the cloning indefinitely, eventually resulting in an ENOENT failure. With the default number of cloner threads set to four, attempts to delete four clones resulted in all four threads entering a blocked state, allowing none of the pending clones to complete.

With this release, a clone is not removed unless it is in either a COMPLETED or CANCELLED state. The cloner threads no longer block, because a deleted clone is removed along with its entry from the index tracking the ongoing clones. As a result, pending clones continue to complete as expected.
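
For example, a pending clone can be checked, cancelled, and then removed with the force option; the volume and clone names are placeholders:

Syntax

    ceph fs clone status VOLUME_NAME CLONE_NAME
    ceph fs clone cancel VOLUME_NAME CLONE_NAME
    ceph fs subvolume rm VOLUME_NAME CLONE_NAME --force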

(BZ#1980920)

The ceph fs snapshot mirror daemon status command no longer requires a file system name

Previously, users were required to give at least one file system name to the ceph fs snapshot mirror daemon status command. With this release, the user no longer needs to specify a file system name as a command argument, and daemon status displays each file system separately.
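
For example, the command can now be run without a file system name:

Example

    ceph fs snapshot mirror daemon status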

(BZ#1988338)

Stopping the cephfs-mirror daemon can result in an unclean shutdown

Previously, the cephfs-mirror process would terminate uncleanly due to a race condition during the cephfs-mirror shutdown process. With this release, the race condition is resolved, and the cephfs-mirror daemon shuts down gracefully.

(BZ#2002140)

The Ceph Metadata Server no longer falsely reports metadata damage and failure warnings

Previously, the Ceph Monitor assigned a rank to standby-replay daemons during creation. This behavior could lead to the Ceph Metadata Servers (MDS) reporting false metadata damage and failure warnings. With this release, Ceph Monitors no longer assign a rank to standby-replay daemons during creation, eliminating false metadata damage and failure warnings.
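
Standby-replay daemons are still enabled in the usual way, for example (the file system name is a placeholder):

Syntax

    ceph fs set FILE_SYSTEM_NAME allow_standby_replay true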

(BZ#2002398)

4.4. Ceph Manager plugins

The pg_autoscaler module no longer reports failed op error

Previously, the pg_autoscaler module reported a KeyError for op when trying to get the pool status if any pool had the CRUSH rule step set_chooseleaf_vary_r 1. As a result, the Ceph cluster health displayed HEALTH_ERR with Module ’pg_autoscaler’ has failed: op error. With this release, only steps with op are iterated over for a CRUSH rule while getting the pool status, and the pg_autoscaler module no longer reports the failed op error.
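
For example, the pool status that the module evaluates can be reviewed with:

Example

    ceph osd pool autoscale-status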

(BZ#1874866)

4.5. Ceph Object Gateway

S3 lifecycle expiration header feature identifies the objects as expected

Previously, some objects without a lifecycle expiration were incorrectly identified in GET or HEAD requests as having a lifecycle expiration, due to an error in the feature's logic when comparing object names to the stored lifecycle policy. With this update, the S3 lifecycle expiration header feature works as expected and identifies the objects correctly.

(BZ#1786226)

The radosgw-admin user list command no longer takes a long time to execute in Red Hat Ceph Storage cluster 4

Previously, in Red Hat Ceph Storage cluster 4, the performance of many radosgw-admin commands was affected because the value of the rgw_gc_max_objs configuration variable, which controls the number of GC shards, was increased significantly. This included radosgw-admin commands that were not related to GC. With this release, after an upgrade from Red Hat Ceph Storage cluster 3 to Red Hat Ceph Storage cluster 4, the radosgw-admin user list command does not take a long time to execute. Only the performance of radosgw-admin commands that require GC to operate is affected by the value of the rgw_gc_max_objs configuration variable.

(BZ#1927940)

Policies with invalid Amazon resource name elements no longer lead to privilege escalations

Previously, incorrect handling of invalid Amazon resource name (ARN) elements in IAM policy documents, such as bucket policies, could cause unintended permissions to be granted to users who are not part of the policy. With this release, policies with invalid ARN elements can no longer be stored, and policies that are already stored are evaluated correctly.
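
A minimal sketch of applying a bucket policy with the AWS CLI against a Ceph Object Gateway endpoint; the endpoint, bucket name, and policy file are placeholders, and the policy document is expected to reference valid ARNs, such as arn:aws:iam:::user/USER_NAME for a principal or arn:aws:s3:::BUCKET_NAME/* for a resource:

Syntax

    aws --endpoint-url http://RGW_ENDPOINT:8080 s3api put-bucket-policy --bucket BUCKET_NAME --policy file://policy.json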

(BZ#2007451)

4.6. RADOS

Setting bluestore_cache_trim_max_skip_pinned to 10000 enables trimming of the object’s metadata

The least recently used (LRU) cache is used for the object’s metadata. Trimming of the cache is done from the least recently accessed objects. Objects that are pinned are exempted from eviction, which means they are still being used by BlueStore.

Previously, the bluestore_cache_trim_max_skip_pinned configuration variable controlled how many pinned objects were visited, and the scrubbing process caused objects to be pinned for a long time. When the number of objects pinned at the bottom of the LRU metadata cache became larger than bluestore_cache_trim_max_skip_pinned, trimming of the cache did not complete.

With this release, you can set bluestore_cache_trim_max_skip_pinned to 10000, which is larger than the possible count of pinned metadata cache entries. This enables trimming, and the metadata cache size adheres to the configuration settings.
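
For example, the option can be set for all OSDs with:

Example

    ceph config set osd bluestore_cache_trim_max_skip_pinned 10000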

(BZ#1931504)

Upgrading a storage cluster from Red Hat Ceph Storage 4 to 5 completes with a HEALTH_WARN state

When upgrading a Red Hat Ceph Storage cluster from a previously supported version to Red Hat Ceph Storage 5, the upgrade completes with the storage cluster in a HEALTH_WARN state stating that monitors are allowing insecure global_id reclaim. This is due to a patched CVE, the details of which are available in CVE-2021-20288.

Recommendations to mute health warnings:

  1. Identify clients that are not updated by checking the ceph health detail output for the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert.
  2. Upgrade all clients to the Red Hat Ceph Storage 5.0 release.
  3. If all the clients cannot be upgraded immediately, mute the health alerts temporarily:

    Syntax

    ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w  # 1 week
    ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w  # 1 week

  4. After validating that all clients have been updated and the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert is no longer present for a client, set auth_allow_insecure_global_id_reclaim to false:

    Syntax

    ceph config set mon auth_allow_insecure_global_id_reclaim false

  5. Ensure that no clients are listed with the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert.

(BZ#1953494)

The trigger condition for RocksDB flush and compactions works as expected

BlueStore organizes data into chunks called blobs, the size of which is 64K by default. Large writes are split into a sequence of 64K blob writes.

Previously, when the deferred size was equal to or larger than the blob size, all the data was deferred and placed under the “L” column family. A typical example is the HDD configuration, where the value is 64K for both the bluestore_prefer_deferred_size_hdd and bluestore_max_blob_size_hdd parameters. This consumed the “L” column faster, resulting in the RocksDB flush count and the compactions becoming more frequent. The trigger condition for this scenario was data size in blob <= minimum deferred size.

With this release, the deferred trigger condition checks the size of extents on disks and not blobs. Extents smaller than deferred_size go to a deferred mechanism and larger extents are written to the disk immediately. The trigger condition is changed to data size in extent < minimum deferred size.

Small writes are placed under the “L” column, and the growth of this column is slow, with no extra compactions.

The bluestore_prefer_deferred_size parameter controls deferred writes without any interference from the blob size and works as per its description: “writes smaller than this size”.
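
For example, the relevant thresholds for HDD-backed OSDs can be inspected with:

Example

    ceph config get osd bluestore_prefer_deferred_size_hdd
    ceph config get osd bluestore_max_blob_size_hdd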

(BZ#1991677)

The Ceph Manager no longer crashes during large increases to pg_num and pgp_num

Previously, the code that adjusts placement groups did not handle large increases to the pg_num and pgp_num parameters correctly, which led to an integer underflow that could crash the Ceph Manager.

With this release, the code that adjusts placement groups was fixed. As a result, large increases to placement groups do not cause the Ceph Manager to crash.
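
For example, a large increase can now be applied directly; the pool name is a placeholder and the value is illustrative:

Syntax

    ceph osd pool set POOL_NAME pg_num 4096
    ceph osd pool set POOL_NAME pgp_num 4096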

(BZ#2001152)

4.7. RADOS Block Devices (RBD)

The librbd code honors the CEPH_OSD_FLAG_FULL_TRY flag

Previously, you could set the CEPH_OSD_FLAG_FULL_TRY flag with the rados_set_pool_full_try() API function. In Red Hat Ceph Storage 5, librbd stopped honoring this flag. This resulted in write operations stalling while waiting for space when a pool became full or reached a quota limit, even if CEPH_OSD_FLAG_FULL_TRY was set.

With this release, librbd honors the CEPH_OSD_FLAG_FULL_TRY flag. When the flag is set and a pool becomes full or reaches its quota, write operations either succeed or fail with an ENOSPC or EDQUOT error. The ability to remove RADOS Block Device (RBD) images from a full or at-quota pool is restored.
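
For example, removing an image from a full or at-quota pool works again; the pool and image names are placeholders:

Syntax

    rbd rm POOL_NAME/IMAGE_NAME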

(BZ#1969301)

4.8. RBD Mirroring

Improvements to the rbd mirror pool peer bootstrap import command

Previously, running the rbd mirror pool peer bootstrap import command caused librados to log errors about a missing key ring file in cases where a key ring was not required. This could confuse site administrators, because it appeared as though the command failed due to a missing key ring. With this release, librados no longer logs errors in cases where a remote storage cluster’s key ring is not required, such as when the bootstrap token contains the key.
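
For example, importing a bootstrap token that embeds the key no longer produces key ring errors; the site name, pool name, and token path are placeholders:

Syntax

    rbd mirror pool peer bootstrap import --site-name SITE_NAME POOL_NAME TOKEN_FILE_PATH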

(BZ#1981186)

4.9. iSCSI Gateway

The gwcli tool now shows the correct erasure coded pool profile

Previously, the gwcli tool would show the incorrect k+m values of the erasure coded pool.

With this release, the gwcli tool pulls the erasure coded pool settings from the associated erasure coded profile, and the Red Hat Ceph Storage cluster shows the correct erasure coded pool profile.
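
For example, the pool details, including the k+m values, can be reviewed from the gwcli tool:

Example

    gwcli ls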

(BZ#1840721)

The upgrade of the storage cluster with iSCSI configured now works as expected

Previously, the upgrade of a storage cluster with iSCSI configured would fail because the latest ceph-iscsi packages no longer provided the deprecated ceph-iscsi-tools packages.

With this release, the ceph-iscsi-tools package is marked as obsolete in the RPM specification file and the upgrade succeeds as expected.

(BZ#2026582)

The tcmu-runner no longer fails to remove “blocklist” entries

Previously, the tcmu-runner would execute incorrect commands to remove the “blocklist” entries, resulting in degraded performance for iSCSI LUNs.

With this release, the tcmu-runner was updated to execute the correct command when removing blocklist entries. The blocklist entries are cleaned up by tcmu-runner and the iSCSI LUNs work as expected.
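
For example, any remaining entries can be reviewed on the storage cluster with:

Example

    ceph osd blocklist ls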

(BZ#2041127)

The tcmu-runner process now closes normally

Previously, the tcmu-runner process incorrectly handled a failed path, causing the release of uninitialized g_object memory. This could cause the tcmu-runner process to terminate unexpectedly. The source code has been modified to skip the release of uninitialized g_object memory, resulting in the tcmu-runner process exiting normally.

(BZ#2007683)

The RADOS Block Device handler correctly parses configuration strings

Previously, the RADOS Block Device (RBD) handler used the strtok() function while parsing configuration strings, which is not thread-safe. This caused incorrect parsing of the configuration string of image names when creating or reopening an image. This resulted in the image failing to open. With this release, the RBD handler uses the thread-safe strtok_r() function, allowing for the correct parsing of configuration strings.

(BZ#2007687)

4.10. The Ceph Ansible utility

The cephadm-adopt playbook now enables the pool application on the pool when creating a new nfs-ganesha pool

Previously, when the cephadm-adopt playbook created a new nfs-ganesha pool, it did not enable the pool application on the pool. This resulted in a warning that one pool did not have the pool application enabled. With this update, the cephadm-adopt playbook sets the pool application on the created pool, and a warning after the adoption no longer occurs.
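
The equivalent manual step, assuming the pool created is named nfs-ganesha and the application name used is nfs, would be:

Example

    ceph osd pool application enable nfs-ganesha nfs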

(BZ#1956840)

The cephadm-adopt playbook does not create default realms for multisite configuration

Previously, the cephadm-adopt playbook enforced the creation of the default realms during the adoption process, even when there was no multisite configuration present.

With this release, the cephadm-adopt playbook does not enforce the creation of default realms when there is no multisite configuration deployed.
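
For example, after adopting a cluster without a multisite configuration, the realm list should not contain an unexpected default realm:

Example

    radosgw-admin realm list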

(BZ#1988404)

The Ceph Ansible cephadm-adopt.yml playbook can add nodes with a host’s fully-qualified domain name

Previously, the task that adds nodes to cephadm in the Ceph Ansible cephadm-adopt.yml playbook used the short host name and did not match the current fully-qualified domain name (FQDN) of a node. As a result, the adoption playbook failed because no match for the FQDN host name was found.

With this release, the playbook uses the ansible_nodename fact instead of the ansible_hostname fact, allowing the adoption playbook to add nodes configured with a FQDN.

(BZ#1997083)

The Ceph Ansible cephadm-adopt playbook now pulls container images successfully

Previously, the Ceph Ansible cephadm-adopt playbook was not logging into the container registry on storage clusters that were being adopted. With this release, the Ceph Ansible cephadm-adopt playbook logs into the container registry, and pulls container images as expected.

(BZ#2000103)