Chapter 4. Bug fixes

This section describes bugs with significant user impact that were fixed in this release of Red Hat Ceph Storage. It also includes descriptions of fixed known issues found in previous versions.

4.1. The Cephadm utility

The ceph-volume commands do not block OSDs and devices and run as expected

Previously, ceph-volume commands such as ceph-volume lvm list and ceph-volume inventory did not complete, thereby preventing the execution of other ceph-volume commands for creating OSDs, listing devices, and listing OSDs.

With this update, the default output of these commands is not added to the Cephadm log, resulting in the completion of all ceph-volume commands run in a container launched by the cephadm binary.


4.2. The Ceph Ansible utility

The cephadm-adopt playbook does not create default realms for multisite configuration

Previously, the cephadm-adopt playbook created the default realms during the adoption process, even when there was no multisite configuration present.

With this release, the cephadm-adopt playbook does not enforce the creation of default realms when there is no multisite configuration deployed.


4.3. Ceph Dashboard

Secure cookie-based sessions are enabled for accessing the Red Hat Ceph Storage Dashboard

Previously, storing session information in LocalStorage made the Red Hat Ceph Storage Dashboard accessible to all sessions running in a browser, making the dashboard vulnerable to XSS attacks. With this release, LocalStorage is replaced with secure cookie-based sessions, so the session secret is available only to the current browser instance.


4.4. Ceph Manager plugins

The pg_autoscaler module no longer reports failed op error

Previously, the pg_autoscaler module reported a KeyError for op when trying to get the pool status if any pool had the CRUSH rule step set_chooseleaf_vary_r 1. As a result, the Ceph cluster health displayed HEALTH_ERR with Module 'pg_autoscaler' has failed: op error. With this release, only steps with an op key are iterated for a CRUSH rule while getting the pool status, and the pg_autoscaler module no longer reports the failed op error.
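The fix can be sketched as follows. This is a simplified illustration, not the pg_autoscaler code: the rule layout mimics the output of ceph osd crush rule dump, and the step without an op key stands in for the entry that previously raised the KeyError.

```python
# Hypothetical, simplified CRUSH rule dump; the real structure comes from
# `ceph osd crush rule dump`.
rule = {
    "rule_name": "replicated_rule",
    "steps": [
        {"num": 1},                      # step with no "op" key (previously raised KeyError)
        {"op": "take", "item_name": "default"},
        {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
    ],
}

def root_of_rule(rule):
    """Walk the rule steps, skipping any step that has no "op" key
    instead of raising KeyError, and return the CRUSH root taken."""
    for step in rule["steps"]:
        if "op" not in step:             # the fix: ignore op-less steps
            continue
        if step["op"] == "take":
            return step["item_name"]
    return None

print(root_of_rule(rule))
```

Iterating only the steps that carry an op key means a rule containing set_chooseleaf_vary_r 1 no longer aborts the pool-status walk.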


4.5. Ceph Object Gateway

S3 lifecycle expiration header feature identifies the objects as expected

Previously, some objects without a lifecycle expiration were incorrectly identified in GET or HEAD requests as having a lifecycle expiration, due to an error in the feature's logic when comparing object names to the stored lifecycle policy. With this update, the S3 lifecycle expiration header feature works as expected and identifies the objects correctly.
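The corrected behavior amounts to a proper prefix comparison between object names and lifecycle rules. The following is a minimal sketch under the assumption that rules are (prefix, days) pairs; it is not the Ceph Object Gateway implementation.

```python
# Hypothetical lifecycle rules: (prefix, days-until-expiration).
rules = [("logs/", 30), ("tmp/", 1)]

def expiration_for(object_name, rules):
    """Return the expiration in days for an object, or None when no
    lifecycle rule prefix matches (so no x-amz-expiration header is sent)."""
    for prefix, days in rules:
        if object_name.startswith(prefix):  # strict prefix comparison
            return days
    return None

print(expiration_for("logs/2021/app.log", rules))  # matches "logs/"
print(expiration_for("data/report.csv", rules))    # no matching rule
```

An object whose name merely resembles a rule's prefix without actually starting with it now correctly gets no expiration header.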


The radosgw-admin user list command no longer takes a long time to execute in Red Hat Ceph Storage cluster 4

Previously, in Red Hat Ceph Storage cluster 4, the performance of many radosgw-admin commands was affected because the value of the rgw_gc_max_objs configuration variable, which controls the number of GC shards, was increased significantly. This included radosgw-admin commands that were not related to GC. With this release, after an upgrade from Red Hat Ceph Storage cluster 3 to Red Hat Ceph Storage cluster 4, the radosgw-admin user list command no longer takes a long time to execute. Only the performance of radosgw-admin commands that require GC to operate is affected by the value of the rgw_gc_max_objs configuration.


4.6. RADOS

Setting bluestore_cache_trim_max_skip_pinned to 10000 enables trimming of the object’s metadata

The least recently used (LRU) cache is used for the object's metadata. Trimming of the cache starts from the least recently accessed objects. Objects that are pinned are exempt from eviction, which means they are still being used by BlueStore.

Previously, the configuration variable bluestore_cache_trim_max_skip_pinned controlled how many pinned objects were visited, and the scrubbing process caused objects to stay pinned for a long time. When the number of objects pinned at the bottom of the LRU metadata cache became larger than bluestore_cache_trim_max_skip_pinned, trimming of the cache could not complete.

With this release, you can set bluestore_cache_trim_max_skip_pinned to 10000, which is larger than the possible count of pinned metadata cache entries. This enables trimming, and the metadata cache size adheres to the configuration settings.
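The stalled-trimming behavior described above can be sketched as follows. This is an illustrative model of an LRU trim loop that skips pinned entries, not BlueStore's actual code; the function and parameter names are hypothetical.

```python
from collections import OrderedDict

def trim_lru(cache, pinned, target_size, max_skip_pinned):
    """Evict least-recently-used entries until the cache shrinks to
    target_size. Pinned entries are skipped, and trimming gives up
    once more than max_skip_pinned pinned entries have been visited."""
    skipped = 0
    for key in list(cache):                   # oldest entries first
        if len(cache) <= target_size or skipped > max_skip_pinned:
            break
        if key in pinned:
            skipped += 1                      # pinned: exempt from eviction
            continue
        del cache[key]                        # evict unpinned entry
    return cache

cache = OrderedDict((i, i) for i in range(10))
pinned = {0, 1, 2, 3, 4}                      # cold, long-pinned entries

# Low limit: trimming bails out before reaching any evictable entry.
stalled = trim_lru(OrderedDict(cache), pinned, target_size=3, max_skip_pinned=2)

# Limit raised above the pinned count: all unpinned entries are evicted.
trimmed = trim_lru(OrderedDict(cache), pinned, target_size=3, max_skip_pinned=10000)
```

With many pinned entries clustered at the cold end of the LRU and a small skip limit, the loop stops before evicting anything, which mirrors the incomplete trimming; raising the limit above the pinned count lets the trim pass finish.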


Upgrading storage cluster from Red Hat Ceph Storage 4 to 5 completes with HEALTH_WARN state

When upgrading a Red Hat Ceph Storage cluster from a previously supported version to Red Hat Ceph Storage 5, the upgrade completes with the storage cluster in a HEALTH_WARN state stating that monitors are allowing insecure global_id reclaim. This is due to a patched CVE; the details are available in CVE-2021-20288.

Recommendations to mute health warnings:

  1. Identify clients that are not updated by checking the ceph health detail output for the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert.
  2. Upgrade all clients to Red Hat Ceph Storage 5.0 release.
  3. If all clients cannot be upgraded immediately, mute the health alerts temporarily:


    ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w  # 1 week
    ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w  # 1 week

  4. After validating that all clients have been updated and the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert is no longer present for any client, set auth_allow_insecure_global_id_reclaim to false:


    ceph config set mon auth_allow_insecure_global_id_reclaim false

  5. Ensure that no clients are listed with the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert.


The trigger condition for RocksDB flush and compactions works as expected

BlueStore organizes data into chunks called blobs, the size of which is 64K by default. Large writes are split into a sequence of 64K blob writes.
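The splitting of a large write into blob-sized chunks can be sketched as follows; this is an illustration of the chunking described above, not BlueStore code, and the function name is hypothetical.

```python
BLOB_SIZE = 64 * 1024  # default blob size (64K)

def split_into_blobs(data: bytes, blob_size: int = BLOB_SIZE):
    """Split a large write into a sequence of blob-sized chunks."""
    return [data[i:i + blob_size] for i in range(0, len(data), blob_size)]

# A 200K write becomes three full 64K blobs plus an 8K remainder.
chunks = split_into_blobs(b"x" * (200 * 1024))
print([len(c) for c in chunks])
```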

Previously, when the deferred size was equal to or greater than the blob size, all the data was deferred and placed under the "L" column family. A typical example is an HDD configuration where the value is 64K for both the bluestore_prefer_deferred_size_hdd and bluestore_max_blob_size_hdd parameters. This consumed the "L" column family faster, making RocksDB flushes and compactions more frequent. The trigger condition for this scenario was data size in blob <= minimum deferred size.

With this release, the deferred trigger condition checks the size of extents on disks and not blobs. Extents smaller than deferred_size go to a deferred mechanism and larger extents are written to the disk immediately. The trigger condition is changed to data size in extent < minimum deferred size.

The small writes are placed under the “L” column and the growth of this column is slow with no extra compactions.

The bluestore_prefer_deferred_size parameter controls deferred writes without any interference from the blob size and works as per its description of "writes smaller than this size".
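The change in trigger condition can be sketched as follows. The function names are illustrative, and the 64K values stand in for the HDD defaults mentioned above; this is not BlueStore's actual code.

```python
PREFER_DEFERRED_SIZE = 64 * 1024   # e.g. bluestore_prefer_deferred_size_hdd
BLOB_SIZE = 64 * 1024              # e.g. bluestore_max_blob_size_hdd

def old_trigger(blob_size, deferred_size=PREFER_DEFERRED_SIZE):
    """Old condition: compared the deferred size against the whole blob."""
    return blob_size <= deferred_size

def new_trigger(extent_size, deferred_size=PREFER_DEFERRED_SIZE):
    """New condition: compared against the size of the on-disk extent."""
    return extent_size < deferred_size

# With both defaults at 64K, a full 64K blob was always deferred under the
# old check, but a 64K extent is written directly under the new check.
print(old_trigger(BLOB_SIZE), new_trigger(BLOB_SIZE))
```

Under the new check, only extents genuinely smaller than the deferred size (for example, small 4K writes) take the deferred path, so the "L" column family grows slowly.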