Chapter 4. Bug fixes
This section describes bugs with significant user impact, which were fixed in this release of Red Hat Ceph Storage. In addition, the section includes descriptions of fixed known issues found in previous versions.
4.1. The Cephadm utility
ceph-volume commands no longer block OSDs and devices and run as expected
Previously, ceph-volume commands such as ceph-volume lvm list and ceph-volume inventory did not complete, which prevented the execution of other ceph-volume commands for creating OSDs, listing devices, and listing OSDs.
With this update, the default output of these commands is no longer added to the Cephadm log, so all ceph-volume commands run in a container launched by the cephadm binary complete as expected.
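The affected commands can be exercised through the cephadm binary; a minimal sketch, assuming a host in a running cephadm-deployed cluster (output varies by cluster):

```
# Requires a running cephadm-deployed cluster; run on a cluster host.
# List logical volumes associated with existing OSDs:
cephadm ceph-volume lvm list

# Report which devices are available or in use for OSD deployment:
cephadm ceph-volume inventory
```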
4.2. The Ceph Ansible utility
cephadm-adopt playbook does not create default realms for multisite configuration
Previously, the cephadm-adopt playbook created default realms during the adoption process, even when no multisite configuration was present.
With this release, the cephadm-adopt playbook does not enforce the creation of default realms when there is no multisite configuration deployed.
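To verify the post-adoption state, the configured realms can be listed; an illustrative check, not part of the playbook, assuming the Ceph Object Gateway admin tooling is available on the node:

```
# Requires a running cluster; prints the configured realms, if any.
radosgw-admin realm list
```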
4.3. Ceph Dashboard
Secure cookie-based sessions are enabled for accessing the Red Hat Ceph Storage Dashboard
Previously, storing information in LocalStorage made the Red Hat Ceph Storage dashboard accessible to all sessions running in a browser, leaving the dashboard vulnerable to XSS attacks. With this release, LocalStorage is replaced with secure cookie-based sessions, so the session secret is available only to the current browser instance.
4.4. Ceph Manager plugins
pg_autoscaler module no longer reports failed op error
Previously, the pg_autoscaler module reported a KeyError for op when trying to get the pool status if any pool had the CRUSH rule step set_chooseleaf_vary_r 1. As a result, the Ceph cluster health displayed HEALTH_ERR with Module ’pg_autoscaler’ has failed: op.
With this release, only steps with op are iterated over for a CRUSH rule while getting the pool status, and the pg_autoscaler module no longer reports the failed op error.
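For illustration, a hypothetical decompiled CRUSH rule containing such a step might look like the following (the rule name and id are made up; step set_chooseleaf_vary_r 1 is the step that triggered the KeyError):

```
rule replicated_vary_r {
    id 1
    type replicated
    step set_chooseleaf_vary_r 1
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
```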
4.5. Ceph Object Gateway
S3 lifecycle expiration header feature identifies the objects as expected
Previously, some objects without a lifecycle expiration were incorrectly identified in GET or HEAD requests as having a lifecycle expiration due to an error in the logic of the feature when comparing object names to stored lifecycle policy. With this update, the S3 lifecycle expiration header feature works as expected and identifies the objects correctly.
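For reference, when a lifecycle rule matches an object, the response to a HEAD request carries the expiration in the x-amz-expiration header; a hypothetical example response (the date and rule-id are made-up values):

```
HTTP/1.1 200 OK
x-amz-expiration: expiry-date="Fri, 23 Dec 2022 00:00:00 GMT", rule-id="rule1"
Content-Length: 1024
```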
radosgw-admin user list command no longer takes a long time to execute in Red Hat Ceph Storage cluster 4
Previously, in Red Hat Ceph Storage cluster 4, the performance of many radosgw-admin commands was affected because the value of the rgw_gc_max_objs config variable, which controls the number of GC shards, was increased significantly. This included radosgw-admin commands that were not related to GC.
With this release, after an upgrade from Red Hat Ceph Storage cluster 3 to Red Hat Ceph Storage cluster 4, the radosgw-admin user list command no longer takes a long time to execute. Only the performance of radosgw-admin commands that require GC to operate is affected by the value of the rgw_gc_max_objs configuration variable.
Setting bluestore_cache_trim_max_skip_pinned to 10000 enables trimming of the object’s metadata
The least recently used (LRU) cache is used for the object’s metadata. Trimming of the cache is done starting from the least recently accessed objects. Pinned objects are exempt from eviction because they are still in use by BlueStore.
Previously, the configuration variable bluestore_cache_trim_max_skip_pinned controlled how many pinned objects were visited, and the scrubbing process caused objects to remain pinned for a long time. When the number of objects pinned at the bottom of the LRU metadata cache grew larger than bluestore_cache_trim_max_skip_pinned, trimming of the cache could not complete.
With this release, you can set bluestore_cache_trim_max_skip_pinned to 10000, which is larger than the possible count of metadata cache entries. This enables trimming, and the metadata cache size adheres to the configuration settings.
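A sketch of the corresponding setting, assuming a running cluster (applied here to all OSDs):

```
# Set the pinned-object skip limit above the metadata cache entry count
# so that LRU trimming can always complete.
ceph config set osd bluestore_cache_trim_max_skip_pinned 10000
```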
Upgrading storage cluster from Red Hat Ceph Storage 4 to 5 completes with HEALTH_WARN state
When upgrading a Red Hat Ceph Storage cluster from a previously supported version to Red Hat Ceph Storage 5, the upgrade completes with the storage cluster in a HEALTH_WARN state stating that monitors are allowing insecure global_id reclaim. This is due to a patched CVE; the details are available in CVE-2021-20288.
Recommendations to mute health warnings:
- Identify clients that are not updated by checking the ceph health detail output for the AUTH_INSECURE_GLOBAL_ID_RECLAIM warning.
- Upgrade all clients to the Red Hat Ceph Storage 5.0 release.
- If all the clients cannot be upgraded immediately, mute the health alerts temporarily:
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w # 1 week
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w # 1 week
- After validating that all clients have been updated and the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert is no longer present for any client, set:
ceph config set mon auth_allow_insecure_global_id_reclaim false
- Ensure that no clients are listed with the AUTH_INSECURE_GLOBAL_ID_RECLAIM warning.
The trigger condition for RocksDB flush and compactions works as expected
BlueStore organizes data into chunks called blobs, whose size is 64K by default. A large write is split into a sequence of 64K blob writes.
Previously, when the deferred size was equal to or greater than the blob size, all the data was deferred and placed under the “L” column family. A typical example is the HDD configuration, where the value is 64K for both the bluestore_prefer_deferred_size_hdd and bluestore_max_blob_size_hdd parameters. This consumed the “L” column faster, so RocksDB flushes and compactions became more frequent. The trigger condition for this scenario was data size in blob <= minimum deferred size.
With this release, the deferred trigger condition checks the size of extents on disk, not blobs. Extents smaller than deferred_size go through the deferred mechanism, and larger extents are written to the disk immediately. The trigger condition is changed to data size in extent < minimum deferred size.
The small writes are placed under the “L” column, and the growth of this column is slow, with no extra compactions.
The bluestore_prefer_deferred_size parameter controls deferred writes without any interference from the blob size, and works as per its description: “writes smaller than this size”.
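The new extent-based decision can be sketched as follows. This is an illustrative model only, the real check lives in BlueStore's C++ write path, using the 64K HDD default described above:

```shell
# Illustrative sketch of the extent-based deferred-write trigger condition.
prefer_deferred_size=65536   # the 64K HDD default described above

decide_write() {
    extent_len=$1
    if [ "$extent_len" -lt "$prefer_deferred_size" ]; then
        echo deferred    # small extent: staged via the RocksDB "L" column family
    else
        echo direct      # extent >= deferred size: written directly to disk
    fi
}

decide_write 4096     # small overwrite: deferred
decide_write 65536    # full 64K blob: direct (previously this was deferred)
```

Because the comparison now uses the extent length rather than the blob size, a full-blob write is no longer deferred even when the deferred size equals the blob size.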