Chapter 6. Bug fixes

This section describes bugs with significant user impact that were fixed in this release of Red Hat Ceph Storage. In addition, it includes descriptions of fixed known issues found in previous versions.

6.1. The Cephadm utility

Users can upgrade to a local repo image without any issues

Previously, in cephadm, docker.io was added to the start of the image name by default if the image name did not include a qualified domain name. Due to this, users were unable to upgrade to images hosted in local repositories.

With this fix, cephadm is more careful in identifying the images to which docker.io is added by default. Users can now upgrade to an image in a local repository without encountering issues.
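
For example, assuming a local registry host and image tag such as the hypothetical values below, an upgrade to a locally hosted image can be started with:

    # Hypothetical registry, repository, and tag; substitute your own values.
    ceph orch upgrade start --image registry.example.local:5000/rhceph/rhceph-5-rhel8:latest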

(BZ#2100553)

6.2. Ceph File System

snap-schedules are no longer lost on restarts of Ceph Manager services

Previously, in-memory databases were not written to persistent storage on every change to the schedule. This caused snap-schedules to be lost when Ceph Manager services restarted.

With this fix, the in-memory databases are dumped into persistent storage on every change or addition to the snap-schedules. Retention now continues to work across restarts of Ceph Manager services.
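
As an illustrative sketch (the file system path, interval, and retention values below are placeholders), a schedule created with the snap_schedule module now survives a Ceph Manager restart and can be verified afterwards:

    # Placeholder path, interval, and retention count.
    ceph fs snap-schedule add /volumes/group1/subvol1 1h
    ceph fs snap-schedule retention add /volumes/group1/subvol1 h 24
    # After a Ceph Manager restart, the schedule and retention are still listed.
    ceph fs snap-schedule status /volumes/group1/subvol1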

(BZ#2102934)

The standby-replay Metadata Server daemon is no longer unexpectedly removed

Previously, the Ceph Monitor would remove a standby-replay Metadata Server (MDS) daemon from the MDS map under certain conditions. This caused the standby-replay MDS daemon to be removed from the Metadata Server cluster, which generated cluster warnings.

With this fix, the logic that Ceph Monitors use when deciding whether to remove an MDS daemon from the MDS map now takes into account standby-replay MDS daemons that hold a rank. This ensures that standby-replay MDS daemons are no longer unexpectedly removed from the MDS cluster.

(BZ#2130116)

6.3. Ceph Manager plugins

Ceph Manager Alerts emails are no longer tagged as spam

Previously, emails sent by the Ceph Manager Alerts module did not have the Message-Id: and Date: headers. This increased the chance of the emails being flagged as spam.

With this fix, both headers are added to the emails sent by the Ceph Manager Alerts module, and the messages are no longer flagged as spam.

(BZ#2064481)

6.4. The Ceph Volume utility

The volume list remains empty and the cephvolumescan actor no longer fails when no ceph-osd container is found

Previously, if Ceph containers ran collocated with other containers and no ceph-osd container was present among them, the process would try to retrieve the volume list from a non-Ceph container, which did not work. Due to this, the cephvolumescan actor would fail and the upgrade would not complete.

With this fix, if no ceph-osd container is found, the volume list remains empty and the cephvolumescan actor does not fail.

(BZ#2141393)

Ceph OSD deployment no longer fails when ceph-volume handles multiple devices

Previously, ceph-volume computed incorrect sizes when multiple devices had to be handled, resulting in a failure to deploy OSDs.

With this fix, ceph-volume computes the correct sizes when multiple devices are handled, and deployment of OSDs works as expected.

(BZ#2119774)

6.5. Ceph Object Gateway

Users can now set up Kafka connectivity with SASL in a non-TLS environment

Previously, due to a failure in configuring the TLS certificate for the Ceph Object Gateway, it was not possible to configure a Kafka topic with SASL (user name and password).

With this fix, a new configuration parameter, rgw_allow_notification_secrets_in_cleartext, is added. Users can now set up Kafka connectivity with SASL in a non-TLS environment.
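
For example, the new parameter can be enabled with the ceph config command (a minimal sketch; the client.rgw section might need to be narrowed to a specific gateway instance in your deployment):

    # Allow SASL user names and passwords for bucket notifications without TLS.
    ceph config set client.rgw rgw_allow_notification_secrets_in_cleartext true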

(BZ#2014330)

Internal handling of tokens is fixed

Previously, the internal handling of tokens in the refresh path of the Java-based client authentication provider JAR for the AWS SDK for Java and the Hadoop S3A Connector did not deal correctly with large tokens. This resulted in improper processing of some tokens and prevented the renewal of client tokens.

With this fix, the internal token handling is corrected and token renewal works as expected.

(BZ#2055137)

The object version access is corrected preventing object lock violation

Previously, inadvertent slicing of version information occurred in some call paths, which could cause object versions protected by object lock to be deleted contrary to policy.

With this fix, the object version access is corrected, thereby preventing object lock violation.

(BZ#2108394)

Ceph Object Gateway no longer crashes with malformed URLs

Previously, a refactoring abstraction replaced a bucket value with a pointer to a bucket value that was not always initialized. As a result, malformed URLs for bucket operations that referenced no bucket caused the Ceph Object Gateway to crash.

With this fix, a check on the pointer has been implemented in the call path, and the Ceph Object Gateway returns a permission error, rather than crashing, if the pointer is uninitialized.

(BZ#2109256)

The code that parses dates in x-amz-date format is changed

Previously, the standard format for x-amz-date was changed, and new software built with the latest Go libraries uses the new date format. As a result, such software could not communicate with the Ceph Object Gateway.

With this fix, the code in the Ceph Object Gateway that parses dates in x-amz-date format is changed to also accept the new date format.

(BZ#2109675)

New logic in processing of lifecycle shards prevents stalling due to deleted buckets

Previously, changes were made so that lifecycle processing cycles continuously across days, that is, so that it does not restart from the beginning of the list of eligible buckets each day. However, those changes contained a bug that could stall the processing of lifecycle shards that contained deleted buckets.

With this fix, logic is introduced to skip over deleted buckets, and the processing of lifecycle shards no longer stalls.

(BZ#2118295)

Header processing no longer causes sporadic swift-protocol authentication failures

Previously, a combination of incorrect HTTP header processing and timestamp handling logic either caused an invalid Keystone admin token to be used for operations, or prevented the Keystone admin token from being renewed when required. Due to this, sporadic swift-protocol authentication failures occurred.

With this fix, header processing is corrected and new diagnostics are added. The logic now works as expected.

(BZ#2123335)

Warnings are no longer logged in inappropriate circumstances

Previously, inverted logic occasionally reported an incorrect warning, unable to find head object, causing the warning to be logged even when it did not apply to the Ceph Object Gateway configuration.

With this fix, the corrected logic no longer logs the warning in inappropriate circumstances.

(BZ#2126787)

PUT object operation writes to the correct bucket index shards

Previously, due to a race condition, a PUT object operation would rarely write to a former bucket index shard. This caused the former bucket index shard to be recreated, and the object would not appear in the proper bucket index. Therefore, the object would not be listed when the bucket was listed.

With this fix, care is taken to prevent various operations from creating bucket index shards and recover when the race condition is encountered. PUT object operations now always write to the correct bucket index shards.

(BZ#2145022)

6.6. Multi-site Ceph Object Gateway

Suspending bucket versioning in the primary zone no longer suspends bucket versioning in the archive zone

Previously, if bucket versioning was suspended in the primary zone, bucket versioning in the archive zone would also be suspended.

With this fix, archive zone versioning is always enabled irrespective of bucket versioning changes on other zones. Bucket versioning in the archive zone no longer gets suspended.
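
As a hedged example using the AWS CLI (the bucket name, endpoint, and profile are placeholders), suspending versioning on the primary zone no longer affects the archive zone:

    # Suspend versioning on the primary zone; the archive zone keeps versioning enabled.
    aws s3api put-bucket-versioning --bucket example-bucket \
        --versioning-configuration Status=Suspended \
        --endpoint-url http://primary-rgw.example.com:8080 --profile rgw-primary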

(BZ#1957088)

The radosgw-admin sync status command in multi-site replication works as expected

Previously, in a multi-site replication setup, if one or more participating Ceph Object Gateway nodes were down, running the radosgw-admin sync status command would return a (5) Input/output error. This status should resolve after all the Ceph Object Gateway nodes are back online, but the command could get stuck instead.

With this update, the radosgw-admin sync status command no longer gets stuck and works as expected.

(BZ#1749627)

Processes trimming retired bucket index entries no longer cause radosgw instance to crash

Previously, under some circumstances, processes trimming retired bucket index entries could access an uninitialized pointer variable, causing the radosgw instance to crash.

With this fix, the variable is initialized immediately before use and the radosgw instance no longer crashes.

(BZ#2139258)

Bucket sync run is given control logic to sync all objects

Previously, to support dynamic bucket resharding on multi-site clusters, the singular bucket index log was replaced with multiple bucket index log generations. However, due to how bucket sync run was implemented, only the oldest outstanding generation would be synced.

With this fix, bucket sync run is given control logic that enables it to run the sync from the oldest outstanding generation up to the current one, and all objects are now synced as expected.
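
For example, a manual sync run for a single bucket (the bucket name is a placeholder) now processes every outstanding log generation instead of only the oldest one:

    # Run on the zone that should catch up; the bucket name is a placeholder.
    radosgw-admin bucket sync run --bucket=example-bucket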

(BZ#2066453)

Per-bucket replication logical error fix executes policies correctly

Previously, an internal logic error caused failures in per-bucket replication, and as a result, per-bucket replication policies did not work in some circumstances.

With this fix, the logic error that confused the source and destination bucket information is corrected, and the policies execute correctly.

(BZ#2108886)

Variable access no longer causes undefined program behavior

Previously, a Coverity scan identified two cases where variables could be used after a move, potentially causing undefined program behavior.

With this fix, variable access is fixed and the potential fault can no longer occur.

(BZ#2123423)

Requests with a tenant but no bucket no longer cause a crash

Previously, an upstream refactoring replaced uninitialized bucket data fields with uninitialized pointers. Due to this, any bucket request containing a URL referencing no valid bucket caused crashes.

With this fix, requests that access the bucket but do not specify a valid bucket are denied, resulting in an error instead of a crash.

(BZ#2139422)

6.7. RADOS

Performing a DR test with a two-site stretch cluster no longer causes Ceph to become unresponsive

Previously, when performing a disaster recovery (DR) test with a two-site stretch cluster, removing and adding new monitors to the cluster would cause an incorrect rank in the ConnectionTracker class. Due to this, the monitor would fail to identify itself in the peer_tracker copy and would never update its correct field, causing a deadlock in the election process, which led to Ceph becoming unresponsive.

With this fix, the following corrections are made:

  • An assert is added in the notify_rank_removed() function to compare the expected rank provided by the Monmap against the rank that is manually adjusted, as a sanity check.
  • The removed_ranks variable is cleared on every Monmap update.
  • An action is added to manually reset peer_tracker.rank when the ceph connection scores reset command is executed for each monitor. The peer_tracker.rank then matches the current rank of the monitor.
  • Functions are added in the Elector and ConnectionTracker classes to check for a clean peer_tracker when upgrading the monitors, including when booting up. If the peer_tracker is found unclean, it is cleared.
  • Because a user can choose to manually remove a monitor rank before shutting down the monitor, causing an inconsistency in the Monmap, the Monitor::notify_new_monmap() function now prevents removing our rank or ranks that do not exist in the Monmap.

The cluster now works as expected, with no unwarranted downtime, and no longer becomes unresponsive when performing a DR test with a two-site stretch cluster.
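
As a hedged example, the connectivity scores kept by each monitor can be inspected, and reset if they look inconsistent, through the monitor admin socket (the monitor ID is a placeholder):

    # Dump the connectivity scores tracked by this monitor.
    ceph daemon mon.host01 connection scores dump
    # Reset the scores for this monitor if they are inconsistent.
    ceph daemon mon.host01 connection scores reset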

(BZ#2142674)

Rank is removed from the live_pinging and dead_pinging set to mitigate the inconsistent connectivity score issue

Previously, when two monitors were removed consecutively, if the rank size was equal to the Paxos size, the monitor would hit a condition where it would not remove the rank from the dead_pinging set. Due to this, the rank remained in the dead_pinging set, which caused problems such as an inconsistent connectivity score when stretch-cluster mode was enabled.

With this fix, a case is added to handle removal of the highest-ranked monitor: when the rank is equal to the Paxos size, the rank is removed from the live_pinging and dead_pinging sets. The monitor stays healthy with clean live_pinging and dead_pinging sets.

(BZ#2142174)

The Prometheus metrics now reflect the correct Ceph version for all Ceph Monitors whenever requested

Previously, the Prometheus metrics reported mismatched Ceph versions for Ceph Monitors when the monitor was upgraded. As a result, the active Ceph Manager daemon needed to be restarted to resolve this inconsistency.

With this fix, Ceph Monitors explicitly send metadata update requests with mon metadata to the Ceph Manager when the monitor election is over, so the Prometheus metrics reflect the correct Ceph version for all Ceph Monitors.

(BZ#2008524)

The ceph daemon heap stats command shows the heap status

Previously, due to a failure to get heap information through the ceph daemon command, the ceph daemon heap stats command returned empty output instead of the current heap usage for a Ceph daemon. This occurred because ceph::osd_cmds::heap() confused stderr and stdout, which caused the difference in output.

With this fix, the ceph daemon heap stats command returns heap usage information for a Ceph daemon, similar to the output obtained with the ceph tell command.
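
For example, both of the following now return equivalent heap usage information for an OSD daemon (osd.0 is a placeholder; the ceph daemon form must be run on the node that hosts the daemon):

    # Through the local admin socket on the OSD node.
    ceph daemon osd.0 heap stats
    # Through the cluster, from any node with client access.
    ceph tell osd.0 heap stats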

(BZ#2119100)

Ceph Monitors no longer crash when using the ceph orch apply mon <num> command

Previously, when the ceph orch apply mon <num> command was used to reduce the number of monitors in a cluster, cephadm removed the monitors before shutting them down, causing the monitors to crash.

With this fix, a sanity check is added to all code paths that checks whether the peer rank is greater than or equal to the size of the ranks from the monitor map. If the condition is met, certain operations that would lead to the monitor crash are skipped. The peer rank eventually resolves itself in the next version of the monitor map. The monitors no longer crash when they are removed from the monitor map before shutting down.
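
For example, shrinking the monitor count (the target number below is illustrative) no longer crashes the monitors that get removed:

    # Reduce the cluster to three monitors.
    ceph orch apply mon 3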

(BZ#2142141)

End-user can now see the scrub or deep-scrub starts message from the Ceph cluster log

Previously, the scrub or deep-scrub starts message was missing from the Ceph cluster log, so end users could not tell from the log whether scrubbing had started for a placement group (PG).

With this fix, the scrub or deep-scrub starts message is reintroduced. The Ceph cluster log now shows the message for a PG whenever that PG begins the scrubbing or deep-scrubbing process.

(BZ#2091773)

No assertion during the Ceph Manager failover

Previously, a newly activated Ceph Manager would receive several service_map versions sent by the previously active manager. An incorrect check in the code caused an assertion failure when the newly activated manager received a map with a higher version sent by the previously active manager.

With this fix, the check in the manager that deals with the initial service map is relaxed and there is no assertion during the Ceph Manager failover.

(BZ#2095062)

Users can remove cloned objects after upgrading a cluster

Previously, after upgrading a cluster from Red Hat Ceph Storage 4 to Red Hat Ceph Storage 5, removing snapshots of objects created in earlier versions would leave clones that could not be removed. This was because the SnapMapper keys were wrongly converted.

With this fix, SnapMapper’s legacy conversion is updated to match the new key format. The cloned objects created in earlier versions of Ceph can now be easily removed after an upgrade.

(BZ#2107405)

RocksDB error does not occur for small writes

BlueStore employs a strategy of deferring small writes for HDDs and stores the deferred data in RocksDB. Cleaning deferred data from RocksDB is a background process that is not synchronized with BlueFS. Previously, deferred replay could overwrite BlueFS data, which resulted in RocksDB errors.

With this fix, deferred replay no longer overwrites BlueFS data, and RocksDB errors such as the following no longer occur:

  • osd_superblock corruption.
  • CURRENT does not end with newline.
  • .sst files checksum error.
Note

Do not write deferred data as the write location might either contain a proper object or be empty. It is not possible to corrupt object data this way. BlueFS is the only entity that can allocate this space.

(BZ#2109886)

Corrupted dups entries of a PG Log can be removed by off-line and on-line trimming

Previously, trimming of PG log dups entries could be prevented during the low-level PG split operation, which is used by the PG autoscaler with far higher frequency than by a human operator. Stalling the trimming of dups resulted in significant memory growth of the PG log, leading to OSD crashes as the OSD ran out of memory. Restarting an OSD did not solve the problem because the PG log is stored on disk and reloaded into RAM on startup.

With this fix, both off-line trimming, using the ceph-objectstore-tool command, and on-line trimming, within the OSD, can remove the corrupted dups entries of a PG log that jammed the on-line trimming machinery and were responsible for the memory growth. A debug improvement is also implemented that prints the number of dups entries to the OSD log to help future investigations.
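
As a hedged sketch of the off-line path, the trimming can be run with ceph-objectstore-tool against a stopped OSD (the data path and placement group ID below are placeholders, and the exact operation name should be confirmed for your release with ceph-objectstore-tool --help):

    # Run only while the OSD is stopped; path and PG ID are placeholders.
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
        --op trim-pg-log-dups --pgid 2.7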

(BZ#2119853)

6.8. RADOS Block Devices (RBD)

rbd info command no longer fails if executed when the image is being flattened

Previously, due to an implementation defect, the rbd info command would rarely fail if run while the image was being flattened. This caused a transient No such file or directory error, although the command always succeeded when rerun.

With this fix, the implementation defect is fixed and the rbd info command no longer fails, even when executed while the image is being flattened.

(BZ#1989527)

Removing a pool with pending Block Device tasks no longer causes all the tasks to hang

Previously, due to an implementation defect, removing a pool with pending Block Device tasks caused all Block Device tasks, including those for other pools, to hang. To resume the hung Block Device tasks, the administrator had to restart the ceph-mgr daemon.

With this fix, the implementation defect is fixed and removing a pool with pending RBD tasks no longer causes any hangs. Block Device tasks for the removed pool are cleaned up. Block Device tasks for other pools continue executing uninterrupted.

(BZ#2150968)

6.9. RBD Mirroring

The image replayer shuts down as expected

Previously, due to an implementation defect, a request to shut down a particular image replayer would cause the rbd-mirror daemon to hang indefinitely, especially in cases where the daemon was blocklisted on the remote storage cluster.

With this fix, the implementation defect is fixed. A request to shut down a particular image replayer no longer causes the rbd-mirror daemon to hang, and the image replayer shuts down as expected.

(BZ#2086471)

The rbd mirror pool peer bootstrap create command guarantees correct monitor addresses in the bootstrap token

Previously, a bootstrap token generated with the rbd mirror pool peer bootstrap create command contained monitor addresses as specified by the mon_host option in the ceph.conf file. This was fragile and caused issues for users, such as confusion between V1 and V2 endpoints, specifying only one of them, or grouping them incorrectly.

With this fix, the rbd mirror pool peer bootstrap create command is changed to extract monitor addresses from the cluster itself, which guarantees that the monitor addresses contained in a bootstrap token are correct.
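
For example, generating and importing a bootstrap token (the pool and site names are placeholders) now embeds monitor addresses taken from the cluster itself:

    # On the primary storage cluster: create the token.
    rbd mirror pool peer bootstrap create --site-name site-a data > /root/bootstrap_token
    # On the secondary storage cluster: import the token.
    rbd mirror pool peer bootstrap import --site-name site-b data /root/bootstrap_token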

(BZ#2122130)

6.10. iSCSI Gateway

Upgrade from Red Hat Ceph Storage 4.x to 5.x with iSCSI works as expected

Previously, due to a version conflict between some of the libraries that ceph-iscsi depends on, upgrades from Red Hat Ceph Storage 4.x to 5.x with iSCSI would lead to a persistent HTTP 500 error.

With this fix, the versioning conflict is resolved and the upgrade works as expected. However, as a result of this fix, iSCSI REST API responses are not pretty-printed.

(BZ#2121462)

6.11. The Ceph Ansible utility

Upgrade workflow with Ceph Object Gateway configuration is fixed

Previously, whenever set_radosgw_address.yml was called from the dashboard playbook execution, the fact is_rgw_instances_defined was expected to be set if rgw_instances was defined in group_vars or host_vars by the user. Otherwise, the next task, which sets the fact rgw_instances, was executed under the assumption that it was not user defined. This caused the upgrade workflow to break when deploying the Ceph Object Gateway multi-site and the Ceph Dashboard.

With this fix, ceph-ansible sets the parameter when set_radosgw_address.yml is called from the dashboard playbook, and the upgrade workflow works as expected.

(BZ#2117672)

The fact condition is updated to execute only on the Ceph Object Gateway nodes

Previously, the task setting the fact _radosgw_address to radosgw_address_block ipv4 was executed on all nodes, including nodes where no Ceph Object Gateway network range was present, causing playbooks to fail.

With this fix, the when condition is updated so that the fact is set only on the Ceph Object Gateway nodes, and the playbooks now work as expected.

(BZ#2136551)