Release Notes

Red Hat Ceph Storage 5.3

Release notes for Red Hat Ceph Storage 5.3.z4

Red Hat Ceph Storage Documentation Team

Abstract

The release notes describe the major features, enhancements, known issues, and bug fixes implemented for the Red Hat Ceph Storage 5 product release. They cover updates made since the previous release notes, up to the current release.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. We are beginning with these four terms: master, slave, blacklist, and whitelist. Because of the enormity of this endeavor, these changes will be implemented gradually over several upcoming releases. For more details, see our CTO Chris Wright’s message.

Providing feedback on Red Hat Ceph Storage documentation

We appreciate your input on our documentation. Please let us know how we could make it better. To do so:

  • For simple comments on specific passages, make sure you are viewing the documentation in the multi-page HTML format. Highlight the part of text that you want to comment on. Then, click the Add Feedback pop-up that appears below the highlighted text, and follow the displayed instructions.
  • For submitting more complex feedback, create a Bugzilla ticket:

    1. Go to the Bugzilla website.
    2. In the Component drop-down, select Documentation.
    3. In the Sub-Component drop-down, select the appropriate sub-component.
    4. Select the appropriate version of the document.
    5. Fill in the Summary and Description fields with your suggestion for improvement. Include a link to the relevant part(s) of the documentation.
    6. Optional: Add an attachment, if any.
    7. Click Submit Bug.

Chapter 1. Introduction

Red Hat Ceph Storage is a massively scalable, open, software-defined storage platform that combines the most stable version of the Ceph storage system with a Ceph management platform, deployment utilities, and support services.

The Red Hat Ceph Storage documentation is available at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5.

Chapter 2. Acknowledgments

The Ceph Storage project is seeing amazing growth in the quality and quantity of contributions from individuals and organizations in the Ceph community. We would like to thank all members of the Red Hat Ceph Storage team, all of the individual contributors in the Ceph community, and the contributions from organizations such as (but not limited to):

  • Intel®
  • Fujitsu®
  • UnitedStack
  • Yahoo™
  • Ubuntu Kylin
  • Mellanox®
  • CERN™
  • Deutsche Telekom
  • Mirantis®
  • SanDisk™
  • SUSE

Chapter 3. Bug fixes

This section describes bugs with significant user impact that were fixed in this release of Red Hat Ceph Storage. In addition, it includes descriptions of fixed known issues found in previous versions.

3.1. The Cephadm utility

Cephadm adoption now handles opening the necessary ports

Previously, Cephadm adoption would not handle opening ports. As a result, after adopting a Prometheus daemon from a ceph-ansible cluster as part of the upgrade from Red Hat Ceph Storage 4 to Red Hat Ceph Storage 5, that Prometheus daemon would not be able to function properly as the necessary firewall ports would be closed.

With this fix, Cephadm adoption handles opening the necessary ports and users do not have issues with the standard firewall blocking the ports Prometheus needs to function after adoption from a ceph-ansible controlled cluster.

Bugzilla:2186324

tcmu-runner process is now restarted on failure

Previously, for Cephadm iSCSI deployments, tcmu-runner and rbd-target-api were bundled into a single systemd unit, with rbd-target-api being the primary process and tcmu-runner running in the background. Due to this, if the tcmu-runner process crashed, systemd would not automatically attempt to restart it as it would only monitor the rbd-target-api process.

With this fix, tcmu-runner is automatically restarted by a monitoring script when it crashes, unless it crashes too many times in short succession.

Bugzilla:2196229

extra_container_args is applied to both the rbd-target-api and tcmu-runner containers

Previously, tcmu-runner was handled as a background service in Cephadm’s deployment of ceph-iscsi. Due to this, special arguments for the service, such as extra_container_args, were not applied to the tcmu-runner container when deploying iSCSI but only to the rbd-target-api container.

With this fix, the extra_container_args specified in an iSCSI service specification are applied to both the rbd-target-api and tcmu-runner containers.
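
For illustration, a minimal sketch of an iSCSI service specification with extra_container_args; the service ID, pool, host, and container argument below are hypothetical placeholders, not values from this release:

```yaml
# Hypothetical iSCSI service specification; names and values are placeholders.
service_type: iscsi
service_id: iscsi-gateway
placement:
  hosts:
    - host01
spec:
  pool: iscsi-pool
extra_container_args:
  - "--cpus=2"
```

With this fix applied, an argument such as --cpus=2 would reach both the rbd-target-api and tcmu-runner containers rather than only the former.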

Bugzilla:2203150

tcmu-runner no longer stops logging after its log file is rotated

Previously, tcmu-runner was left out of the postrotate actions of the logrotate configuration that Cephadm generated for rotating the logs of Ceph daemons on the host. Due to this, tcmu-runner would eventually stop logging as proper signals were not sent to regenerate its log file, which is done for other Ceph daemons using the previously mentioned postrotate actions in the logrotate configuration.

With this fix, tcmu-runner is added to the postrotate actions in the logrotate file that Cephadm deploys for rotating Ceph daemon logs. tcmu-runner no longer stops logging after its log file is rotated.
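
The generated logrotate configuration follows roughly this shape; the path, rotation policy, and daemon list below are illustrative, not an exact copy of what Cephadm writes:

```
# Illustrative sketch of a Cephadm-generated logrotate configuration.
/var/log/ceph/*/*.log {
    rotate 7
    daily
    compress
    missingok
    postrotate
        # Send SIGHUP so each daemon (now including tcmu-runner)
        # reopens its log file after rotation
        killall -q -1 ceph-mon ceph-mgr ceph-osd tcmu-runner || true
    endscript
}
```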

Bugzilla:2204505

Added support to configure retention.size parameter in Prometheus’s specification file

Previously, the Prometheus specification in Cephadm did not support configuring the retention.size parameter. A ServiceSpec exception was raised whenever the user included this parameter in the specification file. Due to this, the user could not limit the size of Prometheus’s data directory.

With this fix, users can configure the retention.size parameter in Prometheus’s specification file. Cephadm passes this value to the Prometheus daemon, limiting the size of the data directory and thereby controlling Prometheus’s disk space usage.
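
As a sketch, a Prometheus service specification limiting the data directory size might look like the following; the placement and retention values are placeholders, and the exact spec key names (Cephadm exposes the Prometheus retention.size flag as a spec field) may vary by release:

```yaml
# Hypothetical Prometheus specification; values are placeholders.
service_type: prometheus
placement:
  count: 1
spec:
  retention_time: "15d"
  retention_size: "50GB"
```

Such a file would typically be applied with ceph orch apply -i prometheus.yaml.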

Bugzilla:2207748

3.2. Ceph Dashboard

Dashboard is now accessible when url_prefix configuration is enabled

Previously, the downstream branch missed cherry-picking the changes that were required in index.html from upstream, causing the dashboard to be inaccessible when url_prefix was enabled in the Red Hat Ceph Storage 5.2 Dashboard configuration.

With this fix, the required changes are cherry-picked from upstream and the Red Hat Ceph Storage Dashboard is now accessible when the url_prefix configuration is enabled.

Bugzilla:2210212

3.3. Ceph File System

Link requests no longer fail with -EXDEV

Previously, if an inode had more than one link and one of its dentries was unlinked, it would be moved to the stray directory. Due to this, if a link request came before the link merge/migrate completed, it would fail with -EXDEV error.

With this fix, link requests wait for the previous link merge/migrate or purging operations to finish, allowing link operations to succeed.

Bugzilla:2196403

cephfs-top -d [--delay] command accepts integer values

With this fix, the -d/--delay option accepts only integer values and the code is fixed accordingly, because curses.halfdelay() accepts only an integer argument.
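
The behavior can be illustrated with a minimal sketch, assuming an argparse-based option similar in spirit to the one cephfs-top exposes; the wiring below is illustrative, not the actual cephfs-top source:

```python
import argparse

# Minimal sketch: declaring the delay as type=int rejects fractional values,
# matching curses.halfdelay(), which only accepts an integer argument.
parser = argparse.ArgumentParser(prog="cephfs-top")
parser.add_argument("-d", "--delay", type=int, default=1,
                    help="refresh interval in seconds (integer only)")

args = parser.parse_args(["--delay", "5"])
print(args.delay)  # parsed as the integer 5
```

A value such as 2.5 now fails option parsing up front instead of reaching curses.halfdelay().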

Bugzilla:2196706

The minimum compatible Python version for cephfs-top is 3.6.0

With this fix, the minimum compatible Python version for the cephfs-top utility is 3.6.0. During the build, a check verifies that the current Python version is 3.6.0 or later, ensuring that the cephfs-top curses display launches successfully.
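
The version check can be sketched as a simple guard; this is an illustrative sketch of such a gate, not the actual cephfs-top build code:

```python
import sys

# Minimal sketch: refuse to run on Python older than 3.6.0,
# the minimum version cephfs-top supports.
MIN_VERSION = (3, 6, 0)

if sys.version_info[:3] < MIN_VERSION:
    raise SystemExit(
        "cephfs-top requires Python %d.%d.%d or later" % MIN_VERSION
    )

print("Python version OK")
```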

Bugzilla:2203168

Deadlock no longer occurs between unlink request and reintegration operation

Previously, there was a race between two unlink requests operating on the same inode. The first unlink request would trigger reintegration of the inode with existing dentries, but the second would deadlock with the reintegration operation. This caused a deadlock between the unlink request and the reintegration operation.

With this fix, unlink requests wait for any previous reintegration operation to finish, which breaks the deadlock.

Bugzilla:2203258

Client programs no longer crash after releasing memory

Previously, when releasing memory, a pointer into the released memory would be saved for future reference. This caused programs to crash on the client side.

With this fix, the UserPerm contents are copied to newly allocated memory instead of saving a pointer for future reference, thereby preventing the programs from crashing.

Bugzilla:2203909

Allocating inodes no longer fails

Previously, when replaying the journals, if the inodetable or the sessionmap versions did not match the ones in the MDS caches, the corresponding CInodes would be added to the inode map without removing the inode numbers (ino#) from the inodetable or the sessions’ preallocated inos list. This caused allocation of a CInode to fail because it was already present in the inode map.

With this fix, the corresponding ino# is skipped when allocating a new CInode.

Bugzilla:2207491

3.4. Ceph Manager plugins

Generated emails are no longer flagged as spam

Previously, the email header created by the Ceph alert manager module did not include the message-id and date fields, and the emails would get flagged as spam.

With this release, the email header is modified to include these two fields and the emails generated by the module are no longer flagged as spam.

Bugzilla:2210906

Python tasks no longer wait for the GIL

Previously, the Ceph Manager daemon held the Python global interpreter lock (GIL) during some RPCs with the Ceph MDS, due to which other Python tasks were starved waiting for the GIL.

With this fix, the GIL is released during all libcephfs/librbd calls and other Python tasks may acquire the GIL normally.

Bugzilla:2219093

3.5. The Ceph Volume utility

Re-running ceph-volume lvm batch command against created devices is now possible

Previously, in ceph-volume, LVM membership was not set for mpath devices as it was for other types of supported devices. Due to this, re-running the ceph-volume lvm batch command against already created devices was not possible.

With this fix, the LVM membership is set for mpath devices and re-running the ceph-volume lvm batch command against already created devices is now possible.

Bugzilla:2215042

Adding new OSDs with pre-created LVs no longer fails

Previously, due to a bug, ceph-volume did not filter out the devices already used by Ceph. Due to this, adding new OSDs with ceph-volume used to fail when using pre-created LVs.

With this fix, devices already used by Ceph are filtered out as expected and adding new OSDs with pre-created LVs no longer fails.

Bugzilla:2209319

3.6. RADOS

Manager continues to send beacons in the event of an error during authentication check

Previously, if an error was encountered when performing an authentication check with a monitor, the manager would get into a state where it would no longer have an active connection. Due to this, the manager could no longer send beacons and the monitor would mark it as lost.

With this fix, a session (active con) is reopened in the event of an error and the manager is able to continue to send beacons and is no longer marked as lost.

Bugzilla:2192479

3.7. The Ceph Ansible utility

A dependency on ansible-collection-ansible is created when deploying ceph-ansible

Previously, for the cephadm-adopt.yml playbook, an additional Ansible library (ansible-utils) was used to work with mixed IPv4 and IPv6 environments. As ansible-utils is not deployed by default, there were missing dependencies.

With this fix, a dependency on ansible-collection-ansible is created when deploying ceph-ansible, and the cephadm-adopt.yml playbook completes successfully.

Bugzilla:2207872

3.8. Object gateway

Blocksize is changed to 4K

Previously, Ceph Object Gateway GC processing consumed excessive time because a 1K blocksize was used to process the GC queue. This caused slower processing of large GC queues.

With this fix, the blocksize is changed to 4K, which accelerates the processing of large GC queues.

Bugzilla:2215062

Chapter 4. Known issues

This section documents known issues found in this release of Red Hat Ceph Storage.

4.1. Multi-site Ceph Object Gateway

MD5 mismatch of replicated objects when testing the Ceph Object Gateway’s server-side encryption in multi-site

Presently, an MD5 mismatch of replicated objects is observed when testing the Ceph Object Gateway’s server-side encryption in multi-site. The data corruption is specific to S3 multipart uploads with SSE encryption enabled. The corruption only affects the replicated copy; the original object remains intact.

Encryption of multipart uploads requires special handling around the part boundaries because each part is uploaded and encrypted separately. In multi-site, objects are encrypted, and multipart uploads are replicated as a single part. As a result, the replicated copy loses its knowledge about the original part boundaries required to decrypt the data correctly, which causes this corruption.

As a workaround, multi-site users should not use server-side encryption for multipart uploads. For more detailed information, see the KCS article Server side encryption with RGW multisite configuration might lead to data corruption of multipart objects.

Bugzilla:2214252

Chapter 5. Sources

The updated Red Hat Ceph Storage source code packages are available at the following location:

Legal Notice

Copyright © 2023 Red Hat, Inc.
The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.