Chapter 5. Notable Bug Fixes

This section describes bugs fixed in this release of Red Hat Ceph Storage that have significant impact on users.

Calamari now correctly handles manually added OSDs that do not have "ceph-osd" running

Previously, when OSD nodes were added manually to the Calamari server but the ceph-osd daemon was not started on the nodes, the Calamari server returned error messages and stopped updating statuses for the rest of the OSD nodes. The underlying source code has been modified, and Calamari now handles such OSDs properly. (BZ#1360467)

OSDs no longer reboot when corrupted snapsets are found during scrubbing

Previously, Ceph incorrectly handled corrupted snapsets that were found during scrubbing. This behavior caused the OSD nodes to terminate unexpectedly every time the snapsets were detected. As a consequence, the OSDs rebooted every few minutes. With this update, the underlying source code has been modified, and OSDs no longer reboot in the described situation. (BZ#1273127)

OSD now deletes old OSD maps as expected

When new OSD maps are received, the OSD daemon marks the unused OSD maps as stale and deletes them to keep up with the changes. Previously, an attempt to delete stale OSD maps could fail for various reasons. As a consequence, certain OSD nodes were sometimes marked as down if it took too long to clean their OSD map caches when booting. With this update, the OSD daemon deletes old OSD maps as expected, thus fixing this bug. (BZ#1291632)

%USED now shows the correct value

Previously, the %USED column in the output of the ceph df command erroneously showed the size of a pool divided by the raw space available on the OSD nodes. With this update, the column correctly shows the space used by all replicas divided by the raw space available on the OSD nodes. (BZ#1330643)
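
The difference between the two calculations can be illustrated with the following minimal Python sketch; the pool size, replica count, and raw capacity used here are hypothetical values chosen only for illustration and are not taken from any real cluster:

# Hypothetical numbers for illustration only.
pool_stored = 100          # GiB written to the pool by clients
replica_count = 3          # replicated pool with size = 3
raw_available = 3000       # GiB of raw space on the OSD nodes

# Old, incorrect calculation: pool size divided by the raw space
old_used_pct = 100.0 * pool_stored / raw_available                    # ~3.3

# New, correct calculation: space used by all replicas divided by the raw space
new_used_pct = 100.0 * (pool_stored * replica_count) / raw_available  # 10.0

print("%%USED before the fix: %.1f, after the fix: %.1f" % (old_used_pct, new_used_pct))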

SELinux no longer prevents "ceph-mon" and "ceph-osd" from accessing /var/lock/ and /run/lock/

Due to insufficient SELinux policy rules, SELinux denied the ceph-mon and ceph-osd daemons access to files in the /var/lock/ and /run/lock/ directories. With this update, SELinux no longer prevents ceph-mon and ceph-osd from accessing /var/lock/ and /run/lock/. (BZ#1330279)

The QEMU process no longer hangs when creating snapshots on images

When the RADOS Block Device (RBD) cache was enabled, creating a snapshot on an image with active I/O operations could cause the QEMU process to become unresponsive. With this update, the QEMU process no longer hangs in the described scenario. (BZ#1316287)

"ceph-deploy" now correctly removes directories of manually added monitors

Previously, an attempt to remove a manually added monitor node by using the ceph-deploy mon destroy command failed with the following error:

UnboundLocalError: local variable 'status_args' referenced before assignment

The monitor was removed despite the error; however, ceph-deploy failed to remove the monitor configuration directory located in the /var/lib/ceph/mon/ directory. With this update, ceph-deploy removes the monitor directory as expected. (BZ#1278524)

The least used OSDs are selected for increasing the weight

With this update, the least used OSD nodes are now selected for increasing the weight during the reweight-by-utilization process. (BZ#1333907)

OSDs are now selected properly during "reweight-by-utilization"

Previously, during the reweight-by-utilization process, some of the OSD nodes that met the criteria for reweighting were not selected. The underlying algorithm has been modified, and OSDs are now selected properly during reweight-by-utilization. (BZ#1331764)

OSDs no longer receive unreasonably large weight during "reweight-by-utilization"

When the value of the max_change parameter was greater than an OSD weight, an underflow occurred. Consequently, the OSD node could receive an unreasonably large weight during the reweight-by-utilization process. This bug has been fixed, and OSDs no longer receive an unreasonably large weight in the described situation. (BZ#1331523)
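
The three reweight-by-utilization entries above describe related behavior. The following simplified Python sketch is not the actual Ceph implementation; under assumed threshold and max_change values, it only illustrates how the least used OSDs become candidates for a weight increase and how clamping the weight decrease avoids the underflow described above:

# Simplified, hypothetical sketch; OSD ids, utilizations, and weights are invented.
def reweight_by_utilization(osds, avg_util, threshold=1.2, max_change=0.05):
    """osds maps an OSD id to a (utilization, weight) tuple."""
    new_weights = {}
    # Sorting by utilization puts the least used OSDs first.
    for osd_id, (util, weight) in sorted(osds.items(), key=lambda kv: kv[1][0]):
        if util < avg_util / threshold:
            # The least used OSDs are selected for increasing the weight.
            new_weights[osd_id] = weight + max_change
        elif util > avg_util * threshold:
            # Clamping prevents the underflow that produced unreasonably large weights.
            new_weights[osd_id] = max(weight - max_change, 0.0)
    return new_weights

# Example: osd.2 is the least used and receives the weight increase.
print(reweight_by_utilization({0: (0.80, 1.0), 1: (0.55, 1.0), 2: (0.20, 1.0)},
                              avg_util=0.52))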

OSDs no longer crash when using "rados cppool" to copy an "omap" object

Objects with omap data cannot be stored in erasure-coded pools. Previously, copying such objects from a replicated pool to an erasure-coded pool by using the rados cppool command caused the OSD nodes to terminate unexpectedly. With this update, the OSD nodes return an error message instead of crashing in the described situation. (BZ#1368402)

Listing versioned buckets no longer hangs

Due to a bug in the bucket listing logic, the radosgw-admin bucket list and radosgw-admin bucket stats commands could become unresponsive while attempting to list versioned buckets or get their statistics. This bug has been fixed, and listing versioned buckets no longer hangs in the described situation. (BZ#1322239)

Ceph Object Gateway now properly uploads files to erasure-coded pools

Under certain conditions, Ceph Object Gateway did not properly upload files to an erasure-coded pool when the files were uploaded by using the Swift API. Consequently, such files were broken, and an attempt to download them failed with the following error message:

ERROR: got unexpected error when trying to read object: -2

The underlying source code has been modified, and Ceph Object Gateway now properly uploads files to erasure-coded pools. (BZ#1369013)

The "ceph osd tell" command now prints a correct error message

When the deprecated ceph osd tell command was executed, the command returned a misleading error message. With this update, the error message is correct. (BZ#1193710)

"filestore_merge_threshold" can be set to a negative value as expected

If the filestore_merge_threshold parameter is set to a negative value, merging of subdirectories is disabled. Previously, an attempt to set filestore_merge_threshold to a negative value by using the command line failed and an error message similar to the following one was returned:

"error": "error setting 'filestore_merge_threshold' to '-40': (22) Invalid argument"

As a consequence, it was not possible to disable merging of subdirectories. This bug has been fixed, and filestore_merge_threshold can now be set to a negative value as expected. (BZ#1284696)
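
For example, subdirectory merging can be disabled with a setting similar to the following in the [osd] section of the Ceph configuration file; the value -40 is taken from the error message above and is only illustrative:

[osd]
filestore_merge_threshold = -40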

"radosgw-admin region-map set" output includes the bucket quota

Previously, the output of the radosgw-admin region-map set command did not include the bucket quota, which led to confusion about whether the quota was properly set. With this update, the radosgw-admin region-map set output includes the bucket quota as expected. (BZ#1349484)

The form of the "by-parttypeuuid" term is now correct

The ceph-disk(8) manual page and the ceph-disk python script now include the correct form of the by-parttypeuuid term. Previously, they included by-parttype-uuid instead. (BZ#1335564)

Index files are removed as expected after deleting buckets

Previously, when buckets were deleted, their index files remained in the .rgw.buckets.index pool. With this update, the index files are removed as expected. (BZ#1340496)

"ceph df" now shows proper value of "MAX AVAIL"

When adding a new OSD node to the cluster by using the ceph-deploy utility with the osd_crush_initial_weight option set to 0, the value of the MAX AVAIL field in the output of the ceph df command was 0 for each pool instead of the proper numerical value. As a consequence, other applications using Ceph, such as OpenStack Cinder, assumed that there was no space available to provision new volumes. This bug has been fixed, and ceph df now shows the proper value of MAX AVAIL as expected. (BZ#1306842)

The columns in the "rados bench" command output are now separated correctly

This update ensures that the columns in the rados bench command output are separated correctly. (BZ#1332470)

OSDs now obtain PID files properly during an upgrade

After upgrading from Red Hat Ceph Storage 1.2 to 1.3, some of the OSD daemons did not obtain PID files properly. As a consequence, such OSDs could not be restarted or stopped by using SysVinit commands and therefore could not be upgraded to the newer version. This update ensures that OSDs obtain PID files properly during an upgrade. As a result, OSDs are upgraded to newer versions as expected. (BZ#1299409)

The default value of "osd_scrub_thread_suicide_timeout" is now 300

The osd_scrub_thread_suicide_timeout configuration option ensures that poorly behaving OSD nodes self-terminate instead of running in degraded states and slowing traffic. Previously, the default value of osd_scrub_thread_suicide_timeout was set to 60 seconds. This value was not sufficient when scanning data for objects on extremely large buckets. This update increases the default value of osd_scrub_thread_suicide_timeout to 300 seconds. (BZ#1300539)

PG collection split no longer produces any orphaned files

Due to a bug in the underlying source code, a placement group (PG) collection split could produce orphaned files. Consequently, the PG could be incorrectly marked as inconsistent during scrubbing, or the OSD nodes could terminate unexpectedly. The bug has been fixed, and PG collection split no longer produces any orphaned files. (BZ#1334534)

The bucket owner is now properly changed

Previously, the bucket owner was not properly changed when the radosgw-admin bucket unlink and radosgw-admin bucket link commands were used. As a consequence, the new owner was not able to access the bucket. The underlying source code has been modified, and the bucket owner is now changed as expected. (BZ#1324497)

The monitor nodes exit gracefully after authenticating with an incorrect keyring

When a new cluster included monitor nodes that were previously part of another cluster, the monitor nodes terminated with a segmentation fault when attempting to authenticate with an incorrect keyring. With this update, the monitor nodes exit gracefully instead of crashing in the described scenario. (BZ#1312587)