Chapter 6. Known issues
This section documents known issues found in this release of Red Hat Ceph Storage.
6.1. Ceph Ansible
The shrink-osd.yml playbook currently has no support for removing OSDs created by ceph-volume
The shrink-osd.yml playbook assumes all OSDs are created by ceph-disk. As a result, OSDs deployed using ceph-volume cannot be shrunk.
As a workaround, OSDs deployed using ceph-volume can be removed manually.
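As a rough sketch, manual removal of a ceph-volume deployed OSD generally follows the steps below. The OSD ID 7 and the device /dev/sdb are placeholders for your environment; adapt them before running anything, as the last command destroys data on the device.

```shell
# Mark the OSD out so data rebalances off it (osd.7 is a placeholder ID).
ceph osd out osd.7

# On the OSD node, stop the daemon.
systemctl stop ceph-osd@7

# Remove the OSD from the CRUSH map, authentication keys, and OSD map.
ceph osd purge 7 --yes-i-really-mean-it

# Zap the underlying device so it can be reused (/dev/sdb is a placeholder).
ceph-volume lvm zap /dev/sdb --destroy
```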
The container does not restart on option changes
When changing an option, for example, ceph_osd_docker_memory_limit, the change will not trigger a restart of the container.
To work around this issue, restart the container manually.
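For example, after changing ceph_osd_docker_memory_limit, the containerized OSD can be restarted by hand on the OSD node. The OSD ID below is a placeholder, and the systemd unit name may differ depending on your deployment:

```shell
# Restart the containerized OSD daemon for OSD ID 3 (placeholder).
systemctl restart ceph-osd@3
```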
Purging the cluster will try to unmount a partition from /var/lib/ceph
If you mount a partition to /var/lib/ceph, running the purge playbook will cause a failure when it tries to unmount it.
To work around this issue, do not mount a partition to /var/lib/ceph.
When putting a dedicated journal on an NVMe device installation can fail
If dedicated_devices contains an NVMe device, and the device has partitions or signatures on it, Ansible installation might fail with an error like the following:
journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391, invalid (someone else's?) journal
To work around this issue ensure there are no partitions or signatures on the NVMe device.
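One way to clear partitions and signatures before rerunning the playbook is sketched below. The device name /dev/nvme0n1 is a placeholder, and these commands destroy all data on the device:

```shell
# Erase all filesystem, RAID, and partition-table signatures.
wipefs --all /dev/nvme0n1

# Destroy the GPT and MBR data structures on the device.
sgdisk --zap-all /dev/nvme0n1
```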
Running the Ansible playbook purge-iscsi-gateways.yml does not stop and disable the iSCSI gateway services
When purging the Ceph iSCSI gateways using Ceph Ansible, the iSCSI gateway services are still running. You must manually stop and disable these services by doing the following as root:
systemctl stop rbd-target-api
systemctl stop rbd-target-rbd
systemctl stop tcmu-runner
systemctl disable rbd-target-api
systemctl disable rbd-target-rbd
systemctl disable tcmu-runner
If you are using the gwcli command to manage the iSCSI gateways, then do not stop or disable these services.
6.2. Ceph Dashboard
The 'iSCSI Overview' page does not display correctly
When using the Red Hat Ceph Storage Dashboard, the 'iSCSI Overview' page does not display any graphs or values as expected.
Ceph OSD encryption summary is not displayed in the Red Hat Ceph Storage Dashboard
On the Ceph OSD Information dashboard, under the OSD Summary panel, the OSD Encryption Summary information is not displayed. Currently, there is no workaround for this issue.
The Prometheus node-exporter service is not removed after doing a purge
When doing a purge of the Red Hat Ceph Storage Dashboard, the node-exporter service is not removed, and is still running. To work around this issue, you must manually stop and remove the node-exporter service.
Do the following as root:
# systemctl stop prometheus-node-exporter
# systemctl disable prometheus-node-exporter
# rpm -e prometheus-node-exporter
# reboot
Reboot the Ceph Monitor, OSD, Object Gateway, MDS, and Dashboard nodes one at a time.
The OSD node details are not displayed in the Host OSD Breakdown panel
In the Red Hat Ceph Storage Dashboard, the Host OSD Breakdown information is not displayed on the OSD Node Detail panel under All.
Red Hat Ceph Storage Dashboard does not reflect correct OSDs
Currently, in some situations, the Cluster Configuration tab of the Ceph Cluster dashboard can show the wrong number of OSDs. To work around this issue, open the Ceph OSD Information dashboard and view the OSD Summary tab for the correct number of OSDs.
6.3. ceph-volume Utility
Using custom storage cluster names fails to start OSDs
When using a custom storage cluster name other than ceph, the OSDs might not start after a reboot.
To work around this issue, either do not use custom names when creating a new storage cluster, or create a symbolic link with the same name as the default configuration file name (/etc/ceph/ceph.conf) pointing to the custom named configuration file:
# mv /etc/ceph/ceph.conf /etc/ceph/ceph.conf.backup
# ln -s /etc/ceph/<custom-name>.conf /etc/ceph/ceph.conf
As a result, the OSDs will start properly.
6.4. iSCSI Gateway
Using Ceph Ansible to deploy the iSCSI gateway does not allow the user to adjust the max_data_area_mb option
Setting the max_data_area_mb option with Ceph Ansible has no effect; the option remains at its default value of 8 MB. To adjust this value, you must set it manually using the gwcli command. See the Red Hat Ceph Storage Block Device Guide for details on setting the max_data_area_mb option.
An iSCSI device is busy according to the systemd-udevd service
In Red Hat Enterprise Linux 7.5, the kernel’s ALUA layer reduced the number of times an initiator retries the SCSI sense code ALUA State Transition. This code is returned from the target side by the tcmu-runner service when taking the RBD exclusive lock during a failover or failback scenario and when doing device discovery. As a consequence, the maximum number of retries is reached before the discovery process completes, and the SCSI layer returns a failure to the multipath IO layer. The multipath IO layer then tries the next available path, where the same problem occurs. This causes a loop of path checking, resulting in failed IO and failed management operations on the multipath device. The logs on the initiator node print messages about devices being removed and then re-added. To work around this issue, downgrade the initiator’s kernel to Red Hat Enterprise Linux 7.4.
Rebooting an iSCSI initiator with connected devices leads to an error
During device and path setup, the initiator sends commands to all paths at the same time. This causes the Ceph iSCSI gateways to take the RBD lock from one device and set it on another device. In some cases the iSCSI gateway interprets the lock being taken away in this manner as a hard error and escalates its error handler: it drops the iSCSI connection, reopens the RBD devices to clear old state, and then enables the iSCSI target port group to allow a new iSCSI connection. Disabling and enabling the iSCSI target port group disrupts device and path discovery. In turn, this causes the multipath IO layer to continually disable and enable all paths while IO is suspended, or device and path discovery can fail and the device is not set up. Currently, there is no workaround for this issue.
6.5. Object Gateway
The Ceph Object Gateway requires applications to write sequentially
The Ceph Object Gateway requires applications to write sequentially from offset 0 to the end of a file. Attempting to write out of order causes the upload operation to fail. To work around this issue, use utilities like cp, cat, or rsync when copying files into NFS space. Always mount with the sync option.
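For example, mounting the gateway’s NFS export with the sync option and copying a file sequentially with cp might look like the following. The server name, export path, and mount point are placeholders:

```shell
# Mount the NFS export with synchronous writes (names are placeholders).
mount -t nfs -o sync rgw-nfs.example.com:/export /mnt/rgw

# cp writes sequentially from offset 0, as the Object Gateway requires.
cp largefile.dat /mnt/rgw/
```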
RGW garbage collection fails to keep pace during evenly balanced delete-write workloads
In testing, during an evenly balanced delete-write (50% / 50%) workload, the cluster filled completely in eleven hours because Object Gateway garbage collection failed to keep pace. This causes the cluster to fill completely and its status to switch to the HEALTH_ERR state. Aggressive settings for the new parallel/async garbage collection tunables significantly delayed the onset of cluster fill in testing, and can be helpful for many workloads. Typical real-world cluster workloads are unlikely to cause a cluster fill due primarily to garbage collection.
RGW garbage collection decreases client performance by up to 50% during mixed workload
In testing during a mixed workload of 60% reads, 16% writes, 14% deletes, and 10% lists, at 18 hours into the testing run, client throughput and bandwidth drop to half their earlier levels.
Large objects handled incorrectly on versioned swift containers
During uploads of large objects to versioned Swift containers, use the --leave-segments option with the python-swiftclient upload command. Omitting this option leads to an overwrite of the manifest file, in which case an existing object is overwritten, leading to data loss.
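For example, with python-swiftclient the upload might look like the following; the container and object names are placeholders:

```shell
# Upload a large object without replacing existing manifest segments.
swift upload --leave-segments mycontainer largeobject.bin
```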
6.6. RADOS
High object counts can degrade IO performance
The overhead with directory merging on FileStore can degrade the client’s IO performance for pools with high object counts.
To work around this issue, use the expected_num_objects option during pool creation. Creating pools is described in the Red Hat Ceph Storage Object Gateway for Production Guide.
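As a sketch, expected_num_objects is passed as the final positional argument of the ceph osd pool create command. The pool name, PG counts, CRUSH rule name, and object count below are placeholders:

```shell
# Create a replicated pool, pre-splitting FileStore directories
# for an expected ~1 million objects (all values are placeholders).
ceph osd pool create mypool 64 64 replicated replicated_rule 1000000
```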
When two or more RADOS Gateway daemons have the same name in a cluster Ceph Manager can crash
Currently, Ceph Manager can crash if some RADOS Gateway daemons have the same name. The following assert will be generated in this case:
DaemonPerfCounters::update(MMgrReport*)
To work around this issue, rename the RADOS Gateway daemons so that each daemon has a unique name.