Chapter 6. Known issues

This section documents known issues found in this release of Red Hat Ceph Storage.

6.1. The ceph-ansible Utility

The shrink-osd.yml playbook currently has no support for removing OSDs created by ceph-volume

The shrink-osd.yml playbook assumes all OSDs are created by the ceph-disk utility. Consequently, OSDs deployed by using the ceph-volume utility cannot be shrunk.

To work around this issue, manually remove the OSDs that were deployed by using ceph-volume.
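
As an illustration only, a manual removal on a bare-metal deployment typically follows these steps; the OSD ID 1 and the device /dev/sdb are placeholders for your own values:

# ceph osd out 1
# systemctl stop ceph-osd@1
# ceph osd purge 1 --yes-i-really-mean-it
# ceph-volume lvm zap /dev/sdb

Run the systemctl and ceph-volume commands on the node that hosts the OSD.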

(BZ#1569413)

Partitions are not removed from NVMe devices by shrink-osd.yml in certain situations

The Ansible playbook infrastructure-playbooks/shrink-osd.yml does not properly remove partitions on NVMe devices when used with osd_scenario: non-collocated in containerized environments.

To work around this issue, manually remove the partitions.
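
For example, assuming the leftover partitions are on /dev/nvme0n1 (a placeholder device name), the partition table can be cleared with sgdisk:

# sgdisk --zap-all /dev/nvme0n1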

(BZ#1572933)

When putting a dedicated journal on an NVMe device, installation can fail

When the dedicated_devices setting contains an NVMe device that has partitions or signatures on it, the Ansible installation might fail with an error like the following:

journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391, invalid (someone else's?) journal

To work around this issue, ensure there are no partitions or signatures on the NVMe device.
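
As an example, assuming the dedicated journal device is /dev/nvme0n1 (a placeholder name), existing signatures and partition tables can be cleared before running Ansible:

# wipefs --all /dev/nvme0n1
# sgdisk --zap-all /dev/nvme0n1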

(BZ#1619090)

When deploying Ceph NFS Ganesha gateways on Ubuntu IPv6 systems, ceph-ansible may fail to start the nfs-ganesha services

This issue causes Ceph NFS Ganesha gateways to fail to deploy.

To work around this issue, rerun the ceph-ansible site.yml playbook to deploy only the Ceph NFS Ganesha gateways:

[root@ansible ~]# ansible-playbook /usr/share/ceph-ansible/site.yml --limit nfss

(BZ#1656908)

When using dedicated devices for BlueStore, the default sizes for block.db and block.wal might be too small

By default, ceph-ansible does not override the bluestore block db size and bluestore block wal size options, whose default values are 1 GB and 576 MB respectively. These sizes might be too small when using dedicated devices with BlueStore.

To work around this issue, override the default values by setting bluestore_block_db_size or bluestore_block_wal_size, or both, through the ceph_conf_overrides variable in ceph-ansible.
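
For example, a group_vars/all.yml snippet along the following lines sets both options; the sizes shown are illustrative values in bytes, not recommendations:

ceph_conf_overrides:
  osd:
    bluestore_block_db_size: 16106127360
    bluestore_block_wal_size: 2147483648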

(BZ#1657883)

6.2. Ceph Management Dashboard

Ceph OSD encryption summary is not displayed in the Red Hat Ceph Storage Dashboard

On the Ceph OSD Information dashboard, under the OSD Summary panel, the OSD Encryption Summary information is not displayed.

There is no workaround at this time.

(BZ#1605241)

The Prometheus node-exporter service is not removed after purging the Dashboard

When purging the Red Hat Ceph Storage Dashboard, the node-exporter service is not removed, and is still running.

To work around this issue, manually stop and remove the node-exporter service.

Perform the following commands as root:

# systemctl stop prometheus-node-exporter
# systemctl disable prometheus-node-exporter
# rpm -e prometheus-node-exporter
# reboot

For Ceph Monitor, OSD, Object Gateway, MDS, and Dashboard nodes, reboot them one at a time.

(BZ#1609713)

The OSD down tab shows an incorrect value

When OSDs are rebooting, the OSD down tab in the Ceph Backend Storage dashboard shows the correct number of OSDs that are down. However, when all OSDs are up again after the reboot, the tab continues to show them as down.

There is no workaround at this time.

(BZ#1652233)

The Top 5 pools by Throughput graph lists all pools

The Top 5 pools by Throughput graph in the Ceph Pools tab lists all pools in the cluster instead of listing only the top five pools with the highest throughput.

There is no workaround at this time.

(BZ#1652807)

The MDS Performance dashboard displays the wrong value for Clients after increasing and decreasing the number of active MDS servers and clients multiple times

This issue causes the Red Hat Ceph Storage dashboard to display the wrong number of CephFS clients. This can be verified by comparing the value in the Red Hat Ceph Storage dashboard with the value printed by the ceph fs status $FILESYSTEM_NAME command.

There is no workaround at this time.

(BZ#1652896)

Request Queue Length displays an incorrect value

In the Ceph RGW Workload dashboard, the Request Queue Length parameter always displays 0, even when Ceph Object Gateway I/O is running from different clients.

There is no workaround at this time.

(BZ#1653725)

Capacity Utilization in the Ceph - At Glance dashboard shows the wrong value when an OSD is down

This issue causes the Red Hat Ceph Storage Dashboard to show a capacity utilization value that is lower than the one reported by the ceph df command.

There is no workaround at this time.

(BZ#1655589)

Some links on the Ceph - At Glance page do not work after installing ceph-metrics

After installing ceph-metrics, some of the panel links on the Ceph - At Glance page in the Ceph Dashboard do not work.

To work around this issue, clear the browser cache and reload the Ceph Dashboard site.

(BZ#1655630)

The iSCSI Overview dashboard does not display graphs if the [iscsigws] role is included in the Ansible inventory file

When deploying the Red Hat Ceph Storage Dashboard, the iSCSI Overview dashboard does not display any graphs or values if the Ansible inventory file has the [iscsigws] role included for iSCSI gateways.

To work around this issue, add [iscsis] as a role in the Ansible inventory file and rerun the cephmetrics-ansible playbook. The iSCSI Overview dashboard then displays the graphs and values.
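
For example, the inventory entry and the playbook run might look like the following; the host name is a placeholder, and the playbook path assumes the default location used by the cephmetrics-ansible package:

[iscsis]
iscsi-gw-node1

[root@ansible ~]# cd /usr/share/cephmetrics-ansible
[root@ansible cephmetrics-ansible]# ansible-playbook -v playbook.yml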

(BZ#1656053)

In the Ceph Cluster dashboard the Pool Capacity graphs display values higher than actual capacity

This issue causes the Pool Capacity graph to display values around one percent higher than what df --cluster shows.

There is no workaround at this time.

(BZ#1656820)

Graphs on the OSD Node Detail dashboard might appear incorrect when used with All

Graphs generated under OSD Node Detail > OSD Host Name > All do not show all OSDs in the cluster, because a graph plotting data for hundreds or thousands of OSDs would not be usable. The All setting is intended for showing cluster-wide values; for some dashboards it does not make sense and should not be used.

There is no workaround at this time.

(BZ#1659036)

6.3. Ceph File System

The Ceph Metadata Server might crash during scrub with multiple MDS

This issue is triggered when the scrub_path command is run in an environment with multiple Ceph Metadata Servers.
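
For reference, this type of scrub is typically started through the MDS admin socket, for example (mds.node1 and the path are placeholders):

# ceph daemon mds.node1 scrub_path / recursive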

There is no workaround at this time.

(BZ#1642015)

6.4. The ceph-volume Utility

Deploying an OSD on devices with GPT headers fails

When deploying an OSD on a drive that has a GPT header, LVM returns an error stating that the device has been excluded by a filter, and the deployment fails.

To work around this issue, ensure there is no GPT header present on the devices to be used by OSDs.
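
As an illustration, assuming /dev/sdb (a placeholder) is the intended OSD device, the GPT data structures can be wiped before deployment:

# sgdisk --zap-all /dev/sdb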

(BZ#1644321)

6.5. iSCSI Gateway

Using ceph-ansible to deploy the iSCSI gateway does not allow the user to adjust the max_data_area_mb option

When deploying the iSCSI gateway with the ceph-ansible utility, the max_data_area_mb option is set to a default value of 8 MB and cannot be adjusted through ceph-ansible. To adjust this value, set it manually by using the gwcli command. See the Red Hat Ceph Storage Block Device Guide for details on setting the max_data_area_mb option.

(BZ#1613826)

Ansible fails to purge RBD images with snapshots

The purge-iscsi-gateways.yml Ansible playbook does not purge RBD images with snapshots. To purge the images and their snapshots, use the rbd command-line utility:

  • To purge a snapshot:

    rbd snap purge pool-name/image-name

    For example:

    # rbd snap purge data/image1
  • To delete an image:

    rbd rm image-name

    For example:

    # rbd rm image1

(BZ#1654346)

6.6. Object Gateway

Ceph Object Gateway garbage collection decreases client performance by up to 50% during mixed workload

In testing with a mixed workload of 60% read, 16% write, 14% delete, and 10% list operations, client throughput and bandwidth dropped to half their earlier levels at 18 hours into the test run.

(BZ#1596401)

Pushing a docker image to the Ceph Object Gateway over S3 does not complete

In certain situations, when docker-distribution is configured to use the Ceph Object Gateway through the S3 interface, the docker push command does not complete. Instead, the command fails with an HTTP 500 error.
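
For context, such a setup typically points the registry’s s3 storage driver at the gateway endpoint. A minimal, hypothetical fragment of the docker-distribution configuration (the endpoint, bucket, and credentials are placeholders) could look like this:

storage:
  s3:
    accesskey: <ACCESS_KEY>
    secretkey: <SECRET_KEY>
    region: us-east-1
    regionendpoint: http://rgw.example.com:8080
    bucket: registry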

There is no workaround at this time.

(BZ#1604979)

Delete markers are not removed with a lifecycle configuration

In certain situations, after a file is deleted and its lifecycle configuration is triggered, the delete markers are not removed.

There is no workaround at this time.

(BZ#1654820)

The Ceph Object Gateway’s S3 interface does not always work in FIPS mode

If a secret key of a Ceph Object Gateway user or sub-user is less than 112 bits in length, it can cause the radosgw daemon to exit unexpectedly when a user attempts to authenticate using S3.

This is because the Red Hat Enterprise Linux FIPS mode security policy forbids constructing a cryptographic HMAC from a key shorter than 112 bits, and violating this constraint raises an exception that is not correctly handled by the Ceph Object Gateway.

To work around this issue, ensure that the secret keys of Ceph Object Gateway users and sub-users are at least 112 bits in length.
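
For example, a new secret key that satisfies this requirement can be generated for an existing user with radosgw-admin, where testuser is a placeholder user ID:

# radosgw-admin key create --uid=testuser --key-type=s3 --gen-secret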

(BZ#1687567)

6.7. RADOS

Performing I/O on CephFS erasure-coded pools can cause an assertion failure

This issue is being investigated as a possible latent bug in the messenger layer, which could be causing out-of-order operations on the OSD.

The issue causes the following error:

FAILED assert(repop_queue.front() == repop)

There is no workaround at this time. CephFS with erasure-coded pools is a Technology Preview feature. For more information, see Creating Ceph File Systems with erasure coding in the Ceph File System Guide.
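
For reference, an erasure-coded data pool is typically attached to a file system along the following lines; the pool names, placement group counts, and file system name are placeholders, and the supported procedure is described in the guide referenced above:

# ceph osd pool create cephfs-metadata 64 64
# ceph osd pool create cephfs-data-ec 64 64 erasure
# ceph osd pool set cephfs-data-ec allow_ec_overwrites true
# ceph fs new cephfs cephfs-metadata cephfs-data-ec --force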

(BZ#1637948)