Chapter 5. Known Issues

This section documents known issues found in this release of Red Hat Ceph Storage.

Adding an MDS to an existing cluster fails

Adding a Ceph Metadata Server (MDS) to an existing cluster fails with the error:

osd_pool_default_pg_num is undefined

The error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mon/tasks/create_mds_filesystems.yml'

As a consequence, an attempt to create an MDS pool fails.

To work around this issue, add the osd_pool_default_pg_num parameter to ceph_conf_overrides in the /usr/share/ceph-ansible/group_vars/all.yml file, for example:

ceph_conf_overrides:
  global:
    osd_pool_default_pg_num: 64

(BZ#1461367)

OSD activation fails when running the osd_disk_activate.sh script in the Ceph container image when a cluster name contains numbers

In the Ceph container image, the osd_disk_activate.sh script treats any number included in a cluster name as an OSD ID. As a consequence, OSD activation fails when running the script because it searches for a keyring at a path based on an OSD ID that does not exist.

To work around this issue, do not use cluster names that contain numbers. (BZ#1458512)
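
As an illustration only, when setting a custom cluster name through the cluster variable in ceph-ansible, choose a name without digits; the value below is a placeholder:

# /usr/share/ceph-ansible/group_vars/all.yml
# Use a cluster name that contains no digits, for example:
cluster: cephprod
# Avoid names such as "ceph2" until this issue is resolved.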

Multi-site configuration of the Ceph Object Gateway sometimes fails when options are changed at runtime

When the rgw md log max shards and rgw data log num shards options are changed at runtime in a multi-site configuration of the Ceph Object Gateway, the radosgw process terminates unexpectedly with a segmentation fault.

To avoid this issue, do not change the aforementioned options at runtime, but set them during the initial configuration of the Ceph Object Gateway. (BZ#1330952)
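
For illustration, both options can be set before the gateways are first started, for example in the ceph.conf file read by the gateways; the section and shard counts shown here are placeholders, not recommendations:

# /etc/ceph/ceph.conf (excerpt), applied before starting the radosgw processes
[global]
rgw md log max shards = 64
rgw data log num shards = 128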

Simultaneous upload operations to the same file cause I/O errors

Simultaneous upload operations to the same file location by different NFS clients cause I/O errors on both clients. Consequently, no data is updated in the Ceph Object Gateway cluster; if an object already existed in the cluster in the same location, it is unchanged.

To work around this problem, do not simultaneously upload to the same file location. (BZ#1420328)

The old zone group name is sometimes displayed alongside the new one

In a multi-site configuration when a zone group is renamed, other zones can in some cases continue to display the old zone group name in the output of the radosgw-admin zonegroup list command.

To work around this issue:

  1. Verify that the new zone group name is present on each cluster, as shown in the example after these steps.
  2. Remove the old zone group name:

    $ rados -p .rgw.root rm zonegroups_names.<old-name>

    (BZ#1423402)
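
For step 1 above, the zone group names known to each cluster can be checked with radosgw-admin, for example:

# Run on each cluster; only the new zone group name should appear.
$ radosgw-admin zonegroup list
$ radosgw-admin zonegroup get --rgw-zonegroup=<new-name>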

Some OSDs fail to come up after reboot

On a machine with more than five OSDs, some OSDs fail to come up after a reboot because the systemd unit for the ceph-disk utility times out after 120 seconds.

To work around this problem, edit the /usr/lib/systemd/system/ceph-disk@.service file and replace 120 with 7200. (BZ#1458007)
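
As a sketch of that edit, assuming the value 120 appears in the unit file only as the activation timeout (the exact line differs between ceph-disk versions):

# Raise the ceph-disk activation timeout from 120 to 7200 seconds,
# then reload systemd so the edited unit takes effect.
$ sed -i 's/120/7200/' /usr/lib/systemd/system/ceph-disk@.service
$ systemctl daemon-reload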

The GNU tar utility currently cannot extract archives directly into the Ceph Object Gateway NFS mounted file systems

The current version of the GNU tar utility makes overlapping write operations when extracting files. This behavior breaks the strict sequential write restriction in the current version of the Ceph Object Gateway NFS. In addition, GNU tar reports these errors in the usual way, but by default it continues extracting files after reporting them. As a result, the extracted files can contain incorrect data.

To work around this problem, use alternate programs to copy file hierarchies into the Ceph Object Gateway NFS. Recursive copying by using the cp -r command works correctly. Non-GNU archive utilities might be able to correctly extract the tar archives, but none have been verified. (BZ#1418606)
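
For example, an archive can be unpacked into a local scratch directory first and then copied recursively into the NFS mount; the paths below are placeholders:

# Unpack locally, then copy the tree into the NFS-mounted bucket path.
$ mkdir /tmp/unpack
$ tar xf archive.tar -C /tmp/unpack
$ cp -r /tmp/unpack/. /mnt/rgw-nfs/target-directory/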

Updating a Ceph cluster deployed as a container by using rolling_update.yml fails

After updating a Ceph cluster deployed as a container image by using the rolling_update.yml playbook, the ceph-mon daemons are not restarted. As a consequence, they are unable to join the quorum after the upgrade.

To work around this issue, follow the steps described in the Updating Red Hat Ceph Storage deployed as a Container Image Knowledgebase article on the Red Hat Customer Portal instead of using rolling_update.yml. (BZ#1458024)

The --inconsistent-index option of the radosgw-admin bucket rm command should never be used

Using the --inconsistent-index option with radosgw-admin bucket rm can cause corruption of the bucket index if the command fails or is stopped. Do not use this option. (BZ#1464554)

Failover and failback cause data sync issues in multi-site environments

In environments using the Ceph Object Gateway multi-site feature, failover and failback cause data sync to stall; the radosgw-admin sync status command reports that data sync is behind for an extended period of time.

To work around this issue, run the radosgw-admin data sync init command and restart the gateways. (BZ#1459967)
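
A minimal sketch of that workaround, with a placeholder source zone and assuming the gateways run under the ceph-radosgw systemd target:

# Reinitialize data sync from the peer zone, then restart the local gateways.
$ radosgw-admin data sync init --source-zone=<source-zone>
$ systemctl restart ceph-radosgw.target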

The container image has incorrect owner and group IDs

In the Red Hat Ceph Storage container image, the owner and group IDs for some processes and files are incorrect. The group ID of the ceph-osd process is disk when it is supposed to be ceph. The owner and group IDs for the files under /etc/ceph/ are root:root when they are supposed to be ceph:ceph. (BZ#1451349)

Using IPv6 addressing is not supported with containerized Ceph clusters

An attempt to deploy a Ceph cluster as a container image fails if IPv6 addressing is used. To work around this issue, use IPv4 addressing only. (BZ#1451786)
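
For illustration, an IPv4-only ceph-ansible deployment keeps all network settings on IPv4 addresses; the subnets and interface below are placeholders, and the ip_version variable applies only if your ceph-ansible version provides it:

# /usr/share/ceph-ansible/group_vars/all.yml
ip_version: ipv4
monitor_interface: eth0
public_network: 192.168.122.0/24
cluster_network: 192.168.123.0/24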

Ceph Object Gateway multi-site replication does not work

The Ceph Object Gateway provides an option, rgw dns name, to set the host name in the zone group for each gateway. When using multi-site replication, the Ceph Object Gateways responsible for replication (the gateways configured as endpoints in the primary and secondary site zones) must not use this option; setting it causes the multi-site replication feature to fail.

To work around this issue, leave the option at its default (unset): comment out the rgw dns name option for the affected Ceph Object Gateways and restart them. (BZ#1464268)
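
For illustration, a gateway section in ceph.conf with the option left unset might look like this; the instance and host names are placeholders:

[client.rgw.gateway-node1]
host = gateway-node1
# rgw dns name is intentionally left commented out on replication gateways
#rgw dns name = objects.example.com

Then restart the affected gateway, for example with systemctl restart ceph-radosgw@rgw.gateway-node1.service, adjusting the instance name to match the deployment.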

Ceph Object Gateway crashes with Swift DLO operations

The Ceph Object Gateway crashes when a system user attempts Swift DLO operations. (BZ#1469355)