Chapter 6. Bug fixes

This section describes bugs with significant impact on users that were fixed in this release of Red Hat Ceph Storage. In addition, the section includes descriptions of fixed known issues found in previous versions.

6.1. The Ceph Ansible utility

Alertmanager does not log errors when self-signed or untrusted certificates are used

Previously, when using untrusted CA certificates, Alertmanager generated many errors in the logs.

With this release, when self-signed or untrusted certificates are used, ceph-ansible can set the insecure_skip_verify parameter to true in the alertmanager.yml file by setting alertmanager_dashboard_api_no_ssl_verify: true in the group_vars/all.yml file. As a result, Alertmanager no longer logs those errors and works as expected.
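For example, a minimal sketch of the setting in the group_vars/all.yml file (a config fragment, not a complete file):

```yaml
# group_vars/all.yml
# Skip TLS verification of the Alertmanager API endpoint when the
# dashboard uses a self-signed or untrusted certificate.
alertmanager_dashboard_api_no_ssl_verify: true
```

With this variable set, ceph-ansible renders insecure_skip_verify: true into the generated alertmanager.yml configuration.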


Use a fully-qualified domain name (FQDN) when HTTPS is enabled in a multi-site configuration

Previously, in a multi-site Ceph configuration, ceph-ansible would not differentiate between HTTP and HTTPS and set the zone endpoints with the IP address instead of the host name when HTTPS was enabled.

With this release, ceph-ansible uses the fully-qualified domain name (FQDN) instead of the IP address when HTTPS is enabled. As a result, the zone endpoints are set with the FQDN and match the TLS certificate CN.


Add the --pids-limit parameter as -1 for podman and 0 for docker in the systemd file to start the container

Previously, the default limits on the number of processes allowed to run in containers, 2048 for podman and 4096 for docker, were not sufficient to start some containers that needed to run more processes than these limits allow.

With this release, you can remove the limit on the maximum number of processes by adding the --pids-limit parameter, set to -1 for podman and to 0 for docker, in the systemd unit files. As a result, the containers start even when customized internal workloads need to run more processes than the default limits allow.


ceph-ansible pulls the monitoring container images in a dedicated task behind the proxy

Previously, ceph-ansible would not pull the monitoring container images such as Alertmanager, Prometheus, node-exporter, and Grafana in a dedicated task and would pull images when the systemd service was started.

With this release, ceph-ansible pulls the monitoring container images in a dedicated task and supports pulling them behind a proxy.
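As an illustrative sketch, the proxy can be configured in group_vars/all.yml; the variable names below follow the ceph-ansible sample group_vars conventions and the proxy address is a placeholder, so verify both against your ceph-ansible version:

```yaml
# group_vars/all.yml
# Proxy used when pulling the Ceph and monitoring container images
# (Alertmanager, Prometheus, node-exporter, Grafana).
ceph_docker_http_proxy: http://proxy.example.com:3128
ceph_docker_https_proxy: http://proxy.example.com:3128
# Hosts and domains that must bypass the proxy.
ceph_docker_no_proxy: "localhost,127.0.0.1"
```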


The ceph-ansible playbook creates the radosgw system user and works as expected

Previously, the ceph-ansible playbook failed to create the radosgw system user and failed to deploy the dashboard when rgw_instances was set at the host_vars or group_vars level in a multi-site deployment. This variable is not set on Ceph Monitor nodes, and because that is where the tasks are delegated, the playbook failed.

With this release, ceph-ansible checks all the Ceph Object Gateway instances that are defined and sets a boolean fact to check whether at least one instance has rgw_zonemaster set to true. The radosgw system user is created and the playbook works as expected.
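As an illustration, a host_vars entry defining rgw_instances with rgw_zonemaster set on one instance might look like the following sketch; the realm, zonegroup, zone, and port values are placeholders, and the key names follow the ceph-ansible multi-site examples, so verify them against your ceph-ansible version:

```yaml
# host_vars/rgw-node-01.yml (hypothetical host)
rgw_instances:
  - instance_name: rgw0
    rgw_realm: myrealm          # placeholder realm name
    rgw_zonegroup: myzonegroup  # placeholder zonegroup name
    rgw_zone: myzone            # placeholder zone name
    rgw_zonemaster: true        # at least one instance must be true
    radosgw_frontend_port: 8080
```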


The Ansible playbook does not fail when used with --limit option

Previously, the dashboard_server_addr parameter was unset when the Ansible playbook was run with the --limit option and the playbook would fail if the play target did not match the Ceph Manager hosts in a non-collocated scenario.

With this release, you must set the dashboard_server_addr parameter on the Ceph Manager nodes, and the playbook works as expected when run with the --limit option.
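For example, the address can be pinned in the host_vars file of each Ceph Manager node; the hostname and address below are placeholders:

```yaml
# host_vars/mgr-node-01.yml (hypothetical host)
# Address the dashboard binds to on this Ceph Manager node.
dashboard_server_addr: 192.0.2.10
```

With the value set per host, dashboard_server_addr is no longer left unset when the playbook is run with the --limit option.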


6.2. Ceph Management Dashboard

The “Client Connection” panel is replaced with “MGRs” on the Grafana dashboard

Previously, the “Client Connection” panel displayed the Ceph File System information and was not meaningful.

With this release, "MGRs" replaces the "Client Connection" panel and displays the count of the active and standby Ceph Managers.


The Red Hat Ceph Storage Dashboard displays the values for disk IOPS

Previously, the Red Hat Ceph Storage Dashboard would not display the Ceph OSD disk performance in the Hosts tab.

With this release, the Red Hat Ceph Storage Dashboard displays the expected information about the Ceph OSDs, host details, and the Grafana graphs.


6.3. The Ceph Volume utility

The add-osd.yml playbook does not fail anymore while creating new OSDs

Previously, the add-osd.yml playbook would fail when new OSDs were added using ceph-ansible. This was due to a ceph-volume lvm batch limitation that did not allow adding new OSDs in non-interactive mode.

With this release, the --yes and --report options are no longer passed to the command-line interface, and the add-osd.yml playbook works as expected when creating new OSDs.


6.4. Ceph Object Gateway

The rgw_bucket_quota_soft_threshold parameter is disabled

Previously, the Ceph Object Gateway fetched utilization information from the bucket index whenever the cached utilization reached rgw_bucket_quota_soft_threshold, causing a high rate of operations on the bucket index and slower requests.

This release removes the rgw_bucket_quota_soft_threshold parameter and uses the cached stats resulting in better performance even if the quota limit is almost reached.


The radosgw-admin datalog trim command does not crash while trimming a marker

Previously, the radosgw-admin datalog trim command would crash when trimming a marker in the current generation from radosgw-admin due to a logic error.

This release fixes the logic error, and log trimming occurs without the radosgw-admin datalog trim command crashing.


6.5. Ceph Manager plugins

The cluster health changes are no longer committed to persistent storage

Previously, rapid changes to the health of the storage cluster caused excessive logging to the ceph.audit.log.

With this release, the health_history is not logged to the ceph.audit.log and cluster health changes are no longer committed to persistent storage.