Chapter 6. Bug fixes

This section describes notable bug fixes introduced in Red Hat OpenShift Data Foundation 4.9.

Multicloud Object Gateway storage class deleted during uninstall

Previously, the Multicloud Object Gateway (MCG) storage class, which was deployed as part of the OpenShift Data Foundation deployment, was not deleted during uninstall.

With this update, the Multicloud Object Gateway (MCG) storage class is removed when OpenShift Data Foundation is uninstalled.

(BZ#1892709)

OpenShift Container Platform alert when OpenShift Container Storage quorum lost

Previously, the CephMonQuorumAtRisk alert fired when mon quorum was about to be lost, but no alert was triggered once the quorum was actually lost. As a result, no notification was sent when the mon quorum was completely lost.

With this release, a new alert, CephMonQuorumLost, is introduced. This alert is triggered when only one node is left and a single mon is running on it. At this point the cluster is already in an unrecoverable state, so the alert serves only as a notification of the issue.
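
The difference between the two alerts can be pictured with plain majority arithmetic. The following is a minimal sketch, assuming that quorum requires a strict majority of the configured mons. The function name and the state labels are hypothetical, and this is not the alert rule that ships with the product.

```python
def mon_quorum_state(total_mons: int, mons_up: int) -> str:
    """Classify mon quorum health using simple majority arithmetic (illustrative only)."""
    if total_mons < 1 or not 0 <= mons_up <= total_mons:
        raise ValueError("expected 0 <= mons_up <= total_mons and total_mons >= 1")
    majority = total_mons // 2 + 1  # smallest count that is a strict majority
    if mons_up < majority:
        return "lost"      # the situation CephMonQuorumLost reports
    if mons_up == majority:
        return "at risk"   # one more failure loses the quorum
    return "healthy"


# With three mons: all up, one down, two down.
print(mon_quorum_state(3, 3))  # healthy
print(mon_quorum_state(3, 2))  # at risk
print(mon_quorum_state(3, 1))  # lost
```

With three mons, a single surviving mon falls below the majority of two, which matches the case described above.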

(BZ#1944513)

Reduce the mon_data_avail_warn from 30% to 15%

Previously, the mon_data_avail_warn alert was triggered when the available storage at the mon store location was less than 30%, which did not match the 15% threshold of OpenShift Container Platform’s garbage collector for images. With this release, the alert is triggered when the available storage at the mon store location is less than 15% instead of 30%.
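
The effect of the lower threshold can be shown with a small calculation. This is a minimal sketch, assuming the warning compares the percentage of available space against the threshold. The function name and the byte values are hypothetical.

```python
def mon_store_low(avail_bytes: int, total_bytes: int, warn_pct: int = 15) -> bool:
    """Return True when available space is below warn_pct percent of the total."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    avail_pct = 100 * avail_bytes / total_bytes
    return avail_pct < warn_pct


# 20% free: warned under the old 30% threshold, silent under the new 15% one.
print(mon_store_low(20, 100, warn_pct=30))  # True
print(mon_store_low(20, 100))               # False
# 10% free: warned under the new threshold as well.
print(mon_store_low(10, 100))               # True
```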

(BZ#1964055)

OSD pods do not log anything if the initial deployment is OpenShift Container Storage 4.4

Previously, object storage daemon (OSD) logs were not generated when the initial deployment was OpenShift Container Storage 4.4. With this update, the OSD logs are generated correctly.

(BZ#1974343)

Multicloud Object Gateway was not able to initialize in a fresh deployment

Previously, after the internal database changed from MongoDB to PostgreSQL, entities that should be unique could be added to the database more than once (MongoDB had prevented such duplicates). As a result, Multicloud Object Gateway (MCG) did not work. With this release, duplicate entities are prevented.

(BZ#1975645)

PVC is restored when using two different backend paths for the encrypted parent

Previously, when restoring a persistent volume claim (PVC) from a volume snapshot into a different storage class with a different encryption KMSID, the restored PVC went into the Bound state but failed to attach to a Pod. This was because the encryption passphrase was copied with the encryption KMSID config of the parent PVC’s storage class. With this release, the restored PVC’s encryption passphrase is copied with the correct encryption KMSID config from the destination storage class. Hence, the PVC is successfully restored into a storage class with a different encryption KMSID than its parent PVC.

(BZ#1975730)

Deletion of data is allowed when the storage cluster is full

Previously, when the storage cluster was full, the Ceph Manager hung on checking pool permissions while reading the configuration file. The Ceph Metadata Server (MDS) did not allow write operations when the Ceph OSD was full, resulting in an ENOSPC error. When the storage cluster reached the full ratio, users could not delete data to free space using the Ceph Manager and ceph-volume plugin.

With this release, the new FULL feature is introduced. This feature gives the Ceph Manager FULL capability, and bypasses the Ceph OSD full check. Additionally, the client_check_pool_permission option can be disabled. With the Ceph Manager having FULL capabilities, the MDS no longer blocks Ceph Manager calls. This allows the Ceph Manager to free up space by deleting subvolumes and snapshots when a storage cluster is full.

(BZ#1978769)

Keys are completely destroyed in Vault after deleting encrypted persistent volume claims (PVCs) while using the kv-v2 secret engine

In HashiCorp Vault's key-value store v2 (kv-v2), the contents of a deleted key remain recoverable unless the metadata of the deleted key is also removed in a separate step. Previously, when using kv-v2 storage for secrets in HashiCorp Vault, deleting a volume did not remove the metadata of the encryption passphrase from the KMS.

With this update, the keys in HashiCorp Vault are completely destroyed by default when a PVC is deleted. You can set the new configuration option VAULT_DESTROY_KEYS to false to restore the previous behavior. In that case, the metadata of the keys is kept in HashiCorp Vault so that the encryption passphrase of the removed PVC can be recovered.
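
The default-on behavior of the option can be sketched as follows. This is a minimal illustration, assuming the option arrives as a string in a key-value KMS configuration. The helper name `destroy_keys_enabled` and the dictionary layout are hypothetical and do not reflect the actual implementation.

```python
def destroy_keys_enabled(kms_config: dict) -> bool:
    """Keys are destroyed unless VAULT_DESTROY_KEYS is explicitly set to "false"."""
    value = str(kms_config.get("VAULT_DESTROY_KEYS", "true")).strip().lower()
    return value != "false"


# Option absent: keys and their metadata are destroyed with the PVC.
print(destroy_keys_enabled({}))                               # True
# Option set to false: metadata is kept, so recovery stays possible.
print(destroy_keys_enabled({"VAULT_DESTROY_KEYS": "false"}))  # False
```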

(BZ#1979244)

Multicloud Object Gateway object bucket creation is going to Pending Phase

Previously, after the internal database change from MongoDB to PostgreSQL, duplicate entries that should be unique could be added to the database (MongoDB prevented duplicate entries earlier). As a result, creation of new resources such as buckets, backing stores, and so on failed. With this release, duplicate entries are prevented.

(BZ#1980299)

Deletion of CephBlockPool gets stuck and blocks the creation of new pools

Previously, in a Multus-enabled cluster, the Rook Operator did not have access to the object storage daemon (OSD) network because it did not have the network annotations. As a result, rbd commands would hang during a pool cleanup because the OSDs could not be contacted.

With this release, the operator proxies rbd commands through a sidecar container in the mgr pod, so they run successfully during the pool cleanup.

(BZ#1983756)

Standalone Multicloud Object Gateway failing to connect

Previously, the Multicloud Object Gateway (MCG) CR was not updated properly because of the change in the internal DB from MongoDB to PostgreSQL. This caused issues in certain flows. As a result, MCG components were not able to communicate with one another and MCG failures occurred on upgrade.

With this release, the MCG CR issue is fixed.

(BZ#1984284)

Monitoring spec is getting reset in CephCluster resource in external mode

Previously, when OpenShift Container Storage was upgraded, the monitoring endpoints were reset in the external CephCluster’s monitoring spec. This behavior was not expected and was caused by the way monitoring endpoints were passed to the CephCluster. With this update, the way endpoints are passed is changed. Before the CephCluster is created, the endpoints are read directly from the JSON secret, rook-ceph-external-cluster-details, and the CephCluster spec is updated. As a result, the monitoring endpoint specs in the CephCluster are updated properly with appropriate values even after the OpenShift Container Storage upgrade.
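
Reading the endpoints from the JSON secret can be pictured with a short sketch. The layout used here (a list of entries with `name`, `kind`, and `data` fields, and the `MonitoringEndpoint` and `MonitoringPort` keys) is an assumption for illustration, as are the function name and the sample addresses. Check the actual content of rook-ceph-external-cluster-details before relying on it.

```python
import json


def monitoring_endpoint(secret_json: str):
    """Return (endpoint, port) from the assumed external-cluster JSON, or None."""
    for entry in json.loads(secret_json):
        if entry.get("name") == "monitoring-endpoint":
            data = entry.get("data", {})
            return data.get("MonitoringEndpoint"), data.get("MonitoringPort")
    return None


# Sample payload with documentation-range addresses (hypothetical values).
sample = json.dumps([
    {"name": "rook-ceph-mon-endpoints", "kind": "ConfigMap",
     "data": {"data": "a=192.0.2.10:6789"}},
    {"name": "monitoring-endpoint", "kind": "CephCluster",
     "data": {"MonitoringEndpoint": "192.0.2.20", "MonitoringPort": "9283"}},
])
print(monitoring_endpoint(sample))  # ('192.0.2.20', '9283')
```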

(BZ#1984735)

CrashLoopBackOff state of noobaa-db-pg-0 pod when enabling hugepages

Previously, enabling hugepages on an OpenShift Container Platform cluster caused the Multicloud Object Gateway (MCG) database pod to go into a CrashLoopBackOff state. This was due to incorrect initialization of PostgreSQL. With this release, the MCG database pod’s initialization of PostgreSQL is fixed.

(BZ#1995271)

Multicloud Object Gateway unable to create new object bucket claims

Previously, performance degradation when working against the Multicloud Object Gateway (MCG) DB caused back pressure on all the MCG components which resulted in failure to execute flows within the system such as configuration flows and I/O flows.

With this update, the most time-consuming queries are fixed, the DB is cleared quickly, and no back pressure is created.

(BZ#1998680)

Buckets fail during creation because of an issue with checking attached resources

Previously, because of a problem in checking resources attached to a bucket during its creation, the bucket would fail to be created. The conditions in the resource validation during bucket creation have been fixed, and the buckets are created as expected.

(BZ#2000588)

NooBaa Operator still checks for noobaa-db service after upgrading

Previously, when OpenShift Container Storage was updated from version 4.6, both the old and the new noobaa-db StatefulSets had to be retained for migration purposes. The code still supports both StatefulSet names. Because of a small issue in the code, the operator kept checking the status of the old noobaa-db StatefulSet even though it was no longer relevant, and a failure message was generated for it.

With this update, the operator stops checking the status of the old noobaa-db StatefulSet.

(BZ#2008821)

Changes to the config maps of the Multicloud Object Gateway (MCG) DB pod do not get reconciled after upgrade

Previously, changes to the config maps of the MCG DB pod did not apply after an upgrade. The flow has been fixed to properly take the variables from the config maps for the DB pod.

(BZ#2012930)