Monitoring gets stuck (and/or duplicates PVCs) during the upgrade to 4.4 when using local storage

Solution Verified

Issue

  • When local-storage-operator is configured and the cluster is upgraded from 4.3.x to 4.4.x (<= 4.4.8), the upgrade blocks at the monitoring cluster operator and the default-named monitoring PVCs stay in Pending, for example:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.23    True        True          31m     Unable to apply 4.4.6: the cluster operator monitoring has not yet successfully rolled out

$ oc get pvc -n openshift-monitoring
NAME                                       STATUS    VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS    AGE
alertmanager-main-db-alertmanager-main-0   Pending                                                 local-storage   15m
alertmanager-main-db-alertmanager-main-1   Pending                                                 local-storage   15m
alertmanager-main-db-alertmanager-main-2   Pending                                                 local-storage   15m
localpvc-alertmanager-main-0               Bound     local-pv-68cdb92    100Gi      RWO            local-storage   8h
localpvc-alertmanager-main-1               Bound     local-pv-efbad35c   100Gi      RWO            local-storage   8h
localpvc-alertmanager-main-2               Bound     local-pv-98fe334e   100Gi      RWO            local-storage   8h
localpvc-prometheus-k8s-0                  Bound     local-pv-f06c680f   100Gi      RWO            local-storage   8h
localpvc-prometheus-k8s-1                  Bound     local-pv-c8e63d2b   100Gi      RWO            local-storage   8h
prometheus-k8s-db-prometheus-k8s-0         Pending                                                 local-storage   15m
prometheus-k8s-db-prometheus-k8s-1         Pending                                                 local-storage   15m

See the Upgrade stuck section for this case.

  • When local-storage-operator is configured and the cluster is upgraded from 4.4.x to 4.4.y (<= 4.4.8), the monitoring PVCs are duplicated and the data stored on the previous PVCs is no longer available to the monitoring pods, for example:
$ oc get pvc -n openshift-monitoring
NAME                                         STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS    AGE
alertmanager-main-db-alertmanager-main-0     Bound    pvc-1cb2c2a5-xxx   20Gi       RWO            ocs-ceph-test   14d
alertmanager-main-db-alertmanager-main-1     Bound    pvc-4c8429fe-xxx   20Gi       RWO            ocs-ceph-test   14d
alertmanager-main-db-alertmanager-main-2     Bound    pvc-5c0f0c9d-xxx   20Gi       RWO            ocs-ceph-test   14d
ocs-alertmanager-claim-alertmanager-main-0   Bound    pvc-0ea759d4-xxx   20Gi       RWO            ocs-ceph-test   5d6h
ocs-alertmanager-claim-alertmanager-main-1   Bound    pvc-29a18e3f-xxx   20Gi       RWO            ocs-ceph-test   5d6h
ocs-alertmanager-claim-alertmanager-main-2   Bound    pvc-d8c70ddc-xxx   20Gi       RWO            ocs-ceph-test   5d6h
ocs-prometheus-claim-prometheus-k8s-0        Bound    pvc-a91ae6f1-xxx   100Gi      RWO            ocs-ceph-test   5d6h
ocs-prometheus-claim-prometheus-k8s-1        Bound    pvc-31bf1991-xxx   100Gi      RWO            ocs-ceph-test   5d6h
prometheus-k8s-db-prometheus-k8s-0           Bound    pvc-04f59982-xxx   100Gi      RWO            ocs-ceph-test   14d
prometheus-k8s-db-prometheus-k8s-1           Bound    pvc-a20075bb-xxx   100Gi      RWO            ocs-ceph-test   14d

See the Duplicated PVCs section for this case.


NOTE: See the Root Cause section for more details.
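
To tell quickly which of the two cases above applies, the PVC listing can be summarized with a small filter. The following is a minimal sketch, not part of the original solution: the function name check_monitoring_pvcs is hypothetical, and it assumes that the PVC names end with the monitoring pod name (alertmanager-main-N or prometheus-k8s-N), as they do in the outputs shown above.

```shell
#!/bin/sh
# Summarize PVC table output (as printed by `oc get pvc`) read from stdin.
# Prints "pending <pvc>" for each claim stuck in Pending, and
# "multiple <pod>" for each monitoring pod that has more than one claim.
# Assumption: claim names end with the pod name, as in the examples above.
check_monitoring_pvcs() {
  awk '
    NR == 1 { next }                        # skip the header row
    $2 == "Pending" { print "pending " $1 }
    match($1, /(alertmanager-main|prometheus-k8s)-[0-9]+$/) {
      pod = substr($1, RSTART)              # trailing pod name, e.g. prometheus-k8s-0
      claims[pod]++
    }
    END {
      for (pod in claims)
        if (claims[pod] > 1) print "multiple " pod
    }
  ' | sort
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get pvc -n openshift-monitoring | check_monitoring_pvcs
```

Lines starting with "pending" point to the upgrade stuck case; "multiple" lines with no "pending" lines point to the duplicated PVCs case.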

Environment

  • OpenShift Container Platform
    • 4.3.x -> 4.4.x (<= 4.4.8)
    • 4.4.x -> 4.4.y (<= 4.4.8)
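
Before starting an upgrade, the target version can be checked against the affected range listed above. This is a minimal sketch under stated assumptions: the function name is_affected_target is hypothetical, and only plain x.y.z release versions are handled (pre-release or nightly version strings are treated as not affected).

```shell
#!/bin/sh
# Return 0 if the given target version falls in the affected range
# stated in this article (4.4.0 through 4.4.8), and 1 otherwise.
# Assumption: the version is a plain x.y.z string such as 4.4.6.
is_affected_target() {
  case "$1" in
    4.4.[0-8]) return 0 ;;   # exactly one digit after "4.4.", from 0 to 8
    *)         return 1 ;;
  esac
}

# Usage against a live cluster (requires a logged-in oc session):
#   target=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
#   is_affected_target "$target" && echo "target $target is affected"
```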
