Monitoring gets stuck (and/or duplicates PVCs) upgrading to 4.4 when using local storage
Issue
- When local-storage-operator is configured and the cluster is upgraded from 4.3.x to 4.4.z (4.4.8 or earlier), the upgrade blocks at the monitoring cluster operator with a status like the following:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.23    True        True          31m     Unable to apply 4.4.6: the cluster operator monitoring has not yet successfully rolled out
$ oc get pvc -n openshift-monitoring
NAME                                       STATUS    VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS    AGE
alertmanager-main-db-alertmanager-main-0   Pending                                                 local-storage   15m
alertmanager-main-db-alertmanager-main-1   Pending                                                 local-storage   15m
alertmanager-main-db-alertmanager-main-2   Pending                                                 local-storage   15m
localpvc-alertmanager-main-0               Bound     local-pv-68cdb92    100Gi      RWO            local-storage   8h
localpvc-alertmanager-main-1               Bound     local-pv-efbad35c   100Gi      RWO            local-storage   8h
localpvc-alertmanager-main-2               Bound     local-pv-98fe334e   100Gi      RWO            local-storage   8h
localpvc-prometheus-k8s-0                  Bound     local-pv-f06c680f   100Gi      RWO            local-storage   8h
localpvc-prometheus-k8s-1                  Bound     local-pv-c8e63d2b   100Gi      RWO            local-storage   8h
prometheus-k8s-db-prometheus-k8s-0         Pending                                                 local-storage   15m
prometheus-k8s-db-prometheus-k8s-1         Pending                                                 local-storage   15m
See Upgrade stuck section for this case.
- When local-storage-operator is configured and the cluster is upgraded from 4.4.x to 4.4.y (4.4.8 or earlier), the monitoring PVCs are duplicated and the previous data is no longer available, for example:
$ oc get pvc -n openshift-monitoring
NAME                                         STATUS    VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS    AGE
alertmanager-main-db-alertmanager-main-0     Bound     pvc-1cb2c2a5-xxx   20Gi       RWO            ocs-ceph-test   14d
alertmanager-main-db-alertmanager-main-1     Bound     pvc-4c8429fe-xxx   20Gi       RWO            ocs-ceph-test   14d
alertmanager-main-db-alertmanager-main-2     Bound     pvc-5c0f0c9d-xxx   20Gi       RWO            ocs-ceph-test   14d
ocs-alertmanager-claim-alertmanager-main-0   Bound     pvc-0ea759d4-xxx   20Gi       RWO            ocs-ceph-test   5d6h
ocs-alertmanager-claim-alertmanager-main-1   Bound     pvc-29a18e3f-xxx   20Gi       RWO            ocs-ceph-test   5d6h
ocs-alertmanager-claim-alertmanager-main-2   Bound     pvc-d8c70ddc-xxx   20Gi       RWO            ocs-ceph-test   5d6h
ocs-prometheus-claim-prometheus-k8s-0        Bound     pvc-a91ae6f1-xxx   100Gi      RWO            ocs-ceph-test   5d6h
ocs-prometheus-claim-prometheus-k8s-1        Bound     pvc-31bf1991-xxx   100Gi      RWO            ocs-ceph-test   5d6h
prometheus-k8s-db-prometheus-k8s-0           Bound     pvc-04f59982-xxx   100Gi      RWO            ocs-ceph-test   14d
prometheus-k8s-db-prometheus-k8s-1           Bound     pvc-a20075bb-xxx   100Gi      RWO            ocs-ceph-test   14d
See Duplicated PVCs section for this case.
NOTE: See Root Cause section for more details if needed.
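Both symptoms can be spotted from the PVC listing without reading it by eye. The following is a minimal sketch, not part of the original solution: the function names pending_pvcs and pods_with_multiple_pvcs are hypothetical, and the pod-name pattern (alertmanager-main-N, prometheus-k8s-N) is an assumption taken from the sample outputs above. Both functions read `oc get pvc --no-headers` style text on standard input.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text and never modify the cluster.

# Print the names of PVCs whose STATUS column (field 2) is "Pending".
pending_pvcs() {
  awk '$2 == "Pending" { print $1 }'
}

# Print "<pod> <count>" for every monitoring pod that has more than one PVC.
# Assumption: each PVC name ends with the name of the pod that uses it,
# as in the sample outputs (for example ...-alertmanager-main-0).
pods_with_multiple_pvcs() {
  awk '
    match($1, /(alertmanager-main|prometheus-k8s)-[0-9]+$/) {
      count[substr($1, RSTART, RLENGTH)]++
    }
    END {
      for (pod in count)
        if (count[pod] > 1)
          print pod, count[pod]
    }
  ' | sort
}
```

A possible invocation is `oc get pvc -n openshift-monitoring --no-headers | pending_pvcs` for the stuck-upgrade case, and the same listing piped into `pods_with_multiple_pvcs` for the duplicated-PVC case. Any output from either function only indicates that the listing matches the pattern described in this article; it does not confirm the root cause.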
Environment
- OpenShift Container Platform
- 4.3.x -> 4.4.x (<= 4.4.8)
- 4.4.x -> 4.4.y (<= 4.4.8)
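To check whether a given upgrade target falls in the affected range, the version comparison can be sketched as below. This is an illustration only: the function name is_affected_target is hypothetical, and it assumes the version string has the form 4.4.z, optionally followed by a suffix.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the given target version
# is a 4.4.z release with z <= 8, the range listed in this article.
is_affected_target() {
  case "$1" in
    4.4.*) ;;            # only 4.4.z targets are in scope
    *) return 1 ;;
  esac
  patch=${1#4.4.}        # keep what follows "4.4."
  patch=${patch%%[!0-9]*} # drop anything after the leading digits
  [ -n "$patch" ] && [ "$patch" -le 8 ]
}
```

For example, `is_affected_target 4.4.6 && echo affected` prints `affected`, while 4.4.9 and 4.5.0 are rejected. The target version can be typed in by hand or, assuming the standard ClusterVersion status fields, read with `oc get clusterversion version -o jsonpath='{.status.desired.version}'`.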