In OCP 4.6 and newer, a cluster update completes without errors, but Machine Config Pools are degraded with the message `Marking Degraded due to: unexpected on-disk state`
Issue
- After updating to a newer version of OpenShift Container Platform, not all nodes are upgraded. For example:
  $ oc get node
  NAME                       STATUS                     ROLES    AGE   VERSION
  master-0.ocp.example.net   Ready                      master   34d   v1.17.1+9d33dd3
  master-1.ocp.example.net   Ready                      master   34d   v1.17.1+9d33dd3
  master-2.ocp.example.net   Ready                      master   34d   v1.17.1+9d33dd3
  worker-0.ocp.example.net   Ready                      worker   34d   v1.17.1+9d33dd3
  worker-1.ocp.example.net   Ready                      worker   34d   v1.17.1+9d33dd3
  worker-2.ocp.example.net   Ready,SchedulingDisabled   worker   34d   v1.17.1+912792b   <----------
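On a larger cluster, the node whose version differs can be hard to spot by eye. The following is a minimal sketch, not part of the original article: `find_version_outliers` is a hypothetical helper name, and it assumes the default `oc get node` column layout (NAME, STATUS, ROLES, AGE, VERSION), where VERSION is the fifth column. It reads the command output on standard input and prints every node whose version differs from the most common one. If two versions are equally common, which one is treated as the baseline is arbitrary.

```shell
# Hypothetical helper: reads default `oc get node` output on stdin and
# prints "<node> <version>" for nodes whose VERSION differs from the
# most common VERSION in the listing.
find_version_outliers() {
  awk '
    NR == 1 { next }                  # skip the header row
    NF >= 5 {
      name[NR] = $1                   # NAME column
      ver[NR]  = $5                   # VERSION column (default layout assumed)
      count[$5]++
    }
    END {
      best = ""
      for (v in count)
        if (best == "" || count[v] > count[best]) best = v
      for (i = 2; i <= NR; i++)
        if ((i in name) && ver[i] != best) print name[i], ver[i]
    }
  '
}
```

Typical use would be `oc get node | find_version_outliers`. With the listing above, it would report the `worker-2.ocp.example.net` node.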
- After updating to a newer version of OpenShift Container Platform, the MachineConfigOperator reports a degraded pool:
  $ oc describe co/machine-config
  ...
  'Failed to resync $VERSION because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool $POOL is not ready, retrying. Status: (pool degraded: true total: x, ready y, updated: y, unavailable: 1)]'
- A Machine Config Pool is degraded, and an error similar to the following appears in the extension of the MachineConfigOperator clusteroperator:
  worker: 'pool is degraded because nodes fail with "1 nodes are reporting degraded status on sync": "Node worker0 is reporting: \"unexpected on-disk state validating against rendered-worker-abc: expected target osImageURL \\\"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:xxx\\\", have \\\"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:yyy\\\" (\\\"zzz\\\")\""'
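The message above names two image references: the `osImageURL` the rendered configuration expects and the one the node actually has. Because the nested quoting makes them hard to read, the sketch below pulls the two values out of a saved copy of the message. `extract_os_image_urls` is a hypothetical helper name, and the pattern assumes the message keeps the `expected target osImageURL ..., have ...` wording shown above; it is an illustration, not a supported tool.

```shell
# Hypothetical helper: reads text containing the "unexpected on-disk state"
# message on stdin and prints the expected and actual osImageURL values.
# Any backslashes and double quotes around the URLs are skipped.
extract_os_image_urls() {
  sed -n 's/.*expected target osImageURL [\\"]*\([^\\"]*\)[\\"]*, have [\\"]*\([^\\"]*\)[\\"]*.*/expected=\1 have=\2/p'
}
```

The function prints one `expected=... have=...` line per matching input line and nothing for lines that do not contain the message, so it can be fed a whole saved status dump.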
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4.6 and later