Pod startup failure on alternate nodes during node shutdown in vSphere environments due to VM and disk snapshots
Issue
- In an OpenShift cluster deployed on vSphere, when a node fails or is shut down, its Pods are rescheduled to a different node but fail to start there. The events show that the persistent volume is still attached to the original node, so it cannot be attached to the new node and the Pod remains stuck in its init phase.
$ oc get pod -n openshift-monitoring -owide
NAME               READY   STATUS     RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
prometheus-k8s-0   0/6     Init:0/1   0          1h    <none>   node03   <none>           <none>
$ oc get events -n openshift-monitoring
LAST SEEN TYPE REASON OBJECT MESSAGE
1h34m Warning FailedAttachVolume pod/prometheus-k8s-0 Multi-Attach error for volume "prometheus-0-pv" Volume is already exclusively attached to one node and can't be attached to another
1h32m Warning FailedAttachVolume pod/prometheus-k8s-0 AttachVolume.Attach failed for volume "prometheus-0-pv" : rpc error: code = Internal desc = failed to attach disk: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" with node: "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy" err failed to attach cns volume: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" to node vm: "VirtualMachine:vm-01 [VirtualCenterHost: node01.example.com, UUID: yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-01, VirtualCenterHost: node01.example.com]]". fault: "(*types.LocalizedMethodFault)(0xc000bbf900)({\n DynamicData: (types.DynamicData) {\n },\n Fault: (*types.ResourceInUse)(0xc001007b40)({\n VimFault: (types.VimFault) {\n MethodFault: (types.MethodFault) {\n FaultCause: (*types.LocalizedMethodFault)(<nil>),\n FaultMessage: ([]types.LocalizableMessage) <nil>\n }\n },\n Type: (string) \"\",\n Name: (string) (len=6) \"volume\"\n }),\n LocalizedMessage: (string) (len=32) \"The resource 'volume' is in use.\"\n})\n". opId: "49301d98"
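As a rough aid for spotting the affected Pods, the events output above can be filtered for FailedAttachVolume warnings. This is only a sketch: the helper name failed_attach_objects is hypothetical and not part of oc, and it assumes the REASON column is immediately followed by the OBJECT column, as in the output shown.

```shell
# Hypothetical helper (not part of oc): reads `oc get events` output on stdin
# and prints each object that reported a FailedAttachVolume warning, once.
failed_attach_objects() {
  awk '{
    # Find the REASON field, then print the OBJECT field that follows it.
    for (i = 1; i < NF; i++) {
      if ($i == "FailedAttachVolume") { print $(i + 1); break }
    }
  }' | sort -u
}

# Usage against a live cluster (namespace taken from the example above):
#   oc get events -n openshift-monitoring | failed_attach_objects
```

Comparing the result with `oc get volumeattachment`, which lists the node each persistent volume is attached to, should show whether the volume is still bound to the original node.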
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- vSphere