How to reboot a single worker node in Azure Red Hat OpenShift
Environment
- Red Hat OpenShift on Azure (ARO)
- 4
Issue
- Steps required to reboot a single worker node.
- Powering off a node or scaling down to zero is not supported.
WARNING: Do not reboot master nodes. Master nodes are managed by Red Hat. Raise a support case if maintenance is required on the master nodes.
Resolution
- Identify the required worker node
$ oc get nodes
NAME STATUS ROLES AGE VERSION
20230809-xxxx-master-0 Ready master 5d17h v1.24.15+990d55b
20230809-xxxx-master-1 Ready master 5d17h v1.24.15+990d55b
20230809-xxxx-master-2 Ready master 5d17h v1.24.15+990d55b
20230809-xxxx-worker-xxx-cbm52 Ready worker 5d16h v1.24.15+990d55b
20230809-xxxx-worker-xxx-lmtl7 Ready worker 5d16h v1.24.15+990d55b
20230809-xxxx-worker-xxx-wft2m Ready worker 5d16h v1.24.15+990d55b
- Cordon a worker node
$ oc adm cordon 20230809-xxxx-worker-xxx-cbm52
node/20230809-xxxx-worker-xxx-cbm52 cordoned
- It is recommended to review the list of pods before draining
$ oc get pods -A -o wide --field-selector spec.nodeName=<worker_node_name>
- Drain the node in preparation for maintenance. If the command fails, use additional options:
--force deletes pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet, or StatefulSet resources.
--delete-emptydir-data deletes pods that use local (emptyDir) storage.
--ignore-daemonsets ignores pods managed by daemon sets, so that eviction of the remaining pods can proceed.
--disable-eviction bypasses pod disruption budgets (PDBs) and drains the node by deleting pods directly.
$ oc adm drain 20230809-l52dl-worker-eastasia1-cbm52
node/20230809-l52dl-worker-eastasia1-cbm52 already cordoned
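As a minimal sketch of how the options above might be combined, the helper below first attempts a plain drain and retries with --ignore-daemonsets and --delete-emptydir-data only if that fails. The function name drain_worker and the 300-second timeout are assumptions made for illustration. --force and --disable-eviction are left out on purpose, because they delete unmanaged pods and bypass pod disruption budgets.

```shell
# Hypothetical helper: drain a worker node, escalating options only when needed.
drain_worker() {
  node="$1"
  if [ -z "$node" ]; then
    echo "usage: drain_worker <worker_node_name>" >&2
    return 2
  fi
  # First attempt: plain drain, as in the step above (timeout value is an assumption).
  if oc adm drain "$node" --timeout=300s; then
    return 0
  fi
  # Retry, skipping DaemonSet-managed pods and accepting loss of emptyDir data.
  oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s
}
```

Usage: drain_worker <worker_node_name>. A non-zero exit status after the retry suggests that one of the remaining options, or manual intervention, is needed.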
- Ensure the drain has rescheduled all pods onto other nodes and that the remaining nodes have adequate resources. Confirm that critical applications are still available. Note that some pods, such as those managed by daemonsets, cannot be rescheduled and will be terminated as part of the reboot.
- Check for undrained pods on the node
$ oc get pod -o wide -A | grep "node_name"
- Check for recent FailedScheduling events. These may indicate that the cluster is under-resourced and requires additional nodes.
$ oc get events -A | grep "FailedScheduling"
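The two grep checks above can also be expressed with server-side field selectors, which avoids matching a string that happens to appear in another column. A small sketch, assuming the hypothetical helper names pods_on_node and failed_scheduling_events; the spec.nodeName selector is the one used earlier in this article, and reason is a field selector supported for events.

```shell
# Hypothetical helpers for the post-drain checks.

# List every pod still bound to the given node.
pods_on_node() {
  oc get pods -A -o wide --field-selector "spec.nodeName=$1"
}

# List FailedScheduling events across all namespaces.
failed_scheduling_events() {
  oc get events -A --field-selector reason=FailedScheduling
}
```

Usage: pods_on_node <worker_node_name>, then failed_scheduling_events. After a successful drain, the first command is expected to list mainly pods managed by daemon sets.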
- Check the status of the worker nodes. The drained node is expected to show Ready,SchedulingDisabled
$ oc get nodes
NAME STATUS ROLES AGE VERSION
20230809-xxxx-master-0 Ready master 5d17h v1.24.15+990d55b
20230809-xxxx-master-1 Ready master 5d17h v1.24.15+990d55b
20230809-xxxx-master-2 Ready master 5d17h v1.24.15+990d55b
20230809-xxxx-worker-xxx-cbm52 Ready,SchedulingDisabled worker 5d17h v1.24.15+990d55b <--- Disabled
20230809-xxxx-worker-xxx-lmtl7 Ready worker 5d17h v1.24.15+990d55b
20230809-xxxx-worker-xxx-wft2m Ready worker 5d17h v1.24.15+990d55b
- The oc debug node/<node_name> command opens a shell prompt on the worker node. It creates a separate container, mounts the node's root file system at the /host folder, and allows you to inspect any files from the node
$ oc debug node/20230809-xxxx-worker-xxx-cbm52
Temporary namespace openshift-debug-kck98 is created for debugging node...
Starting pod/20230809-xxxx-worker-xxx-cbm52-debug ...
To use host binaries, run `chroot /host`
Pod IP: x.x.x.x
If you don't see a command prompt, try pressing enter.
sh-4.4#
- Start a chroot shell in the /host folder
$ chroot /host
- Reboot the worker node
$ reboot
Removing debug pod ...
Temporary namespace openshift-debug-xx was removed.
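The interactive session above can probably be collapsed into a single command, since oc debug accepts a command to run after "--". The sketch below rests on that assumption and uses a hypothetical helper name, reboot_worker. The exit status may be non-zero even when the reboot succeeds, because the debug session is cut off as the node goes down, so treat it as a hint rather than a result.

```shell
# Hypothetical helper: reboot a worker node without an interactive debug shell.
reboot_worker() {
  node="$1"
  if [ -z "$node" ]; then
    echo "usage: reboot_worker <worker_node_name>" >&2
    return 2
  fi
  # Run reboot inside the node's root file system, mirroring the chroot and reboot steps above.
  oc debug "node/$node" -- chroot /host reboot
}
```

Usage: reboot_worker <worker_node_name>, run only after the node has been cordoned and drained.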
- Watch the progress and confirm the worker node is rebooted
$ oc describe node 20230809-xxxx-worker-xxx-cbm52 | grep LastTransitionTime -A2
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------
MemoryPressure False Tue, 15 Aug 2023 12:40:54 +0800 Tue, 15 Aug 2023 12:18:38 +0800 KubeletHasSufficientMemory
- Confirm the worker node is ready after the reboot
$ oc wait --for=condition=Ready node/20230809-xxxx-worker-xxx-cbm52
node/20230809-xxxx-worker-xxx-cbm52 condition met
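oc wait gives up after 30 seconds by default, which may be shorter than the time a reboot takes. A sketch with an explicit timeout, assuming a hypothetical helper name wait_node_ready and a 10-minute default chosen only for illustration. If this runs before the node has actually gone down, the Ready condition from before the reboot is still in place and the command returns at once, so check the LastTransitionTime output from the previous step first.

```shell
# Hypothetical helper: block until the node reports Ready, or the timeout expires.
wait_node_ready() {
  node="$1"
  timeout="${2:-10m}"   # default is an assumption; adjust to the observed reboot time
  oc wait --for=condition=Ready "node/$node" --timeout="$timeout"
}
```

Usage: wait_node_ready <worker_node_name> 15m. A non-zero exit status means the node did not become Ready within the timeout.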
- Restore (uncordon) the worker node from maintenance mode
$ oc adm uncordon 20230809-xxxx-worker-xxx-cbm52
node/20230809-xxxx-worker-xxx-cbm52 uncordoned
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.