13.2. Replacing failed nodes on Red Hat OpenStack Platform installer-provisioned infrastructure
Perform this procedure to replace a failed node which is not operational on Red Hat OpenStack Platform installer-provisioned infrastructure (IPI) for OpenShift Container Storage.
- Log in to OpenShift Web Console and click Compute → Nodes.
- Identify the faulty node and click on its Machine Name.
- Click Actions → Edit Annotations, and click Add More.
machine.openshift.io/exclude-node-drainingand click Save.
- Click Actions → Delete Machine, and click Delete.
A new machine is automatically created, wait for new machine to start.重要
This activity may take at least 5-10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.
- Click Compute → Nodes, confirm if the new node is in Ready state.
Apply the OpenShift Container Storage label to the new node using any one of the following:
- From User interface
- For the new node, click Action Menu (⋮) → Edit Labels
cluster.ocs.openshift.io/openshift-storageand click Save.
- From Command line interface
Execute the following command to apply the OpenShift Container Storage label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
- [Optional]: If the failed Red Hat OpenStack Platform instance is not removed automatically, terminate the instance from Red Hat OpenStack Platform console.
Execute the following command and verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
Click Workloads → Pods, confirm that at least the following pods on the new node are in Running state:
- Verify that all other required OpenShift Container Storage pods are in Running state.
Verify that new OSD pods are running on the replacement node.
$ oc get pods -o wide -n openshift-storage| egrep -i new-node-name | egrep osd
(Optional) If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.
For each of the new nodes identified in previous step, do the following:
Create a debug pod and open a chroot environment for the selected host(s).
$ oc debug node/<node name> $ chroot /host
Run “lsblk” and check for the “crypt” keyword beside the
- If verification steps fail, contact Red Hat Support.