Chapter 5. Replacing storage nodes
You can choose one of the following procedures to replace storage nodes:
5.1. Replacing operational nodes on IBM Z
Use this procedure to replace an operational node on IBM Z.
Procedure
- Log in to OpenShift Web Console.
- Click Compute → Nodes.
- Identify the node that needs to be replaced. Take a note of its Machine Name.
Mark the node as unschedulable using the following command:
$ oc adm cordon <node_name>
Drain the node using the following command:
$ oc adm drain <node_name> --force --delete-local-data --ignore-daemonsets
Important: This activity may take 5-10 minutes or longer. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.
- Click Compute → Machines. Search for the required machine.
- Beside the required machine, click the Action menu (⋮) → Delete Machine.
- Click Delete to confirm the machine deletion. A new machine is automatically created.
Wait for the new machine to start and transition into Running state.
Important: This activity may take 5-10 minutes or longer.
- Click Compute → Nodes. Confirm that the new node is in Ready state.
Apply the OpenShift Container Storage label to the new node using any one of the following:
- From the web user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage and click Save.
- From the command line interface
Execute the following command to apply the OpenShift Container Storage label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
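The command-line parts of this procedure can be collected into a small script. The following is a minimal sketch, not part of the documented procedure: the function names `prepare_node_for_replacement` and `label_storage_node` are hypothetical, and the 600-second timeout is an assumption derived from the 5-10 minute estimate above. The machine deletion itself is still performed in the web console between the two calls.

```shell
# Sketch only: helper names and the timeout value are illustrative assumptions.

# Cordon and drain the node that is being replaced
# (same commands and flags as in the procedure).
prepare_node_for_replacement() {
  local node="$1"
  oc adm cordon "$node" || return 1
  oc adm drain "$node" --force --delete-local-data --ignore-daemonsets
}

# Wait until the replacement node reports Ready, then apply the
# OpenShift Container Storage label.
label_storage_node() {
  local node="$1"
  # Assumed timeout of 10 minutes; adjust for your environment.
  oc wait --for=condition=Ready "node/$node" --timeout=600s || return 1
  oc label node "$node" cluster.ocs.openshift.io/openshift-storage=""
}
```

For example, run `prepare_node_for_replacement <node_name>` before deleting the machine, and `label_storage_node <new_node_name>` once the new machine is in Running state.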
Verification steps
Execute the following command and verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
Verify that all other required OpenShift Container Storage pods are in Running state.
Verify that new OSD pods are running on the replacement node.
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
(Optional) If data encryption is enabled on the cluster, verify that the new OSD devices are encrypted.
Create a debug pod and open a chroot environment for the host.
$ oc debug node/<new_node_name>
$ chroot /host
Verify the devices are encrypted.
$ dmsetup ls | grep ocs-deviceset
ocs-deviceset-0-data-0-57snx-block-dmcrypt (253:1)
$ lsblk | grep ocs-deviceset
`-ocs-deviceset-0-data-0-57snx-block-dmcrypt 253:1 0 512G 0 crypt
- If verification steps fail, contact Red Hat Support.
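The label and OSD checks above can be wrapped in helper functions so that they return a clear pass or fail. This is a sketch under assumptions: the function names are hypothetical, and it treats the Completed status as acceptable because OSD prepare pods, which also match `osd`, normally finish in that state.

```shell
# Sketch only: function names are hypothetical; the oc commands are
# the ones used in the verification steps.

# Succeeds if the node carries the OpenShift Container Storage label.
node_has_storage_label() {
  local node="$1"
  oc get nodes --show-labels \
    | grep 'cluster.ocs.openshift.io/openshift-storage=' \
    | cut -d' ' -f1 \
    | grep -qxF "$node"
}

# Prints the OSD pods on the node that are neither Running nor Completed.
# Column 3 of `oc get pods -o wide` output is STATUS.
osd_pods_not_running() {
  local node="$1"
  oc get pods -o wide -n openshift-storage \
    | grep -i "$node" | grep osd \
    | awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}
```

An empty result from `osd_pods_not_running <new_node_name>` together with a successful `node_has_storage_label <new_node_name>` corresponds to the checks passing.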
5.2. Replacing failed nodes on IBM Z
Use this procedure to replace a failed node that is no longer operational on IBM Z for OpenShift Container Storage.
Procedure
- Log in to OpenShift Web Console and click Compute → Nodes.
- Identify the faulty node and click on its Machine Name.
- Click Actions → Edit Annotations, and click Add More.
- Add machine.openshift.io/exclude-node-draining and click Save.
- Click Actions → Delete Machine, and click Delete.
A new machine is automatically created. Wait for the new machine to start.
Important: This activity may take 5-10 minutes or longer. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.
- Click Compute → Nodes. Confirm that the new node is in Ready state.
Apply the OpenShift Container Storage label to the new node using any one of the following:
- From the web user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage and click Save.
- From the command line interface
Execute the following command to apply the OpenShift Container Storage label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
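The annotation and machine deletion steps in this procedure use the web console. A command-line equivalent might look like the following sketch. It rests on assumptions not stated in this procedure: that the Machine object lives in the `openshift-machine-api` namespace (the usual location for Machine API resources), and that you already know the machine name. The helper name `delete_failed_machine` is hypothetical. Double-check the machine name before running anything that deletes it.

```shell
# Sketch only: the namespace default is an assumption; override
# MACHINE_NAMESPACE if your machines live elsewhere.
MACHINE_NAMESPACE="${MACHINE_NAMESPACE:-openshift-machine-api}"

# Annotate the machine so that deletion skips draining the failed node,
# then delete the machine so that a replacement is created.
delete_failed_machine() {
  local machine="$1"
  oc annotate machine "$machine" -n "$MACHINE_NAMESPACE" \
    machine.openshift.io/exclude-node-draining="" || return 1
  oc delete machine "$machine" -n "$MACHINE_NAMESPACE"
}
```

Usage would be `delete_failed_machine <machine_name>`, followed by the labeling step above once the new node is in Ready state.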
Verification steps
Execute the following command and verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
Verify that all other required OpenShift Container Storage pods are in Running state.
Verify that new OSD pods are running on the replacement node.
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
(Optional) If data encryption is enabled on the cluster, verify that the new OSD devices are encrypted.
Create a debug pod and open a chroot environment for the host.
$ oc debug node/<new_node_name>
$ chroot /host
Verify the devices are encrypted.
$ dmsetup ls | grep ocs-deviceset
ocs-deviceset-0-data-0-57snx-block-dmcrypt (253:1)
$ lsblk | grep ocs-deviceset
`-ocs-deviceset-0-data-0-57snx-block-dmcrypt 253:1 0 512G 0 crypt
- If verification steps fail, contact Red Hat Support.
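For the optional encryption check, the `lsblk` output can be filtered so that only encrypted OSD devices are listed. The following is a sketch, assuming the default `lsblk` column layout in which TYPE is the sixth column. The function name `encrypted_osd_devices` is hypothetical.

```shell
# Sketch only: reads `lsblk` output on stdin and prints the names of
# ocs-deviceset devices whose TYPE column is "crypt".
encrypted_osd_devices() {
  # Strip the tree-drawing prefix (for example "`-") from the device name.
  awk '/ocs-deviceset/ && $6 == "crypt" { sub(/^[^A-Za-z0-9]+/, "", $1); print $1 }'
}
```

Inside the chroot environment, this could be used as `lsblk | encrypted_osd_devices`. No output suggests that no OSD device of type crypt was found on the node.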