Chapter 5. Replacing storage nodes

You can choose one of the following procedures to replace storage nodes:

5.1. Replacing operational nodes on IBM Z

Use this procedure to replace an operational node on IBM Z.

Procedure

  1. Log in to the OpenShift Web Console.
  2. Click Compute → Nodes.
  3. Identify the node that needs to be replaced. Take a note of its Machine Name.
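
    If you prefer the command line, you can also identify the machine that backs the node; assuming the nodes are managed by the Machine API, the NODE column of the wide output typically shows the corresponding node for each machine:

      $ oc get machines -n openshift-machine-api -o wide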
  4. Mark the node as unschedulable using the following command:

    $ oc adm cordon <node_name>
  5. Drain the node using the following command:

    $ oc adm drain <node_name> --force --delete-local-data --ignore-daemonsets
    Important

    This activity may take at least 5-10 minutes. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.

  6. Click Compute → Machines. Search for the required machine.
  7. Beside the required machine, click the Action menu (⋮) → Delete Machine.
  8. Click Delete to confirm the machine deletion. A new machine is automatically created.
  9. Wait for the new machine to start and transition into Running state.

    Important

    This activity may take at least 5-10 minutes.
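
    You can also watch the machine from the command line and wait for its phase to change to Running, for example:

      $ oc get machines -n openshift-machine-api -w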

  10. Click Compute → Nodes and confirm that the new node is in Ready state.
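
    Alternatively, you can wait for the node from the command line; the timeout shown here is only an example:

      $ oc wait node/<new_node_name> --for=condition=Ready --timeout=10m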
  11. Apply the OpenShift Container Storage label to the new node using any one of the following:

    From the web user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage and click Save.
    From the command line interface
    • Execute the following command to apply the OpenShift Container Storage label to the new node:

      $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""

Verification steps

  1. Execute the following command and verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  2. Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
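
    Alternatively, you can check for these pods from the command line, for example:

      $ oc get pods -n openshift-storage -o wide | grep <new_node_name> | egrep 'csi-cephfsplugin|csi-rbdplugin'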
  3. Verify that all other required OpenShift Container Storage pods are in Running state.
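
    For example, you can list all pods in the openshift-storage namespace and review the STATUS column:

      $ oc get pods -n openshift-storage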
  4. Verify that new OSD pods are running on the replacement node.

    $ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
  5. (Optional) If data encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    1. Create a debug pod and open a chroot environment for the host.

      $ oc debug node/<new_node_name>
      $ chroot /host
    2. Verify the devices are encrypted.

      $ dmsetup ls | grep ocs-deviceset
      ocs-deviceset-0-data-0-57snx-block-dmcrypt (253:1)
      $ lsblk | grep ocs-deviceset
      `-ocs-deviceset-0-data-0-57snx-block-dmcrypt 253:1    0   512G  0 crypt
  6. If verification steps fail, contact Red Hat Support.

5.2. Replacing failed nodes on IBM Z

Perform this procedure to replace a failed node that is not operational on IBM Z for OpenShift Container Storage.

Procedure

  1. Log in to the OpenShift Web Console and click Compute → Nodes.
  2. Identify the faulty node and click on its Machine Name.
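
    If you prefer the command line, you can also identify the machine that backs the faulty node; assuming the nodes are managed by the Machine API, the NODE column of the wide output typically shows the corresponding node for each machine:

      $ oc get machines -n openshift-machine-api -o wide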
  3. Click Actions → Edit Annotations, and click Add More.
  4. Add machine.openshift.io/exclude-node-draining and click Save.
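
    Alternatively, you can add the annotation from the command line. The empty value shown here is an assumption; the procedure above adds only the annotation key:

      $ oc annotate machine <machine_name> -n openshift-machine-api machine.openshift.io/exclude-node-draining=""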
  5. Click Actions → Delete Machine, and click Delete.
  6. A new machine is automatically created. Wait for the new machine to start.

    Important

    This activity may take at least 5-10 minutes. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.
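
    You can also watch the machine from the command line and wait for its phase to change to Running, for example:

      $ oc get machines -n openshift-machine-api -w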

  7. Click Compute → Nodes and confirm that the new node is in Ready state.
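
    Alternatively, you can wait for the node from the command line; the timeout shown here is only an example:

      $ oc wait node/<new_node_name> --for=condition=Ready --timeout=10m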
  8. Apply the OpenShift Container Storage label to the new node using any one of the following:

    From the web user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage and click Save.
    From the command line interface
    • Execute the following command to apply the OpenShift Container Storage label to the new node:

      $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
  9. Execute the following command and verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  10. Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
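
    Alternatively, you can check for these pods from the command line, for example:

      $ oc get pods -n openshift-storage -o wide | grep <new_node_name> | egrep 'csi-cephfsplugin|csi-rbdplugin'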
  11. Verify that all other required OpenShift Container Storage pods are in Running state.
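
    For example, you can list all pods in the openshift-storage namespace and review the STATUS column:

      $ oc get pods -n openshift-storage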
  12. Verify that new OSD pods are running on the replacement node.

    $ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
  13. (Optional) If data encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    1. Create a debug pod and open a chroot environment for the host.

      $ oc debug node/<new_node_name>
      $ chroot /host
    2. Verify the devices are encrypted.

      $ dmsetup ls | grep ocs-deviceset
      ocs-deviceset-0-data-0-57snx-block-dmcrypt (253:1)
      $ lsblk | grep ocs-deviceset
      `-ocs-deviceset-0-data-0-57snx-block-dmcrypt 253:1    0   512G  0 crypt
  14. If verification steps fail, contact Red Hat Support.