Chapter 2. OpenShift Container Storage deployed on VMware

2.1. Replacing an operational VMware node on user-provisioned infrastructure

Perform this procedure to replace an operational node on VMware user-provisioned infrastructure (UPI).

Prerequisites

  • Red Hat recommends that replacement nodes be configured with infrastructure, resources, and disks similar to the node being replaced.
  • You must be logged in to the Red Hat OpenShift Container Platform (RHOCP) cluster.

Procedure

  1. Identify the node and its VM that need to be replaced.
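    One way to identify the node is to list the nodes with their internal IP addresses and compare them with the VM details in vSphere, for example:

    $ oc get nodes -o wide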
  2. Mark the node as unschedulable using the following command:

    $ oc adm cordon <node_name>
  3. Drain the node using the following command:

    $ oc adm drain <node_name> --force --delete-local-data --ignore-daemonsets
    Important

    This activity can take 5-10 minutes or more. Ceph errors generated during this period are temporary and resolve automatically when the new node is labeled and functional.

  4. Delete the node using the following command:

    $ oc delete nodes <node_name>
  5. Log in to vSphere and terminate the identified VM.

    Important

    Remove the VM from the inventory only; do not delete it from disk.

  6. Create a new VM on vSphere with the required infrastructure. See Platform requirements.
  7. Create a new OpenShift Container Platform worker node using the new VM.
  8. Check for certificate signing requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  9. Approve all required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <Certificate_Name>
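    If several CSRs are pending, you can approve them in one pass with a command similar to the following, which filters the Pending entries from the oc get csr output and passes their names to oc adm certificate approve:

    $ oc get csr | awk '/Pending/ {print $1}' | xargs oc adm certificate approve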
  10. Click Compute → Nodes and confirm that the new node is in Ready state.
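    Alternatively, you can check the node status from the command line, for example:

    $ oc get nodes <new_node_name>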
  11. Apply the OpenShift Container Storage label to the new node using any one of the following:

    From User interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage and click Save.
    From Command line interface
    • Execute the following command to apply the OpenShift Container Storage label to the new node:

      $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""

Verification steps

  1. Execute the following command and verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  2. Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state (see the example command after the list):

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
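    For example, a command similar to the following lists the CSI pods that are scheduled on the new node; the exact pod names vary by deployment:

    $ oc get pods -n openshift-storage -o wide | grep <new_node_name> | egrep 'csi-cephfsplugin|csi-rbdplugin'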
  3. Verify that all other required OpenShift Container Storage pods are in Running state.
  4. Verify that new OSD pods are running on the replacement node.

    $ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
  5. (Optional) If data encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host(s).

      $ oc debug node/<node_name>
      $ chroot /host
    2. Run "lsblk" and check for the "crypt" keyword beside the ocs-deviceset names:

      $ lsblk
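      For example, to narrow the output to the relevant devices, you can filter for the ocs-deviceset and crypt entries:

      $ lsblk | egrep 'ocs-deviceset|crypt'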
  6. If verification steps fail, contact Red Hat Support.

2.2. Replacing a failed VMware node on user-provisioned infrastructure

Perform this procedure to replace a failed node on VMware user-provisioned infrastructure (UPI).

Prerequisites

  • Red Hat recommends that replacement nodes be configured with infrastructure, resources, and disks similar to the node being replaced.
  • You must be logged in to the Red Hat OpenShift Container Platform (RHOCP) cluster.

Procedure

  1. Identify the node and its VM that need to be replaced.
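    One way to identify the node is to list the nodes with their internal IP addresses and compare them with the VM details in vSphere, for example:

    $ oc get nodes -o wide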
  2. Delete the node using the following command:

    $ oc delete nodes <node_name>
  3. Log in to vSphere and terminate the identified VM.

    Important

    Remove the VM from the inventory only; do not delete it from disk.

  4. Create a new VM on vSphere with the required infrastructure. See Platform requirements.
  5. Create a new OpenShift Container Platform worker node using the new VM.
  6. Check for certificate signing requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  7. Approve all required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <Certificate_Name>
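    If several CSRs are pending, you can approve them in one pass with a command similar to the following, which filters the Pending entries from the oc get csr output and passes their names to oc adm certificate approve:

    $ oc get csr | awk '/Pending/ {print $1}' | xargs oc adm certificate approve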
  8. Click Compute → Nodes and confirm that the new node is in Ready state.
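    Alternatively, you can check the node status from the command line, for example:

    $ oc get nodes <new_node_name>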
  9. Apply the OpenShift Container Storage label to the new node using any one of the following:

    From User interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage and click Save.
    From Command line interface
    • Execute the following command to apply the OpenShift Container Storage label to the new node:

      $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""

Verification steps

  1. Execute the following command and verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  2. Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state (see the example command after the list):

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
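    For example, a command similar to the following lists the CSI pods that are scheduled on the new node; the exact pod names vary by deployment:

    $ oc get pods -n openshift-storage -o wide | grep <new_node_name> | egrep 'csi-cephfsplugin|csi-rbdplugin'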
  3. Verify that all other required OpenShift Container Storage pods are in Running state.
  4. Verify that new OSD pods are running on the replacement node.

    $ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
  5. (Optional) If data encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host(s).

      $ oc debug node/<node_name>
      $ chroot /host
    2. Run "lsblk" and check for the "crypt" keyword beside the ocs-deviceset names:

      $ lsblk
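      For example, to narrow the output to the relevant devices, you can filter for the ocs-deviceset and crypt entries:

      $ lsblk | egrep 'ocs-deviceset|crypt'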
  6. If verification steps fail, contact Red Hat Support.