Chapter 9. Node maintenance

9.1. Manually refreshing TLS certificates

The TLS certificates for container-native virtualization components are created at the time of installation and are valid for one year. You must manually refresh these certificates before they expire.

9.1.1. Refreshing TLS certificates

To refresh the TLS certificates for container-native virtualization, download and run the rotate-certs script. This script is available from the kubevirt/hyperconverged-cluster-operator repository on GitHub.

Important

When refreshing the certificates, the following operations are impacted:

  • Migrations are canceled
  • Image uploads are canceled
  • VNC and console connections are closed

Prerequisites

  • Ensure that you are logged in to the cluster as a user with cluster-admin privileges. The script uses your active session to the cluster to refresh certificates in the openshift-cnv namespace.

Procedure

  1. Download the rotate-certs.sh script from GitHub:

    $ curl -O https://raw.githubusercontent.com/kubevirt/hyperconverged-cluster-operator/master/tools/rotate-certs.sh
  2. Ensure the script is executable:

    $ chmod +x rotate-certs.sh
  3. Run the script:

    $ ./rotate-certs.sh -n openshift-cnv

The TLS certificates are refreshed and valid for one year.
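
To see how much validity a certificate has left, before or after a refresh, a check along the following lines may help. This is a minimal sketch and not part of the rotate-certs script: it assumes that openssl is installed, and the helper name cert_valid_for_days and the secret name in the usage comment are hypothetical placeholders.

```shell
# Succeeds (exit status 0) if the PEM certificate read from stdin
# remains valid for at least the given number of days.
cert_valid_for_days() {
  days="$1"
  # -checkend takes a number of seconds and fails if the certificate
  # expires within that window.
  openssl x509 -noout -checkend "$((days * 86400))" >/dev/null
}

# Usage (the secret name is a placeholder, not a documented name):
#   oc get secret <secret-name> -n openshift-cnv \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d | cert_valid_for_days 30 \
#     || echo "certificate expires within 30 days"
```

The function only reports a yes or no answer through its exit status, which makes it easy to call from a scheduled job that warns well before the one-year validity period ends.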

9.2. Node maintenance mode

9.2.1. Understanding node maintenance mode

Placing a node into maintenance marks the node as unschedulable and drains all the virtual machines and pods from it. Virtual machine instances that have a LiveMigrate eviction strategy are live migrated to another node without loss of service. This eviction strategy is configured by default in virtual machines created from common templates but must be configured manually for custom virtual machines.
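
For a custom virtual machine, one possible way to configure the strategy is a merge patch. The sketch below assumes that the strategy is set in the spec.template.spec.evictionStrategy field of the VirtualMachine object, as in the KubeVirt API. The function names, the virtual machine name, and the project name are hypothetical.

```shell
# Print the merge patch that sets the LiveMigrate eviction strategy.
eviction_patch() {
  printf '%s' '{"spec":{"template":{"spec":{"evictionStrategy":"LiveMigrate"}}}}'
}

# Apply the patch to one virtual machine in the given namespace.
set_live_migrate() {
  vm="$1"
  namespace="$2"
  oc patch vm "$vm" -n "$namespace" --type merge -p "$(eviction_patch)"
}

# Usage (placeholder names): set_live_migrate my-custom-vm my-project
```

Because the patch changes the virtual machine template, a running virtual machine instance is likely to pick up the change only after the virtual machine is restarted.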

Virtual machine instances without an eviction strategy are deleted from the node and recreated on another node.
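
Before starting maintenance, it can be useful to know which virtual machine instances on the node fall into this second group. The following is a minimal sketch under the assumption that the node is reported in .status.nodeName and the strategy in .spec.evictionStrategy of each instance. The function names are hypothetical.

```shell
# Read tab-separated "namespace, name, node, strategy" lines from stdin and
# print namespace/name for instances on the given node whose eviction
# strategy is not LiveMigrate (including instances with no strategy).
vmis_without_live_migrate() {
  awk -F '\t' -v node="$1" '$3 == node && $4 != "LiveMigrate" { print $1 "/" $2 }'
}

# List every virtual machine instance with its node and eviction strategy.
# The strategy column is empty when no strategy is set.
list_vmis() {
  oc get vmi --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.nodeName}{"\t"}{.spec.evictionStrategy}{"\n"}{end}'
}

# Usage: list_vmis | vmis_without_live_migrate node02
```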

Important

Virtual machines must have a PersistentVolumeClaim (PVC) with a shared ReadWriteMany (RWX) access mode to be live migrated.

9.3. Setting a node to maintenance mode

Place a node into maintenance from either the web console or the CLI.

9.3.2. Setting a node to maintenance mode in the web console

Set a node to maintenance mode using the Options menu found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.

Procedure

  1. In the container-native virtualization console, click Compute → Nodes.
  2. Set the node to maintenance either from this screen, which makes it easier to act on multiple nodes at once, or from the Node Details screen, which shows comprehensive details of the selected node:

    • Click the Options menu at the end of the node row and select Start Maintenance.
    • Click the node name to open the Node Details screen and click Actions → Start Maintenance.
  3. Click Start Maintenance in the confirmation window.

The node live migrates virtual machine instances that have the LiveMigrate eviction strategy, and it is no longer schedulable. All other pods and virtual machines on the node are deleted and recreated on another node.

9.3.3. Setting a node to maintenance mode in the CLI

Set a node to maintenance mode by creating a NodeMaintenance Custom Resource (CR) object that references the node name and the reason for setting it to maintenance mode.

Procedure

  1. Create the node maintenance CR configuration. This example saves the CR in a file named node02-maintenance.yaml:

    apiVersion: kubevirt.io/v1alpha1
    kind: NodeMaintenance
    metadata:
      name: node02-maintenance
    spec:
      nodeName: node02
      reason: "Replacing node02"
  2. Create the NodeMaintenance object in the cluster:

    $ oc apply -f <node02-maintenance.yaml>

The node live migrates virtual machine instances that have the LiveMigrate eviction strategy, and it is tainted so that it is no longer schedulable. All other pods and virtual machines on the node are deleted and recreated on another node.
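
When several nodes need maintenance, the CR can be generated rather than written by hand. The sketch below reuses the fields from the example CR. The function name node_maintenance_cr and the <node>-maintenance naming pattern are assumptions made for illustration.

```shell
# Print a NodeMaintenance manifest for the given node name and reason.
node_maintenance_cr() {
  node="$1"
  reason="$2"
  cat <<EOF
apiVersion: kubevirt.io/v1alpha1
kind: NodeMaintenance
metadata:
  name: ${node}-maintenance
spec:
  nodeName: ${node}
  reason: "${reason}"
EOF
}

# Usage: pipe the manifest straight into the cluster.
#   node_maintenance_cr node02 "Replacing node02" | oc apply -f -
# Afterwards, this should print "true" once the node is unschedulable:
#   oc get node node02 -o jsonpath='{.spec.unschedulable}'
```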

9.4. Resuming a node from maintenance mode

Resuming a node brings it out of maintenance mode and makes it schedulable again.

Resume a node from maintenance from either the web console or the CLI.

9.4.1. Resuming a node from maintenance mode in the web console

Resume a node from maintenance mode using the Options menu found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.

Procedure

  1. In the container-native virtualization console, click Compute → Nodes.
  2. Resume the node either from this screen, which makes it easier to act on multiple nodes at once, or from the Node Details screen, which shows comprehensive details of the selected node:

    • Click the Options menu at the end of the node row and select Stop Maintenance.
    • Click the node name to open the Node Details screen and click Actions → Stop Maintenance.
  3. Click Stop Maintenance in the confirmation window.

The node becomes schedulable, but virtual machine instances that were running on the node prior to maintenance will not automatically migrate back to this node.

9.4.2. Resuming a node from maintenance mode in the CLI

Resume a node from maintenance mode and make it schedulable again by deleting the NodeMaintenance object for the node.

Procedure

  1. Find the NodeMaintenance object:

    $ oc get nodemaintenance
  2. Optional: Inspect the NodeMaintenance object to ensure it is associated with the correct node:

    $ oc describe nodemaintenance <node02-maintenance>
    Name:         node02-maintenance
    Namespace:
    Labels:
    Annotations:
    API Version:  kubevirt.io/v1alpha1
    Kind:         NodeMaintenance
    ...
    Spec:
      Node Name:  node02
      Reason:     Replacing node02
  3. Delete the NodeMaintenance object:

    $ oc delete nodemaintenance <node02-maintenance>
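
The find, inspect, and delete steps above can also be combined so that the object is looked up by node name instead of by object name. This is a minimal sketch, assuming the node is recorded in the spec.nodeName field shown in the example CR. The function names are hypothetical.

```shell
# Read tab-separated "object name, node name" lines from stdin and print
# the names of the objects that reference the given node.
maintenance_objects_for_node() {
  awk -F '\t' -v node="$1" '$2 == node { print $1 }'
}

# Delete every NodeMaintenance object that references the given node.
resume_node() {
  oc get nodemaintenance -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\n"}{end}' |
    maintenance_objects_for_node "$1" |
    while read -r name; do
      oc delete nodemaintenance "$name"
    done
}

# Usage: resume_node node02
```

Matching on the node name avoids deleting the wrong object when object names do not follow a predictable pattern.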