Chapter 10. Node maintenance

10.1. Automatic renewal of TLS certificates

All TLS certificates for OpenShift Virtualization components are renewed and rotated automatically. You are not required to refresh them manually.

10.1.1. Automatic renewal of TLS certificates

TLS certificates are automatically deleted and replaced according to the following schedule:

  • KubeVirt certificates are renewed daily.
  • Containerized Data Importer (CDI) controller certificates are renewed every 15 days.
  • MAC pool certificates are renewed every year.
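
The schedule above can be read as a set of fixed intervals. The following minimal Python sketch computes the next expected renewal date for each component from a given last-renewal date. The RENEWAL_INTERVALS table and the next_renewal helper are hypothetical names used only for illustration, and treating "every year" as 365 days is an assumption. It is not a description of how the product tracks certificates internally.

```python
from datetime import date, timedelta

# Renewal intervals taken from the schedule above.
RENEWAL_INTERVALS = {
    "KubeVirt": timedelta(days=1),
    "CDI": timedelta(days=15),
    "MAC pool": timedelta(days=365),  # assumption: "every year" read as 365 days
}


def next_renewal(component, last_renewed):
    """Return the date on which the component's certificate is next due for renewal."""
    return last_renewed + RENEWAL_INTERVALS[component]


# Hypothetical last-renewal date, chosen only for the example.
last = date(2021, 1, 1)
for component in RENEWAL_INTERVALS:
    print(component, next_renewal(component, last))
```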

Automatic TLS certificate rotation does not disrupt any operations. For example, the following operations continue to function without any disruption:

  • Migrations
  • Image uploads
  • VNC and console connections

10.2. Managing node labeling for obsolete CPU models

You can schedule a virtual machine (VM) on a node where the CPU model and policy attribute of the VM are compatible with the CPU models and policy attributes that the node supports. By specifying a list of obsolete CPU models in a config map, you can exclude them from the list of labels created for CPU models.

10.2.1. Understanding node labeling for obsolete CPU models

To ensure that a node supports only valid CPU models for scheduled VMs, create a config map with a list of obsolete CPU models. When the node-labeller obtains the list of obsolete CPU models, it eliminates those CPU models and creates labels for valid CPU models.

Note

If you do not configure a config map with a list of obsolete CPU models, all CPU models are evaluated for labels, including obsolete CPU models that are not present in your environment.
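
The effect of the obsolete CPU list can be sketched in a few lines of Python. This is a simplified illustration, not the node-labeller implementation: the valid_cpu_models helper and the list of candidate models are hypothetical, and exact, case-sensitive name matching is an assumption. The obsolete models are the ones used in the config map example in this chapter.

```python
def valid_cpu_models(candidate_models, obsolete_cpus):
    """Return the candidate models that are not listed as obsolete, in their original order."""
    obsolete = set(obsolete_cpus)
    return [model for model in candidate_models if model not in obsolete]


# Obsolete models as listed in the config map example in this chapter.
OBSOLETE_CPUS = ["486", "pentium", "pentium2", "pentium3", "pentiumpro"]

# Hypothetical set of models that a node reports before filtering.
CANDIDATES = ["486", "pentium3", "Penryn", "Haswell"]

valid = valid_cpu_models(CANDIDATES, OBSOLETE_CPUS)
print(valid)
```

With these inputs, only Penryn and Haswell remain eligible for labels.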

Through the process of iteration, the list of base CPU features in the minimum CPU model is eliminated from the list of labels generated for the node. For example, an environment might have two supported CPU models: Penryn and Haswell.

If Penryn is specified as the CPU model for minCPU, the node-labeller evaluates each base CPU feature for Penryn and compares it with each CPU feature supported by Haswell. If the CPU feature is supported by both Penryn and Haswell, the node-labeller eliminates that feature from the list of CPU features for creating labels. If a CPU feature is supported only by Haswell and not by Penryn, that CPU feature is included in the list of generated labels. The node-labeller follows this iterative process to eliminate base CPU features that are present in the minimum CPU model and create labels.

The following example shows the complete list of CPU features for Penryn, which is specified as the CPU model for minCPU:

Example of CPU features for Penryn

apic
clflush
cmov
cx16
cx8
de
fpu
fxsr
lahf_lm
lm
mca
mce
mmx
msr
mtrr
nx
pae
pat
pge
pni
pse
pse36
sep
sse
sse2
sse4.1
ssse3
syscall
tsc

The following example shows the complete list of CPU features for Haswell:

Example of CPU features for Haswell

aes
apic
avx
avx2
bmi1
bmi2
clflush
cmov
cx16
cx8
de
erms
fma
fpu
fsgsbase
fxsr
hle
invpcid
lahf_lm
lm
mca
mce
mmx
movbe
msr
mtrr
nx
pae
pat
pcid
pclmuldq
pge
pni
popcnt
pse
pse36
rdtscp
rtm
sep
smep
sse
sse2
sse4.1
sse4.2
ssse3
syscall
tsc
tsc-deadline
x2apic
xsave

The following example shows the list of node labels generated by the node-labeller after iterating and comparing the CPU features for Penryn with the CPU features for Haswell:

Example of node labels after iteration

aes
avx
avx2
bmi1
bmi2
erms
fma
fsgsbase
hle
invpcid
movbe
pcid
pclmuldq
popcnt
rdtscp
rtm
smep
sse4.2
tsc-deadline
x2apic
xsave
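
The elimination described above amounts to a set difference: the features of a CPU model minus the base features of the minimum CPU model. The following Python sketch illustrates the idea with a small subset of the Penryn and Haswell features listed above. The features_to_label helper is a hypothetical name, and the code is an approximation of the described behavior rather than the node-labeller source.

```python
def features_to_label(min_cpu_features, model_features):
    """Return the model's features that are not base features of the minimum CPU model."""
    return sorted(set(model_features) - set(min_cpu_features))


# Illustrative subsets of the feature lists above, not the complete lists.
PENRYN_SUBSET = ["apic", "fpu", "sse", "sse2", "sse4.1", "ssse3"]
HASWELL_SUBSET = PENRYN_SUBSET + ["aes", "avx", "avx2", "sse4.2"]

labels = features_to_label(PENRYN_SUBSET, HASWELL_SUBSET)
print(labels)
```

Features that both models share, such as sse and sse2, are dropped. Only the features that Haswell adds on top of Penryn remain as label candidates.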

10.2.2. Configuring a config map for obsolete CPU models

Use this procedure to configure a config map for obsolete CPU models.

Procedure

  • Create a ConfigMap object, specifying the obsolete CPU models in the obsoleteCPUs array. For example:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cpu-plugin-configmap # (1)
    data: # (2)
      cpu-plugin-configmap:
        obsoleteCPUs: # (3)
          - "486"
          - "pentium"
          - "pentium2"
          - "pentium3"
          - "pentiumpro"
        minCPU: "Penryn" # (4)

    (1) Name of the config map.
    (2) Configuration data.
    (3) List of obsolete CPU models.
    (4) Minimum CPU model that is used for basic CPU features.

10.3. Node maintenance mode

10.3.1. Understanding node maintenance mode

Placing a node into maintenance marks the node as unschedulable and drains all the virtual machines and pods from it. Virtual machine instances that have a LiveMigrate eviction strategy are live migrated to another node without loss of service. This eviction strategy is configured by default in virtual machines created from common templates but must be configured manually for custom virtual machines.

Virtual machine instances without an eviction strategy will be deleted on the node and recreated on another node.

Important

Virtual machines must have a persistent volume claim (PVC) with a shared ReadWriteMany (RWX) access mode to be live migrated.
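
The rule for what happens to each virtual machine instance during a drain can be summarized in a short Python sketch. The dictionary shape and the maintenance_action helper are hypothetical and exist only to illustrate the decision described above. The sketch does not model the storage requirement for live migration.

```python
def maintenance_action(vmi):
    """Return what happens to a virtual machine instance when its node enters maintenance."""
    if vmi.get("evictionStrategy") == "LiveMigrate":
        return "live-migrate"
    # No eviction strategy configured: deleted on this node and recreated elsewhere.
    return "delete-and-recreate"


# Hypothetical instances, simplified to the one field that matters here.
vmis = [
    {"name": "vm-from-template", "evictionStrategy": "LiveMigrate"},
    {"name": "custom-vm"},  # no eviction strategy configured
]

for vmi in vmis:
    print(vmi["name"], maintenance_action(vmi))
```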

10.4. Setting a node to maintenance mode

10.4.1. Understanding node maintenance mode

Placing a node into maintenance marks the node as unschedulable and drains all the virtual machines and pods from it. Virtual machine instances that have a LiveMigrate eviction strategy are live migrated to another node without loss of service. This eviction strategy is configured by default in virtual machines created from common templates but must be configured manually for custom virtual machines.

Virtual machine instances without an eviction strategy will be deleted on the node and recreated on another node.

Important

Virtual machines must have a persistent volume claim (PVC) with a shared ReadWriteMany (RWX) access mode to be live migrated.

Place a node into maintenance from either the web console or the CLI.

10.4.2. Setting a node to maintenance mode in the web console

Set a node to maintenance mode using the Options menu found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.

Procedure

  1. In the OpenShift Virtualization console, click Compute → Nodes.
  2. Set the node to maintenance either from this screen, which makes it easier to perform actions on multiple nodes at once, or from the Node Details screen, where you can view comprehensive details of the selected node:

    • Click the Options menu at the end of the node and select Start Maintenance.
    • Click the node name to open the Node Details screen and click Actions → Start Maintenance.
  3. Click Start Maintenance in the confirmation window.

Virtual machine instances that have the LiveMigrate eviction strategy are live migrated, and the node is no longer schedulable. All other pods and virtual machines on the node are deleted and recreated on another node.

10.4.3. Setting a node to maintenance mode in the CLI

Set a node to maintenance mode by creating a NodeMaintenance custom resource (CR) object that references the node name and the reason for setting it to maintenance mode.

Procedure

  1. Create the node maintenance CR configuration. This example uses a CR that is called node02-maintenance.yaml:

    apiVersion: nodemaintenance.kubevirt.io/v1beta1
    kind: NodeMaintenance
    metadata:
      name: node02-maintenance
    spec:
      nodeName: node02
      reason: "Replacing node02"
  2. Create the NodeMaintenance object in the cluster:

    $ oc apply -f node02-maintenance.yaml

Virtual machine instances that have the LiveMigrate eviction strategy are live migrated, and the node is tainted so that it is no longer schedulable. All other pods and virtual machines on the node are deleted and recreated on another node.
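
When several nodes need maintenance, it can be convenient to generate the CR rather than write each one by hand. The following Python sketch builds the same NodeMaintenance object as a dictionary and prints it as JSON. The node_maintenance_manifest helper is hypothetical, and the <node_name>-maintenance naming pattern is an assumption carried over from the example above.

```python
import json


def node_maintenance_manifest(node_name, reason):
    """Build a NodeMaintenance object for the given node, mirroring the example CR above."""
    return {
        "apiVersion": "nodemaintenance.kubevirt.io/v1beta1",
        "kind": "NodeMaintenance",
        # Assumption: name the CR after the node, as in node02-maintenance.
        "metadata": {"name": f"{node_name}-maintenance"},
        "spec": {"nodeName": node_name, "reason": reason},
    }


manifest = node_maintenance_manifest("node02", "Replacing node02")
print(json.dumps(manifest, indent=2))
```

Because oc apply -f accepts JSON as well as YAML, the printed output can be saved to a file and applied in the same way as node02-maintenance.yaml.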

10.5. Resuming a node from maintenance mode

Resuming a node brings it out of maintenance mode and makes it schedulable again.

Resume a node from maintenance from either the web console or the CLI.

10.5.1. Resuming a node from maintenance mode in the web console

Resume a node from maintenance mode using the Options menu found on each node in the Compute → Nodes list, or using the Actions control of the Node Details screen.

Procedure

  1. In the OpenShift Virtualization console, click Compute → Nodes.
  2. Resume the node either from this screen, which makes it easier to perform actions on multiple nodes at once, or from the Node Details screen, where you can view comprehensive details of the selected node:

    • Click the Options menu at the end of the node and select Stop Maintenance.
    • Click the node name to open the Node Details screen and click Actions → Stop Maintenance.
  3. Click Stop Maintenance in the confirmation window.

The node becomes schedulable, but virtual machine instances that were running on the node prior to maintenance will not automatically migrate back to this node.

10.5.2. Resuming a node from maintenance mode in the CLI

Resume a node from maintenance mode and make it schedulable again by deleting the NodeMaintenance object for the node.

Procedure

  1. Find the NodeMaintenance object:

    $ oc get nodemaintenance
  2. Optional: Inspect the NodeMaintenance object to ensure it is associated with the correct node:

    $ oc describe nodemaintenance <node02-maintenance>

    Example output

    Name:         node02-maintenance
    Namespace:
    Labels:
    Annotations:
    API Version:  nodemaintenance.kubevirt.io/v1beta1
    Kind:         NodeMaintenance
    ...
    Spec:
      Node Name:  node02
      Reason:     Replacing node02

  3. Delete the NodeMaintenance object:

    $ oc delete nodemaintenance <node02-maintenance>