Chapter 2. Post-installation machine configuration tasks

There are times when you need to make changes to the operating systems running on OpenShift Container Platform nodes. This can include changing settings for network time service, adding kernel arguments, or configuring journaling in a specific way.

Aside from a few specialized features, most changes to operating systems on OpenShift Container Platform nodes can be made by creating MachineConfig objects, which are managed by the Machine Config Operator.

Tasks in this section describe how to use features of the Machine Config Operator to configure operating system features on OpenShift Container Platform nodes.

2.1. Understanding the Machine Config Operator

2.1.1. Machine Config Operator

Purpose

The Machine Config Operator manages and applies configuration and updates of the base operating system and container runtime, including everything between the kernel and kubelet.

There are four components:

  • machine-config-server: Provides Ignition configuration to new machines joining the cluster.
  • machine-config-controller: Coordinates the upgrade of machines to the desired configurations defined by a MachineConfig object. Options are provided to control the upgrade for sets of machines individually.
  • machine-config-daemon: Applies new machine configuration during update. Validates and verifies the state of the machine against the requested machine configuration.
  • machine-config: Provides a complete source of machine configuration at installation, first start up, and updates for a machine.

Project

openshift-machine-config-operator

2.1.2. Machine config overview

The Machine Config Operator (MCO) manages updates to systemd, CRI-O and Kubelet, the kernel, NetworkManager, and other system features. It also offers a MachineConfig CRD that can write configuration files onto the host (see machine-config-operator). Understanding what MCO does and how it interacts with other components is critical to making advanced, system-level changes to an OpenShift Container Platform cluster. Here are some things you should know about MCO, machine configs, and how they are used:

  • A machine config can make a specific change to a file or service on the operating system of each machine in a pool of OpenShift Container Platform nodes.
  • MCO applies changes to operating systems in pools of machines. All OpenShift Container Platform clusters start with worker and control plane node (also known as the master node) pools. By adding more role labels, you can configure custom pools of nodes. For example, you can set up a custom pool of worker nodes that includes particular hardware features needed by an application. However, examples in this section focus on changes to the default pool types.

    Important

    A node can have multiple labels applied that indicate its type, such as master or worker; however, it can be a member of only a single machine config pool.

  • Some machine configuration must be in place before OpenShift Container Platform is installed to disk. In most cases, this can be accomplished by creating a machine config that is injected directly into the OpenShift Container Platform installer process, instead of running as a post-installation machine config. In other cases, you might need to do a bare metal installation where you pass kernel arguments at installer startup to do things such as setting per-node IP addresses or configuring advanced disk partitioning.
  • MCO manages items that are set in machine configs. Manual changes that you make to your systems are not overwritten by MCO unless MCO is explicitly told to manage a conflicting file. In other words, MCO makes only the specific updates you request; it does not claim control over the whole node.
  • Manual changes to nodes are strongly discouraged. If you need to decommission a node and start a new one, those direct changes would be lost.
  • MCO is only supported for writing to files in the /etc and /var directories, although some other directories are writable because they are symbolic links into one of those areas. The /opt and /usr/local directories are examples.
  • Ignition is the configuration format used in MachineConfigs. See the Ignition Configuration Specification v3.2.0 for details.
  • Although Ignition config settings can be delivered directly at OpenShift Container Platform installation time, and are formatted in the same way that MCO delivers Ignition configs, MCO has no way of seeing what those original Ignition configs are. Therefore, you should wrap Ignition config settings into a machine config before deploying them.
  • When a file managed by MCO changes outside of MCO, the Machine Config Daemon (MCD) sets the node as degraded. It will not overwrite the offending file, however, and should continue to operate in a degraded state.
  • A key reason for using a machine config is that it will be applied when you spin up new nodes for a pool in your OpenShift Container Platform cluster. The machine-api-operator provisions a new machine and MCO configures it.

MCO uses Ignition as the configuration format. OpenShift Container Platform 4.6 moved from Ignition config specification version 2 to version 3.
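As a quick illustration of the /etc and /var rule in the list above, the following shell sketch checks whether a target path falls under one of the two supported directory trees. The function name is_mco_supported_path is hypothetical and is not part of any OpenShift tooling. It performs only a prefix check and does not resolve symbolic links such as /opt.

```shell
# Hypothetical helper: succeeds when a path is under /etc or /var.
is_mco_supported_path() {
    case "$1" in
        /etc/*|/var/*) return 0 ;;
        *) return 1 ;;
    esac
}

# The paths below are illustrative only.
for path in /etc/chrony.conf /var/lib/example /usr/bin/example; do
    if is_mco_supported_path "$path"; then
        echo "$path: supported"
    else
        echo "$path: not supported"
    fi
done
```

A check like this can be run against every path in a machine config before it is applied.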

2.1.2.1. What can you change with machine configs?

The kinds of components that MCO can change include:

  • config: Create Ignition config objects (see the Ignition configuration specification) to do things like modify files, systemd services, and other features on OpenShift Container Platform machines, including:

    • Configuration files: Create or overwrite files in the /var or /etc directory.
    • systemd units: Create and set the status of a systemd service or add to an existing systemd service by dropping in additional settings.
    • users and groups: Change SSH keys in the passwd section post-installation.

      Important

      Changing SSH keys via machine configs is only supported for the core user.

  • kernelArguments: Add arguments to the kernel command line when OpenShift Container Platform nodes boot.
  • kernelType: Optionally identify a non-standard kernel to use instead of the standard kernel. Use realtime to use the RT kernel (for RAN). This is only supported on select platforms.
  • fips: Enable FIPS mode. FIPS mode should be set at installation time, not in a post-installation procedure.

    Important

    The use of FIPS Validated / Modules in Process cryptographic libraries is only supported on OpenShift Container Platform deployments on the x86_64 architecture.

  • extensions: Extend RHCOS features by adding selected pre-packaged software. For this feature, available extensions include usbguard and kernel modules.
  • Custom resources (for ContainerRuntime and Kubelet): Outside of machine configs, MCO manages two special custom resources for modifying CRI-O container runtime settings (ContainerRuntime CR) and the Kubelet service (Kubelet CR).

The MCO is not the only Operator that can change operating system components on OpenShift Container Platform nodes. Other Operators can modify operating system-level features as well. One example is the Node Tuning Operator, which allows you to do node-level tuning through Tuned daemon profiles.

Tasks for the MCO configuration that can be done post-installation are included in the following procedures. See descriptions of RHCOS bare metal installation for system configuration tasks that must be done during or before OpenShift Container Platform installation.

2.1.2.2. Project

See the openshift-machine-config-operator GitHub site for details.

2.1.3. Checking machine config pool status

To see the status of the Machine Config Operator, its sub-components, and the resources it manages, use the following oc commands:

Procedure

  1. To see the number of MCO-managed nodes available on your cluster for each pool, type:

    $ oc get machineconfigpool
    NAME     CONFIG                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master   rendered-master-dd…    True      False      False      3              3                   3                     0                      4h42m
    worker   rendered-worker-fde…   True      False      False      3              3                   3                     0                      4h42m

    In the previous output, there are three master and three worker nodes. All machines are updated and none are currently updating. Because all nodes are Updated and Ready and none are Degraded, you can tell that there are no issues.

  2. To see each existing machineconfig, type:

    $ oc get machineconfigs
    NAME                             GENERATEDBYCONTROLLER          IGNITIONVERSION  AGE
    00-master                        2c9371fbb673b97a6fe8b1c52...   3.2.0            5h18m
    00-worker                        2c9371fbb673b97a6fe8b1c52...   3.2.0            5h18m
    01-master-container-runtime      2c9371fbb673b97a6fe8b1c52...   3.2.0            5h18m
    01-master-kubelet                2c9371fbb673b97a6fe8b1c52...   3.2.0            5h18m
    ...
    rendered-master-dde...           2c9371fbb673b97a6fe8b1c52...   3.2.0            5h18m
    rendered-worker-fde...           2c9371fbb673b97a6fe8b1c52...   3.2.0            5h18m

    Note that the machineconfigs listed as rendered are not meant to be changed or deleted. Expect them to be hidden at some point in the future.

  3. To see the status of a pool of nodes, describe that pool. This example uses worker; substitute master to check the control plane pool:

    $ oc describe mcp worker
    ...
      Degraded Machine Count:     0
      Machine Count:              3
      Observed Generation:        2
      Ready Machine Count:        3
      Unavailable Machine Count:  0
      Updated Machine Count:      3
    Events:                       <none>
  4. You can view the contents of a particular machine config (in this case, 01-master-kubelet). The trimmed output from the following oc describe command shows that this machineconfig contains both configuration files (cloud.conf and kubelet.conf) and a systemd service (Kubernetes Kubelet):

    $ oc describe machineconfigs 01-master-kubelet
    Name:         01-master-kubelet
    ...
    Spec:
      Config:
        Ignition:
          Version:  3.2.0
        Storage:
          Files:
            Contents:
              Source:   data:,
            Mode:       420
            Overwrite:  true
            Path:       /etc/kubernetes/cloud.conf
            Contents:
              Source:   data:,kind%3A%20KubeletConfiguration%0AapiVersion%3A%20kubelet.config.k8s.io%2Fv1beta1%0Aauthentication%3A%0A%20%20x509%3A%0A%20%20%20%20clientCAFile%3A%20%2Fetc%2Fkubernetes%2Fkubelet-ca.crt%0A%20%20anonymous...
            Mode:       420
            Overwrite:  true
            Path:       /etc/kubernetes/kubelet.conf
        Systemd:
          Units:
            Contents:  [Unit]
    Description=Kubernetes Kubelet
    Wants=rpc-statd.service network-online.target crio.service
    After=network-online.target crio.service
    
    ExecStart=/usr/bin/hyperkube \
        kubelet \
          --config=/etc/kubernetes/kubelet.conf \ ...
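The Mode: 420 values in the output above are file permissions printed in decimal; 420 in decimal is 0644 in octal, the notation normally used when writing a config. A small sketch of the conversion, using only the standard printf utility (the function names are illustrative):

```shell
# Convert a decimal mode, as printed by oc describe, to octal notation.
mode_to_octal() {
    printf '0%o\n' "$1"
}

# Convert an octal mode, written with a leading 0, to decimal.
mode_to_decimal() {
    printf '%d\n' "$1"
}

mode_to_octal 420      # prints 0644
mode_to_decimal 0644   # prints 420
```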

If something goes wrong with a machine config that you apply, you can always back out that change. For example, if you had run oc create -f ./myconfig.yaml to apply a machine config, you could remove that machine config by typing:

$ oc delete -f ./myconfig.yaml

If that was the only problem, the nodes in the affected pool should return to a non-degraded state. Deleting the machine config causes the rendered configuration to roll back to its previously rendered state.

If you add your own machine configs to your cluster, you can use the commands shown in the previous example to check their status and the related status of the pool to which they are applied.
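When checking pool status repeatedly, it can help to reduce the oc get machineconfigpool table to a single yes-or-no answer. The following is a minimal sketch that assumes the column order shown earlier in this section (NAME, CONFIG, UPDATED, UPDATING, DEGRADED). The function name pool_is_settled and the sample rows are hypothetical.

```shell
# Hypothetical helper: reads `oc get machineconfigpool` output on stdin and
# succeeds only if the named pool is updated, not updating, and not degraded.
pool_is_settled() {
    awk -v pool="$1" '
        $1 == pool { found = 1; ok = ($3 == "True" && $4 == "False" && $5 == "False") }
        END { exit (found && ok) ? 0 : 1 }
    '
}

# Made-up sample rows in the same column order as the real command output.
sample='NAME     CONFIG              UPDATED   UPDATING   DEGRADED
master   rendered-master-a   True      False      False
worker   rendered-worker-b   False     True       False'

for pool in master worker; do
    if printf '%s\n' "$sample" | pool_is_settled "$pool"; then
        echo "$pool: settled"
    else
        echo "$pool: still updating or degraded"
    fi
done
```

On a live cluster, the same function can read the real table, for example: oc get machineconfigpool | pool_is_settled worker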

2.2. Using MachineConfig objects to configure nodes

You can use the tasks in this section to create MachineConfig objects that modify files, systemd unit files, and other operating system features running on OpenShift Container Platform nodes. For more ideas on working with machine configs, see content related to adding or updating SSH authorized keys, verifying image signatures, enabling SCTP, and configuring iSCSI initiatornames for OpenShift Container Platform.

OpenShift Container Platform supports Ignition specification version 3.2. All new machine configs you create going forward should be based on Ignition specification version 3.2. If you are upgrading your OpenShift Container Platform cluster, any existing Ignition specification version 2.x machine configs will be translated automatically to specification version 3.2.

2.2.1. Configuring chrony time service

You can set the time server and related settings used by the chrony time service (chronyd) by modifying the contents of the chrony.conf file and passing those contents to your nodes as a machine config.

Procedure

  1. Create a Butane config including the contents of the chrony.conf file. For example, to configure chrony on worker nodes, create a 99-worker-chrony.bu file.

    Note

    See "Creating machine configs with Butane" for information about Butane.

    variant: openshift
    version: 4.8.0
    metadata:
      name: 99-worker-chrony # 1
      labels:
        machineconfiguration.openshift.io/role: worker # 2
    storage:
      files:
      - path: /etc/chrony.conf
        mode: 0644
        overwrite: true
        contents:
          inline: | # 3 (refers to the pool line below)
            pool 0.rhel.pool.ntp.org iburst
            driftfile /var/lib/chrony/drift
            makestep 1.0 3
            rtcsync
            logdir /var/log/chrony
    1, 2
    On control plane nodes, substitute master for worker in both of these locations.
    3
    Specify any valid, reachable time source, such as the one provided by your DHCP server. Alternatively, you can specify any of the following NTP servers: 1.rhel.pool.ntp.org, 2.rhel.pool.ntp.org, or 3.rhel.pool.ntp.org.
  2. Use Butane to generate a MachineConfig object file, 99-worker-chrony.yaml, containing the configuration to be delivered to the nodes:

    $ butane 99-worker-chrony.bu -o 99-worker-chrony.yaml
  3. Apply the configurations in one of two ways:

    • If the cluster is not running yet, after you generate manifest files, add the MachineConfig object file to the <installation_directory>/openshift directory, and then continue to create the cluster.
    • If the cluster is already running, apply the file:

      $ oc apply -f ./99-worker-chrony.yaml
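Because the worker and control plane variants of the chrony config differ only in the role name and, optionally, the time source, the Butane file can be generated from a template. The following is a sketch only; the function name render_chrony_butane and its arguments are hypothetical, and the file body repeats the example from step 1.

```shell
# Hypothetical helper: print a chrony Butane config for the given role
# (worker or master) and an optional time source.
render_chrony_butane() {
    role=$1
    time_source=${2:-0.rhel.pool.ntp.org}
    cat <<EOF
variant: openshift
version: 4.8.0
metadata:
  name: 99-${role}-chrony
  labels:
    machineconfiguration.openshift.io/role: ${role}
storage:
  files:
  - path: /etc/chrony.conf
    mode: 0644
    overwrite: true
    contents:
      inline: |
        pool ${time_source} iburst
        driftfile /var/lib/chrony/drift
        makestep 1.0 3
        rtcsync
        logdir /var/log/chrony
EOF
}

# Print the control plane variant to standard output.
render_chrony_butane master
```

Redirect the output to a file such as 99-master-chrony.bu, and then continue with the butane and oc apply steps of the procedure.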

2.2.2. Adding kernel arguments to nodes

In some special cases, you might want to add kernel arguments to a set of nodes in your cluster. This should only be done with caution and clear understanding of the implications of the arguments you set.

Warning

Improper use of kernel arguments can result in your systems becoming unbootable.

Examples of kernel arguments you could set include:

  • enforcing=0: Configures Security Enhanced Linux (SELinux) to run in permissive mode. In permissive mode, the system acts as if SELinux is enforcing the loaded security policy, including labeling objects and emitting access denial entries in the logs, but it does not actually deny any operations. While not recommended for production systems, permissive mode can be helpful for debugging.
  • nosmt: Disables symmetric multithreading (SMT) in the kernel. Multithreading allows multiple logical threads for each CPU. You could consider nosmt in multi-tenant environments to reduce risks from potential cross-thread attacks. By disabling SMT, you essentially choose security over performance.

See Kernel.org kernel parameters for a list and descriptions of kernel arguments.

In the following procedure, you create a MachineConfig object that identifies:

  • A set of machines to which you want to add the kernel argument. In this case, machines with a worker role.
  • Kernel arguments that are appended to the end of the existing kernel arguments.
  • A label that indicates where in the list of machine configs the change is applied.

Prerequisites

  • You have administrative privileges on a working OpenShift Container Platform cluster.

Procedure

  1. List existing MachineConfig objects for your OpenShift Container Platform cluster to determine how to label your machine config:

    $ oc get MachineConfig

    Example output

    NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-ssh                                                                                 3.2.0             40m
    99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-worker-ssh                                                                                 3.2.0             40m
    rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m

  2. Create a MachineConfig object file that identifies the kernel argument (for example, 05-worker-kernelarg-selinuxpermissive.yaml):

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker # 1
      name: 05-worker-kernelarg-selinuxpermissive # 2
    spec:
      config:
        ignition:
          version: 3.2.0
      kernelArguments:
        - enforcing=0 # 3
    1
    Applies the new kernel argument only to worker nodes.
    2
    Named to identify where it fits among the machine configs (05) and what it does (adds a kernel argument to configure SELinux permissive mode).
    3
    Identifies the exact kernel argument as enforcing=0.
  3. Create the new machine config:

    $ oc create -f 05-worker-kernelarg-selinuxpermissive.yaml
  4. Check the machine configs to see that the new one was added:

    $ oc get MachineConfig

    Example output

    NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    05-worker-kernelarg-selinuxpermissive                                                         3.2.0             105s
    99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-ssh                                                                                 3.2.0             40m
    99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-worker-ssh                                                                                 3.2.0             40m
    rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m

  5. Check the nodes:

    $ oc get nodes

    Example output

    NAME                           STATUS                     ROLES    AGE   VERSION
    ip-10-0-136-161.ec2.internal   Ready                      worker   28m   v1.21.0
    ip-10-0-136-243.ec2.internal   Ready                      master   34m   v1.21.0
    ip-10-0-141-105.ec2.internal   Ready,SchedulingDisabled   worker   28m   v1.21.0
    ip-10-0-142-249.ec2.internal   Ready                      master   34m   v1.21.0
    ip-10-0-153-11.ec2.internal    Ready                      worker   28m   v1.21.0
    ip-10-0-153-150.ec2.internal   Ready                      master   34m   v1.21.0

    You can see that scheduling is disabled on each worker node in turn while the change is being applied.

  6. Check that the kernel argument worked by going to one of the worker nodes and listing the kernel command line arguments (in /proc/cmdline on the host):

    $ oc debug node/ip-10-0-141-105.ec2.internal

    Example output

    Starting pod/ip-10-0-141-105ec2internal-debug ...
    To use host binaries, run `chroot /host`
    
    sh-4.2# cat /host/proc/cmdline
    BOOT_IMAGE=/ostree/rhcos-... console=tty0 console=ttyS0,115200n8
    rootflags=defaults,prjquota rw root=UUID=fd0... ostree=/ostree/boot.0/rhcos/16...
    coreos.oem.id=qemu coreos.oem.id=ec2 ignition.platform.id=ec2 enforcing=0
    
    sh-4.2# exit

    You should see the enforcing=0 argument added to the other kernel arguments.
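The visual check in the last step can also be scripted. The sketch below tests whether an exact argument appears as a space-separated token in a kernel command line. The function name has_kernel_arg is hypothetical, and the sample string is a shortened version of the example output above.

```shell
# Hypothetical helper: succeeds if the exact argument ($1) appears as a
# space-separated token in the kernel command line string ($2).
has_kernel_arg() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Shortened sample command line based on the example output.
sample_cmdline='console=tty0 console=ttyS0,115200n8 ignition.platform.id=ec2 enforcing=0'

if has_kernel_arg enforcing=0 "$sample_cmdline"; then
    echo "enforcing=0 is set"
else
    echo "enforcing=0 is missing"
fi
```

Inside the debug shell, the second argument could be taken from the host, for example "$(cat /host/proc/cmdline)".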

2.2.3. Enabling multipathing with kernel arguments on RHCOS

Red Hat Enterprise Linux CoreOS (RHCOS) supports multipathing on the primary disk, which provides stronger resilience to hardware failure and higher host availability. You can enable multipathing after installation by using a machine config.

Important

Enabling multipathing during installation is supported and recommended for nodes provisioned in OpenShift Container Platform 4.8 or higher. In setups where any I/O to non-optimized paths results in I/O system errors, you must enable multipathing at installation time. For more information about enabling multipathing during installation time, see "Enabling multipathing with kernel arguments on RHCOS" in the Installing on bare metal documentation.

Important

On IBM Z and LinuxONE, you can enable multipathing only if you configured your cluster for it during installation. For more information, see "Installing RHCOS and starting the OpenShift Container Platform bootstrap process" in Installing a cluster with z/VM on IBM Z and LinuxONE.

Prerequisites

  • You have a running OpenShift Container Platform cluster that uses version 4.7 or later.
  • You are logged in to the cluster as a user with administrative privileges.

Procedure

  1. To enable multipathing post-installation on control plane nodes:

    • Create a machine config file, such as 99-master-kargs-mpath.yaml, that instructs the cluster to add the master label and that identifies the multipath kernel argument, for example:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: "master"
        name: 99-master-kargs-mpath
      spec:
        kernelArguments:
          - 'rd.multipath=default'
          - 'root=/dev/disk/by-label/dm-mpath-root'
  2. To enable multipathing post-installation on worker nodes:

    • Create a machine config file, such as 99-worker-kargs-mpath.yaml, that instructs the cluster to add the worker label and that identifies the multipath kernel argument, for example:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: "worker"
        name: 99-worker-kargs-mpath
      spec:
        kernelArguments:
          - 'rd.multipath=default'
          - 'root=/dev/disk/by-label/dm-mpath-root'
  3. Create the new machine config by using either the master or worker YAML file you previously created:

    $ oc create -f ./99-master-kargs-mpath.yaml
  4. Check the machine configs to see that the new one was added:

    $ oc get MachineConfig

    Example output

    NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-kargs-mpath                              52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             105s
    99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-ssh                                                                                 3.2.0             40m
    99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-worker-ssh                                                                                 3.2.0             40m
    rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m

  5. Check the nodes:

    $ oc get nodes

    Example output

    NAME                           STATUS                     ROLES    AGE   VERSION
    ip-10-0-136-161.ec2.internal   Ready                      worker   28m   v1.20.0
    ip-10-0-136-243.ec2.internal   Ready                      master   34m   v1.20.0
    ip-10-0-141-105.ec2.internal   Ready,SchedulingDisabled   worker   28m   v1.20.0
    ip-10-0-142-249.ec2.internal   Ready                      master   34m   v1.20.0
    ip-10-0-153-11.ec2.internal    Ready                      worker   28m   v1.20.0
    ip-10-0-153-150.ec2.internal   Ready                      master   34m   v1.20.0

    You can see that scheduling is disabled on each worker node in turn while the change is being applied.

  6. Check that the kernel argument worked by going to one of the worker nodes and listing the kernel command line arguments (in /proc/cmdline on the host):

    $ oc debug node/ip-10-0-141-105.ec2.internal

    Example output

    Starting pod/ip-10-0-141-105ec2internal-debug ...
    To use host binaries, run `chroot /host`
    
    sh-4.2# cat /host/proc/cmdline
    ...
    rd.multipath=default root=/dev/disk/by-label/dm-mpath-root
    ...
    
    sh-4.2# exit

    You should see the added kernel arguments.
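The control plane and worker machine configs in this procedure differ only in the role label and the object name, so both can be produced from one template. This is a sketch under that assumption; the function name render_mpath_machineconfig is hypothetical, and the YAML body repeats the examples from steps 1 and 2.

```shell
# Hypothetical helper: print the multipath MachineConfig for a role
# (master or worker).
render_mpath_machineconfig() {
    role=$1
    cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: "${role}"
  name: 99-${role}-kargs-mpath
spec:
  kernelArguments:
    - 'rd.multipath=default'
    - 'root=/dev/disk/by-label/dm-mpath-root'
EOF
}

# Print the worker variant to standard output.
render_mpath_machineconfig worker
```

Redirect the output to 99-worker-kargs-mpath.yaml or 99-master-kargs-mpath.yaml before running oc create.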

2.2.4. Adding a real-time kernel to nodes

Some OpenShift Container Platform workloads require a high degree of determinism. While Linux is not a real-time operating system, the Linux real-time kernel includes a preemptive scheduler that provides the operating system with real-time characteristics.

If your OpenShift Container Platform workloads require these real-time characteristics, you can switch your machines to the Linux real-time kernel. For OpenShift Container Platform 4.8, you can make this switch using a MachineConfig object. Although making the change is as simple as changing a machine config kernelType setting to realtime, there are a few other considerations before making the change:

  • Currently, real-time kernel is supported only on worker nodes, and only for radio access network (RAN) use.
  • The following procedure is fully supported with bare metal installations that use systems that are certified for Red Hat Enterprise Linux for Real Time 8.
  • Real-time support in OpenShift Container Platform is limited to specific subscriptions.
  • The following procedure is also supported for use with Google Cloud Platform.

Prerequisites

  • Have a running OpenShift Container Platform cluster (version 4.4 or later).
  • Log in to the cluster as a user with administrative privileges.

Procedure

  1. Create a machine config for the real-time kernel: Create a YAML file (for example, 99-worker-realtime.yaml) that contains a MachineConfig object for the realtime kernel type. This example tells the cluster to use a real-time kernel for all worker nodes:

    $ cat << EOF > 99-worker-realtime.yaml
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "worker"
      name: 99-worker-realtime
    spec:
      kernelType: realtime
    EOF
  2. Add the machine config to the cluster:

    $ oc create -f 99-worker-realtime.yaml
  3. Check the real-time kernel: Once each impacted node reboots, log in to the cluster and run the following commands to make sure that the real-time kernel has replaced the regular kernel for the set of nodes you configured:

    $ oc get nodes

    Example output

    NAME                                        STATUS  ROLES    AGE   VERSION
    ip-10-0-143-147.us-east-2.compute.internal  Ready   worker   103m  v1.21.0
    ip-10-0-146-92.us-east-2.compute.internal   Ready   worker   101m  v1.21.0
    ip-10-0-169-2.us-east-2.compute.internal    Ready   worker   102m  v1.21.0

    $ oc debug node/ip-10-0-143-147.us-east-2.compute.internal

    Example output

    Starting pod/ip-10-0-143-147us-east-2computeinternal-debug ...
    To use host binaries, run `chroot /host`
    
    sh-4.4# uname -a
    Linux <worker_node> 4.18.0-147.3.1.rt24.96.el8_1.x86_64 #1 SMP PREEMPT RT
            Wed Nov 27 18:29:55 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

    The kernel name contains rt, and the text "PREEMPT RT" indicates that this is a real-time kernel.

  4. To go back to the regular kernel, delete the MachineConfig object:

    $ oc delete -f 99-worker-realtime.yaml
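The check in step 3 can be expressed as a small test on the uname output. The following sketch looks for the PREEMPT RT marker shown in the example output. It also accepts the spelling PREEMPT_RT, which is an assumption about how other kernel versions might print the marker. The function name is_realtime_kernel is hypothetical.

```shell
# Hypothetical helper: succeeds if a `uname -a` style string carries the
# real-time marker. "PREEMPT RT" comes from the example output; the
# "PREEMPT_RT" spelling is an assumption for other kernel versions.
is_realtime_kernel() {
    case "$1" in
        *"PREEMPT RT"*|*"PREEMPT_RT"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample string copied from the example output in step 3.
sample_uname='Linux <worker_node> 4.18.0-147.3.1.rt24.96.el8_1.x86_64 #1 SMP PREEMPT RT Wed Nov 27 18:29:55 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux'

if is_realtime_kernel "$sample_uname"; then
    echo "real-time kernel"
else
    echo "standard kernel"
fi
```

On a node, pass "$(uname -a)" as the argument instead of the sample string.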

2.2.5. Configuring journald settings

If you need to configure settings for the journald service on OpenShift Container Platform nodes, you can do that by modifying the appropriate configuration file and passing the file to the appropriate pool of nodes as a machine config.

This procedure describes how to modify journald rate limiting settings in the /etc/systemd/journald.conf file and apply them to worker nodes. See the journald.conf man page for information on how to use that file.

Prerequisites

  • Have a running OpenShift Container Platform cluster.
  • Log in to the cluster as a user with administrative privileges.

Procedure

  1. Create a Butane config file, 40-worker-custom-journald.bu, that includes an /etc/systemd/journald.conf file with the required settings.

    Note

    See "Creating machine configs with Butane" for information about Butane.

    variant: openshift
    version: 4.8.0
    metadata:
      name: 40-worker-custom-journald
      labels:
        machineconfiguration.openshift.io/role: worker
    storage:
      files:
      - path: /etc/systemd/journald.conf
        mode: 0644
        overwrite: true
        contents:
          inline: |
            # Disable rate limiting
            RateLimitInterval=1s
            RateLimitBurst=10000
            Storage=volatile
            Compress=no
            MaxRetentionSec=30s
  2. Use Butane to generate a MachineConfig object file, 40-worker-custom-journald.yaml, containing the configuration to be delivered to the worker nodes:

    $ butane 40-worker-custom-journald.bu -o 40-worker-custom-journald.yaml
  3. Apply the machine config to the pool:

    $ oc apply -f 40-worker-custom-journald.yaml
  4. Check that the new machine config is applied and that the nodes are not in a degraded state. It might take a few minutes. The worker pool will show the updates in progress, as each node successfully has the new machine config applied:

    $ oc get machineconfigpool
    NAME   CONFIG             UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
    master rendered-master-35 True    False    False    3            3                 3                   0                    34m
    worker rendered-worker-d8 False   True     False    3            1                 1                   0                    34m
  5. To check that the change was applied, you can log in to a worker node:

    $ oc get node | grep worker
    ip-10-0-0-1.us-east-2.compute.internal   Ready    worker   39m   v1.21.0
    $ oc debug node/ip-10-0-0-1.us-east-2.compute.internal
    Starting pod/ip-10-0-0-1us-east-2computeinternal-debug ...
    ...
    sh-4.2# chroot /host
    sh-4.4# cat /etc/systemd/journald.conf
    # Disable rate limiting
    RateLimitInterval=1s
    RateLimitBurst=10000
    Storage=volatile
    Compress=no
    MaxRetentionSec=30s
    sh-4.4# exit
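
As a spot check after step 5, a small helper can pull a single setting out of the file instead of reading all of it. This is a minimal sketch: the function name journald_setting is hypothetical, and it assumes simple Key=Value lines like those in the example above.

```shell
# Hypothetical helper: print the last value assigned to a key in
# journald.conf-style text read from stdin. Comment lines are ignored.
journald_setting() {
  awk -v key="$1" '
    /^[ \t]*[#;]/ { next }                      # skip comment lines
    index($0, "=") == 0 { next }                # skip lines without an assignment
    {
      k = substr($0, 1, index($0, "=") - 1)     # text before the first "="
      gsub(/[ \t]/, "", k)
      if (k == key) {
        val = substr($0, index($0, "=") + 1)    # text after the first "="
        gsub(/^[ \t]+|[ \t]+$/, "", val)
      }
    }
    END { if (val != "") print val }
  '
}

# Example use inside the debug shell (after `chroot /host`):
#   journald_setting RateLimitBurst < /etc/systemd/journald.conf
```

If the key is absent, the function prints nothing, which makes it easy to tell a missing setting from an unexpected value.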

2.2.6. Adding extensions to RHCOS

RHCOS is a minimal container-oriented RHEL operating system, designed to provide a common set of capabilities to OpenShift Container Platform clusters across all platforms. While adding software packages to RHCOS systems is generally discouraged, the MCO provides an extensions feature you can use to add a minimal set of features to RHCOS nodes.

Currently, the following extension is available:

  • usbguard: Adding the usbguard extension protects RHCOS systems from attacks from intrusive USB devices. See USBGuard for details.

The following procedure describes how to use a machine config to add one or more extensions to your RHCOS nodes.

Prerequisites

  • Have a running OpenShift Container Platform cluster (version 4.6 or later).
  • Log in to the cluster as a user with administrative privileges.

Procedure

  1. Create a machine config for extensions: Create a YAML file (for example, 80-extensions.yaml) that contains a MachineConfig extensions object. This example tells the cluster to add the usbguard extension.

    $ cat << EOF > 80-extensions.yaml
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 80-worker-extensions
    spec:
      config:
        ignition:
          version: 3.2.0
      extensions:
        - usbguard
    EOF
  2. Add the machine config to the cluster:

    $ oc create -f 80-extensions.yaml

    This installs the usbguard RPM packages on all worker nodes.

  3. Check that the extensions were applied:

    $ oc get machineconfig 80-worker-extensions

    Example output

    NAME                 GENERATEDBYCONTROLLER IGNITIONVERSION AGE
    80-worker-extensions                       3.2.0           57s

  4. Check that the new machine config is now applied and that the nodes are not in a degraded state. It may take a few minutes. The worker pool will show the updates in progress, as each machine successfully has the new machine config applied:

    $ oc get machineconfigpool

    Example output

    NAME   CONFIG             UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
    master rendered-master-35 True    False    False    3            3                 3                   0                    34m
    worker rendered-worker-d8 False   True     False    3            1                 1                   0                    34m

  5. Check the extensions. To check that the extension was applied, run:

    $ oc get node | grep worker

    Example output

    NAME                                        STATUS  ROLES    AGE   VERSION
    ip-10-0-169-2.us-east-2.compute.internal    Ready   worker   102m  v1.18.3

    $ oc debug node/ip-10-0-169-2.us-east-2.compute.internal

    Example output

    ...
    To use host binaries, run `chroot /host`
    sh-4.4# chroot /host
    sh-4.4# rpm -q usbguard
    usbguard-0.7.4-4.el8.x86_64

Use the "Configuring chrony time service" section as a model for adding other configuration files to OpenShift Container Platform nodes.

2.3. Configuring MCO-related custom resources

Besides managing MachineConfig objects, the MCO manages two custom resources (CRs): KubeletConfig and ContainerRuntimeConfig. Those CRs let you change node-level settings impacting how the Kubelet and CRI-O container runtime services behave.

2.3.1. Creating a KubeletConfig CR to edit kubelet parameters

The kubelet configuration is currently serialized as an Ignition configuration, so it can be directly edited. However, there is also a new kubelet-config-controller added to the Machine Config Controller (MCC). This lets you use a KubeletConfig custom resource (CR) to edit the kubelet parameters.

Note

As the fields in the kubeletConfig object are passed directly to the kubelet from upstream Kubernetes, the kubelet validates those values directly. Invalid values in the kubeletConfig object might cause cluster nodes to become unavailable. For valid values, see the Kubernetes documentation.

Consider the following guidance:

  • Create one KubeletConfig CR for each machine config pool with all the config changes you want for that pool. If you are applying the same content to all of the pools, you need only one KubeletConfig CR for all of the pools.
  • Edit an existing KubeletConfig CR to modify existing settings or add new settings, instead of creating a CR for each change. It is recommended that you create a CR only to modify a different machine config pool, or for changes that are intended to be temporary, so that you can revert the changes.
  • As needed, create multiple KubeletConfig CRs with a limit of 10 per cluster. For the first KubeletConfig CR, the Machine Config Operator (MCO) creates a machine config appended with kubelet. With each subsequent CR, the controller creates another kubelet machine config with a numeric suffix. For example, if you have a kubelet machine config with a -2 suffix, the next kubelet machine config is appended with -3.

If you want to delete the machine configs, delete them in reverse order to avoid exceeding the limit. For example, you delete the kubelet-3 machine config before deleting the kubelet-2 machine config.

Note

If you have a machine config with a kubelet-9 suffix, and you create another KubeletConfig CR, a new machine config is not created, even if there are fewer than 10 kubelet machine configs.

Example KubeletConfig CR

$ oc get kubeletconfig

NAME                AGE
set-max-pods        15m

Example showing a KubeletConfig machine config

$ oc get mc | grep kubelet

...
99-worker-generated-kubelet-1                  b5c5119de007945b6fe6fb215db3b8e2ceb12511   3.2.0             26m
...
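
The suffix rule described above can be sketched as a small function. This only illustrates the naming rule as stated in this section and is not how the controller is implemented. The function name next_kubelet_mc is hypothetical, and the 99-<pool>-generated-kubelet base name is inferred from the example output above.

```shell
# Hypothetical helper: given existing machine config names on stdin (one per
# line, name in the first column), print the name the next generated kubelet
# machine config would take for the given pool. Fails without output if a
# -9 suffix already exists, mirroring the note above.
next_kubelet_mc() {
  awk -v base="99-$1-generated-kubelet" '
    BEGIN { max = -1 }                          # -1 means no kubelet config seen
    $1 == base { if (max < 0) max = 0 }         # the first config has no suffix
    index($1, base "-") == 1 {
      n = substr($1, length(base) + 2)          # text after "<base>-"
      if (n ~ /^[0-9]+$/ && n + 0 > max) max = n + 0
    }
    END {
      if (max < 0) { print base; exit 0 }
      if (max >= 9) exit 1
      print base "-" (max + 1)
    }
  '
}

# Example use:
#   oc get mc --no-headers | next_kubelet_mc worker
```

Because the next suffix is derived from the highest existing suffix, deleting a lower-numbered machine config does not free up a suffix, which is why the text recommends deleting in reverse order.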

The following procedure is an example to show how to configure the maximum number of pods per node on the worker nodes.

Prerequisites

  1. Obtain the label associated with the static MachineConfigPool CR for the type of node you want to configure. Perform one of the following steps:

    1. View the machine config pool:

      $ oc describe machineconfigpool <name>

      For example:

      $ oc describe machineconfigpool worker

      Example output

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        creationTimestamp: 2019-02-08T14:52:39Z
        generation: 1
        labels:
          custom-kubelet: set-max-pods 1

      1
      If a label has been added it appears under labels.
    2. If the label is not present, add a key/value pair:

      $ oc label machineconfigpool worker custom-kubelet=set-max-pods

Procedure

  1. View the available machine configuration objects that you can select:

    $ oc get machineconfig

    By default, the two kubelet-related configs are 01-master-kubelet and 01-worker-kubelet.

  2. Check the current value for the maximum pods per node:

    $ oc describe node <node_name>

    For example:

    $ oc describe node ci-ln-5grqprb-f76d1-ncnqq-worker-a-mdv94

    Look for the pods: <value> line in the Allocatable stanza:

    Example output

    Allocatable:
     attachable-volumes-aws-ebs:  25
     cpu:                         3500m
     hugepages-1Gi:               0
     hugepages-2Mi:               0
     memory:                      15341844Ki
     pods:                        250

  3. Set the maximum pods per node on the worker nodes by creating a custom resource file (for example, change-maxPods-cr.yaml) that contains the kubelet configuration:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: KubeletConfig
    metadata:
      name: set-max-pods
    spec:
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet: set-max-pods 1
      kubeletConfig:
        maxPods: 500 2
    1
    Enter the label from the machine config pool.
    2
    Add the kubelet configuration. In this example, use maxPods to set the maximum pods per node.
    Note

    The rate at which the kubelet talks to the API server depends on queries per second (QPS) and burst values. The default values, 50 for kubeAPIQPS and 100 for kubeAPIBurst, are sufficient if there are limited pods running on each node. It is recommended to update the kubelet QPS and burst rates if there are enough CPU and memory resources on the node.

    apiVersion: machineconfiguration.openshift.io/v1
    kind: KubeletConfig
    metadata:
      name: set-max-pods
    spec:
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet: set-max-pods
      kubeletConfig:
        maxPods: <pod_count>
        kubeAPIBurst: <burst_rate>
        kubeAPIQPS: <QPS>
    1. Update the machine config pool for workers with the label:

      $ oc label machineconfigpool worker custom-kubelet=set-max-pods
    2. Create the KubeletConfig object:

      $ oc create -f change-maxPods-cr.yaml
    3. Verify that the KubeletConfig object is created:

      $ oc get kubeletconfig

      Example output

      NAME                AGE
      set-max-pods        15m

      Depending on the number of worker nodes in the cluster, wait for the worker nodes to be rebooted one by one. For a cluster with 3 worker nodes, this could take about 10 to 15 minutes.

  4. Verify that the changes are applied to the node:

    1. Check on a worker node that the maxPods value changed:

      $ oc describe node <node_name>
    2. Locate the Allocatable stanza:

       ...
      Allocatable:
        attachable-volumes-gce-pd:  127
        cpu:                        3500m
        ephemeral-storage:          123201474766
        hugepages-1Gi:              0
        hugepages-2Mi:              0
        memory:                     14225400Ki
        pods:                       500 1
       ...
      1
      In this example, the pods parameter should report the value you set in the KubeletConfig object.
  5. Verify the change in the KubeletConfig object:

    $ oc get kubeletconfigs set-max-pods -o yaml

    This should show status: "True" and type: Success:

    spec:
      kubeletConfig:
        maxPods: 500
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet: set-max-pods
    status:
      conditions:
      - lastTransitionTime: "2021-06-30T17:04:07Z"
        message: Success
        status: "True"
        type: Success
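
To compare the pods value before and after the change without scanning the full node description, the Allocatable stanza can be filtered. This is a minimal sketch: the function name allocatable_pods is hypothetical, and it assumes the layout shown in the example output above, with stanza names starting in the first column and their entries indented.

```shell
# Hypothetical helper: print the pods value from the Allocatable stanza of
# `oc describe node` output read from stdin.
allocatable_pods() {
  awk '
    /^Allocatable:/ { in_alloc = 1; next }
    in_alloc && /^[^ \t]/ { in_alloc = 0 }      # next top-level key ends the stanza
    in_alloc && $1 == "pods:" { print $2; exit }
  '
}

# Example use:
#   oc describe node <node_name> | allocatable_pods
```

The same value is also available directly from the node object with oc get node <node_name> -o jsonpath='{.status.allocatable.pods}'.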

2.3.2. Creating a ContainerRuntimeConfig CR to edit CRI-O parameters

You can change some of the settings associated with the OpenShift Container Platform CRI-O runtime for the nodes associated with a specific machine config pool (MCP). Using a ContainerRuntimeConfig custom resource (CR), you set the configuration values and add a label to match the MCP. The MCO then rebuilds the crio.conf and storage.conf configuration files on the associated nodes with the updated values.

Note

To revert the changes implemented by using a ContainerRuntimeConfig CR, you must delete the CR. Removing the label from the machine config pool does not revert the changes.

You can modify the following settings by using a ContainerRuntimeConfig CR:

  • PIDs limit: The pidsLimit parameter sets the CRI-O pids_limit parameter, which is the maximum number of processes allowed in a container. The default is 1024 (pids_limit = 1024).
  • Log level: The logLevel parameter sets the CRI-O log_level parameter, which is the level of verbosity for log messages. The default is info (log_level = info). Other options include fatal, panic, error, warn, debug, and trace.
  • Overlay size: The overlaySize parameter sets the CRI-O Overlay storage driver size parameter, which is the maximum size of a container image. The default is 10 GB (size = "10G").
  • Maximum log size: The logSizeMax parameter sets the CRI-O log_size_max parameter, which is the maximum size allowed for the container log file. The default is unlimited (log_size_max = -1). If set to a positive number, it must be at least 8192 to not be smaller than the ConMon read buffer. ConMon is a program that monitors communications between a container manager (such as Podman or CRI-O) and the OCI runtime (such as runc or crun) for a single container.

You should have one ContainerRuntimeConfig CR for each machine config pool with all the config changes you want for that pool. If you are applying the same content to all the pools, you only need one ContainerRuntimeConfig CR for all the pools.

You should edit an existing ContainerRuntimeConfig CR to modify existing settings or add new settings instead of creating a new CR for each change. It is recommended to create a new ContainerRuntimeConfig CR only to modify a different machine config pool, or for changes that are intended to be temporary so that you can revert the changes.

You can create multiple ContainerRuntimeConfig CRs, as needed, with a limit of 10 per cluster. For the first ContainerRuntimeConfig CR, the MCO creates a machine config appended with containerruntime. With each subsequent CR, the controller creates a new containerruntime machine config with a numeric suffix. For example, if you have a containerruntime machine config with a -2 suffix, the next containerruntime machine config is appended with -3.

If you want to delete the machine configs, you should delete them in reverse order to avoid exceeding the limit. For example, you should delete the containerruntime-3 machine config before deleting the containerruntime-2 machine config.

Note

If you have a machine config with a containerruntime-9 suffix, and you create another ContainerRuntimeConfig CR, a new machine config is not created, even if there are fewer than 10 containerruntime machine configs.

Example showing multiple ContainerRuntimeConfig CRs

$ oc get ctrcfg

Example output

NAME         AGE
ctr-pid      24m
ctr-overlay  15m
ctr-level    5m45s

Example showing multiple containerruntime machine configs

$ oc get mc | grep container

Example output

...
01-master-container-runtime                        b5c5119de007945b6fe6fb215db3b8e2ceb12511   3.2.0             57m
...
01-worker-container-runtime                        b5c5119de007945b6fe6fb215db3b8e2ceb12511   3.2.0             57m
...
99-worker-generated-containerruntime               b5c5119de007945b6fe6fb215db3b8e2ceb12511   3.2.0             26m
99-worker-generated-containerruntime-1             b5c5119de007945b6fe6fb215db3b8e2ceb12511   3.2.0             17m
99-worker-generated-containerruntime-2             b5c5119de007945b6fe6fb215db3b8e2ceb12511   3.2.0             7m26s
...

The following example raises the pids_limit to 2048, sets the log_level to debug, sets the overlay size to 8 GB, and sets the log_size_max to unlimited:

Example ContainerRuntimeConfig CR

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
 name: overlay-size
spec:
 machineConfigPoolSelector:
   matchLabels:
     pools.operator.machineconfiguration.openshift.io/worker: '' 1
 containerRuntimeConfig:
   pidsLimit: 2048 2
   logLevel: debug 3
   overlaySize: 8G 4
   logSizeMax: "-1" 5

1
Specifies the machine config pool label.
2
Optional: Specifies the maximum number of processes allowed in a container.
3
Optional: Specifies the level of verbosity for log messages.
4
Optional: Specifies the maximum size of a container image.
5
Optional: Specifies the maximum size allowed for the container log file. If set to a positive number, it must be at least 8192.
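
The logSizeMax constraint can be checked before the CR is created. This is a minimal sketch under stated assumptions: the function name valid_log_size_max is hypothetical, it accepts only -1 (unlimited) or a plain integer byte count of at least 8192, and it conservatively rejects any other form of value.

```shell
# Hypothetical helper: succeed if a logSizeMax value is -1 (unlimited) or a
# plain integer of at least 8192. Anything else is rejected as a
# conservative assumption, not as a statement about what CRI-O accepts.
valid_log_size_max() {
  case "$1" in
    -1) return 0 ;;
    ''|*[!0-9]*) return 1 ;;                    # empty or not a plain integer
  esac
  [ "$1" -ge 8192 ]
}

# Example use:
#   valid_log_size_max 4096 || echo "logSizeMax is smaller than the ConMon read buffer"
```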

Procedure

To change CRI-O settings using the ContainerRuntimeConfig CR:

  1. Create a YAML file for the ContainerRuntimeConfig CR:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: ContainerRuntimeConfig
    metadata:
     name: overlay-size
    spec:
     machineConfigPoolSelector:
       matchLabels:
         pools.operator.machineconfiguration.openshift.io/worker: '' 1
     containerRuntimeConfig: 2
       pidsLimit: 2048
       logLevel: debug
       overlaySize: 8G
       logSizeMax: "-1"
    1
    Specify a label for the machine config pool that you want to modify.
    2
    Set the parameters as needed.
  2. Create the ContainerRuntimeConfig CR:

    $ oc create -f <file_name>.yaml
  3. Verify that the CR is created:

    $ oc get ContainerRuntimeConfig

    Example output

    NAME           AGE
    overlay-size   3m19s

  4. Check that a new containerruntime machine config is created:

    $ oc get machineconfigs | grep containerrun

    Example output

    99-worker-generated-containerruntime   2c9371fbb673b97a6fe8b1c52691999ed3a1bfc2  3.2.0  31s

  5. Monitor the machine config pool until all machines are shown as ready:

    $ oc get mcp worker

    Example output

    NAME    CONFIG               UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT  AGE
    worker  rendered-worker-169  False    True      False     3             1                  1                    0                     9h

  6. Verify that the settings were applied in CRI-O:

    1. Open an oc debug session to a node in the machine config pool and run chroot /host.

      $ oc debug node/<node_name>
      sh-4.4# chroot /host
    2. Verify the changes in the crio.conf file:

      sh-4.4# crio config | egrep 'log_level|pids_limit|log_size_max'

      Example output

      pids_limit = 2048
      log_size_max = -1
      log_level = "debug"

    3. Verify the changes in the storage.conf file:

      sh-4.4# head -n 7 /etc/containers/storage.conf

      Example output

      [storage]
        driver = "overlay"
        runroot = "/var/run/containers/storage"
        graphroot = "/var/lib/containers/storage"
        [storage.options]
          additionalimagestores = []
          size = "8G"
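
The storage.conf check in the last step can be reduced to a single value. This is a minimal sketch: the function name storage_overlay_size is hypothetical, and it assumes the size option is written as size = "<value>" as in the example output above.

```shell
# Hypothetical helper: print the first quoted size value found in
# storage.conf text read from stdin.
storage_overlay_size() {
  sed -n 's/^[[:space:]]*size[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example use inside the debug shell (after `chroot /host`):
#   storage_overlay_size < /etc/containers/storage.conf
```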

2.3.3. Setting the default maximum container root partition size for Overlay with CRI-O

The root partition of each container shows all of the available disk space of the underlying host. Follow this guidance to set a maximum partition size for the root disk of all containers.

To configure the maximum Overlay size, as well as other CRI-O options like the log level and PID limit, you can create the following ContainerRuntimeConfig custom resource (CR) in a file, for example overlaysize.yml:

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
 name: overlay-size
spec:
 machineConfigPoolSelector:
   matchLabels:
     custom-crio: overlay-size
 containerRuntimeConfig:
   pidsLimit: 2048
   logLevel: debug
   overlaySize: 8G

Procedure

  1. Create the configuration object:

    $ oc apply -f overlaysize.yml
  2. To apply the new CRI-O configuration to your worker nodes, edit the worker machine config pool:

    $ oc edit machineconfigpool worker
  3. Add the custom-crio label based on the matchLabels name you set in the ContainerRuntimeConfig CR:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      creationTimestamp: "2020-07-09T15:46:34Z"
      generation: 3
      labels:
        custom-crio: overlay-size
        machineconfiguration.openshift.io/mco-built-in: ""
  4. Save the changes, then view the machine configs:

    $ oc get machineconfigs

    New 99-worker-generated-containerruntime and rendered-worker-xyz objects are created:

    Example output

    99-worker-generated-containerruntime  4173030d89fbf4a7a0976d1665491a4d9a6e54f1   3.2.0             7m42s
    rendered-worker-xyz                   4173030d89fbf4a7a0976d1665491a4d9a6e54f1   3.2.0             7m36s

  5. After those objects are created, monitor the machine config pool for the changes to be applied:

    $ oc get mcp worker

    The worker nodes show UPDATING as True, as well as the number of machines, the number updated, and other details:

    Example output

    NAME    CONFIG               UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT  AGE
    worker  rendered-worker-xyz  False    True      False     3             2                  2                    0                     20h

    When complete, the worker nodes transition back to UPDATING as False, and the UPDATEDMACHINECOUNT number matches the MACHINECOUNT:

    Example output

    NAME    CONFIG               UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT  AGE
    worker  rendered-worker-xyz  True     False     False     3             3                  3                    0                     20h

    Looking at a worker machine, you see that the new 8 GB max size configuration is applied to all of the workers:

    Example output

    head -n 7 /etc/containers/storage.conf
    [storage]
      driver = "overlay"
      runroot = "/var/run/containers/storage"
      graphroot = "/var/lib/containers/storage"
      [storage.options]
        additionalimagestores = []
        size = "8G"

    Looking inside a container, you see that the root partition is now 8 GB:

    Example output

    ~ $ df -h
    Filesystem                Size      Used Available Use% Mounted on
    overlay                   8.0G      8.0K      8.0G   0% /
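
The completion condition described in step 5 (UPDATING returns to False and UPDATEDMACHINECOUNT matches MACHINECOUNT) can be sketched as a check on the oc get mcp table. The function name pool_is_updated is hypothetical, and the sketch assumes whitespace-separated columns with no empty cells, as in the example output above.

```shell
# Hypothetical helper: read `oc get mcp` output (header row plus one row per
# pool) from stdin and succeed only if the named pool has finished updating.
pool_is_updated() {
  awk -v pool="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to positions
    $1 == pool {
      found = 1
      ok = ($col["UPDATED"] == "True" &&
            $col["UPDATING"] == "False" &&
            $col["UPDATEDMACHINECOUNT"] == $col["MACHINECOUNT"])
    }
    END { exit !(found && ok) }
  '
}

# Example use: poll until the worker pool has finished rolling out.
#   until oc get mcp worker | pool_is_updated worker; do sleep 30; done
```

Looking columns up by header name rather than by position keeps the check working if the column order differs from the example.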