OVN/OVS offloading with OpenShift on NVIDIA BlueField-2 DPUs


OpenShift Container Platform installed on DPUs facilitates OVN/OVS offloading.

Note
OVN/OVS offloading with OpenShift on NVIDIA BlueField-2 DPUs is a Developer Preview feature in OpenShift Container Platform 4.12. It is not available on previous versions of OpenShift Container Platform.

About Developer Preview features
Developer Preview features are not supported with Red Hat production service level agreements (SLAs) and are not functionally complete. Red Hat does not advise using them in a production setting. Developer Preview features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. These releases may not have any documentation, and testing is limited. Red Hat may provide ways to submit feedback on Developer Preview releases without an associated SLA.

The features described in this document are for Developer Preview purposes and are not supported by Red Hat at this time.

About OVN/OVS offloading with OpenShift on NVIDIA BlueField-2 DPUs

Learn how to accelerate and offload software subsystems using OpenShift Container Platform and the Data Processing Unit (DPU). DPUs are a class of reprogrammable high-performance processors combined with high-performance network interfaces optimized to perform and accelerate network and storage functions carried out by data center servers.
The Data Processing Unit (DPU) is a complete compute system with an independent software stack, network identity, and provisioning capabilities. The DPU is fully capable of hosting its own applications using either embedded or orchestrated deployment models.

The unique capabilities of the DPU are disruptive because they allow key infrastructure functions, and their associated software stacks, to be completely removed from the host node’s CPU cores and relocated onto the DPU.

Installing and configuring an accelerated infrastructure with OpenShift and DPUs

Installing OpenShift Container Platform on a DPU makes it possible to offload packet processing from the host x86 to the DPU. Offloading resource-intensive computational tasks, such as packet processing, from the server’s CPU to the DPU frees up cycles on the OpenShift Container Platform worker nodes to run the applications.

OpenShift and DPU deployment architecture

The proposed deployment architecture is a two-cluster design. In this architecture, DPU cards are provisioned as worker nodes of the ARM-based infrastructure cluster. The tenant cluster, composed of the x86 servers, is where the normal user applications run. The following diagram illustrates the deployment architecture.

The installation of this infrastructure cluster can be on ARM hardware or a mixed environment of x86 control plane and ARM worker nodes.

OpenShift Container Platform on DPU

The steps involved in deploying and configuring are:

  1. Install the infrastructure cluster by using the Assisted Installer.

    1. Install 3 control plane nodes and at least 2 DPU worker nodes.
  2. Install the control plane nodes of the tenant cluster by using the Assisted Installer.

  3. Install the DPU Network Operator on the infrastructure cluster.

    1. Partially configure the DPU Network Operator, effectively making it aware of the attached DPU.
  4. Configure support for hardware offloading in the infrastructure cluster.

  5. Configure support for hardware offloading in the tenant cluster.

    1. Enable the SR-IOV Network Operator on the hosts with DPU mode.
  6. Add worker nodes to the tenant cluster.

  7. Complete the final infrastructure cluster configuration.

Installing the infrastructure cluster

Use the Assisted Installer to install OpenShift Container Platform on a multi-architecture infrastructure cluster, with the control plane nodes running on x86_64 and the worker nodes running on ARM architecture. Alternatively, both the control plane and worker nodes of the infrastructure cluster can run on ARM architecture.

The infrastructure cluster should use the DPU’s uplink network as the primary network and not the out-of-band management network.

  • Shared VLAN: The Assisted Installer requires all nodes to be present on the same VLAN. This is a requirement of utilizing a virtual IP for both ingress and API.

  • DHCP: Much like an installer-provisioned infrastructure (IPI) installation, DHCP is required in the VLAN.

  • DNS: Records for ingress and API are required for accessing the cluster.
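
For example, the DNS records for a cluster might look similar to the following zone-file sketch. The record names follow the standard OpenShift Container Platform conventions; the cluster name, base domain, and VIP addresses shown here are placeholders for your environment:

api.<cluster_name>.<base_domain>.     IN  A  <API_VIP>
*.apps.<cluster_name>.<base_domain>.  IN  A  <INGRESS_VIP>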

Generating the discovery ISOs with the Assisted Installer for the infrastructure nodes

Installing the OpenShift Container Platform infrastructure cluster (3 control plane nodes and 3 worker nodes) requires the creation of two discovery ISOs. You can use the Assisted Installer UI or the API to create discovery ISOs for both x86_64 and arm64.

See the Assisted Installer for OpenShift Container Platform documentation for details on using the Assisted Installer.

When installing by using the UI and setting the cluster details, in the dropdown menu associated with the OpenShift version, select an OpenShift Container Platform version with the -multi extension, for example OpenShift Container Platform 4.12.15-multi. Follow the installation instructions, and for the control plane nodes, choose x86_64 as the CPU architecture from the dropdown menu, and proceed to download the discovery ISO. For the worker nodes, choose arm64 as the CPU architecture and follow the instructions to download a second discovery ISO.

A summary of the steps for installing a multi-architecture cluster by using the Assisted Installer API is as follows (an example request is sketched after these steps):

  1. Register a new cluster setting the openshift_version to install OpenShift Container Platform 4.12.15-multi.

  2. Register a new infrastructure environment setting the cpu_architecture to x86_64 and proceed to download the discovery ISO. For the worker nodes, set arm64 as the cpu_architecture and follow the instructions to download a second discovery ISO.

    You need to create two infrastructure environments.
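
For example, the two API calls might look similar to the following sketch. The endpoint paths, the openshift_version value, and the cpu_architecture value follow the steps above; the API token, cluster name, and infrastructure environment name are placeholders, and other required payload fields, such as the pull secret and base domain, are omitted for brevity. See the Assisted Installer API documentation for the full request format.

$ curl -s -X POST "https://api.openshift.com/api/assisted-install/v2/clusters" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"name": "infra-cluster", "openshift_version": "4.12.15-multi"}'

$ curl -s -X POST "https://api.openshift.com/api/assisted-install/v2/infra-envs" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"name": "infra-cluster-arm", "cluster_id": "<CLUSTER UUID>", "cpu_architecture": "arm64"}'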

After completing this stage you will have two discovery ISOs. Boot the x86_64 control plane nodes following the standard boot procedure appropriate for your environment. See Booting hosts with the discovery image for additional details.

Downloading and extracting the discovery ISO by using Ansible

The DPU network card does not currently allow you to install by using virtual media, so you need to use the PXE boot method. An Ansible repository is available that makes it easy to download and extract the discovery ISO needed to PXE boot these additional hosts. Use the playbooks in this repository to:

  • Add worker nodes to the existing infrastructure cluster.

  • Set up the PXE services putting the files in the correct location to boot the added hosts.

The Ansible Assisted Installer Playbooks includes two different roles:

  • setup-pxe: This role installs the basic services required for PXE booting namely:

    • HTTP server

    • TFTP server

    • Optional: DHCP server. Add the configuration for DHCP to the playbook.yaml to install the DHCP service.

      DHCP is optional as you may already have a DHCP server.

  • download-iso-pxe: This role is for Assisted Installer day 2 deployments. Given an existing cluster, it adds the worker nodes by using the API. This role also transforms the cluster to day 2, downloads and extracts the ISO into the HTTP and TFTP folders, and generates the grub.cfg.

This procedure describes how to download and extract the discovery ISO. The hosts have already been added to the existing cluster.

  • You have Ansible version 2.9 or greater installed.

  • You have installed jq. See the jq Manual for detailed information about using jq.

  1. Clone the Ansible Assisted Installer Playbooks repo.

    Clone the repository on the installer node or on your laptop that has access to the installer node.

    $ git clone https://github.com/rh-ecosystem-edge/ansible-ai-playbook.git
    
  2. Edit the playbook.yaml.

    ---
    - hosts: all
      roles:
        - setup-pxe 
        - download-iso-pxe
      vars:
        ARCH: arm
        URL: http://<URL>:<PORT> 
        CLUSTER_ID: "<CLUSTER UUID>" 
    
    • This field sets up the basic services required for PXE booting

    • Enter the Assisted Installer URL and port number

    • Enter the cluster UUID from the Assisted Installer UI.

    The option exists to download the ISO manually and extract what is needed to support PXE booting the hosts. To do this, add two variables to the playbook.yaml, as shown in the example after this list. The variables are:

    • ISO_NAME: "discovery_image.iso" - This variable is the name of the downloaded ISO.

    • WORKDIR: "/tmp/download-iso-pxe" - This variable specifies the directory where the ISO is downloaded to.
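
    For example, with these two optional variables added, playbook.yaml might look like the following. The URL, port, and cluster UUID values remain placeholders, as above:

    ---
    - hosts: all
      roles:
        - setup-pxe
        - download-iso-pxe
      vars:
        ARCH: arm
        URL: http://<URL>:<PORT>
        CLUSTER_ID: "<CLUSTER UUID>"
        ISO_NAME: "discovery_image.iso"
        WORKDIR: "/tmp/download-iso-pxe"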

  3. Download and extract the discovery ISO by using the following command.

    $ ansible-playbook -i localhost, playbook.yaml --tags extract,download
    

    Use localhost if running the command on the installer node. Replace localhost with the FQDN of the installer node if running the command on your laptop. By default the name of the downloaded ISO is discovery_image.iso and it is downloaded to /tmp/download-iso-pxe.
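
    For example, if the FQDN of the installer node is installer.example.com (a placeholder host name), run the following command from your laptop:

    $ ansible-playbook -i installer.example.com, playbook.yaml --tags extract,download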

  4. Verify the discovery image exists in /tmp/download-iso-pxe.

    1. List the contents of the download directory.

      $ ll /tmp/download-iso-pxe
      

      Expected output

      discovery_image.iso
      
  5. Verify the correct files are extracted to the correct locations.

    1. Verify the ignition file and rootfs image exists in /var/www/html/pxe.

      $ ll /var/www/html/pxe
      

      Expected output

      total 839720
      -rw-r--r--. 1 apache apache     10484 Mar 23 06:15 config.ign
      -rw-r--r--. 1 apache apache 859858944 Mar 23 06:15 rootfs.img
      
    2. Verify the grub.cfg and the files needed to boot the VM exist in /var/lib/tftpboot.

      $ ll /var/lib/tftpboot
      

      Expected output

      total 89956
      -rw-r--r--. 1 root root   857984 Jan 28 01:13 BOOTAA64.EFI
      -rw-r--r--. 1 root root  2446280 Jan 28 01:13 grubaa64.efi
      -rw-r--r--. 1 root root      567 Mar 23 06:16 grub.cfg
      -rw-r--r--. 1 root root 79229244 Mar 23 06:06 initrd.img
      -rw-r--r--. 1 root root  9565104 Mar 23 06:06 vmlinuz
      

At this stage you have all the files required to PXE boot the DPU hosts.
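
The download-iso-pxe role generates the grub.cfg for you, so the following is only a representative sketch of what a discovery boot entry can look like. The <HTTP_SERVER> address is a placeholder, and the exact kernel arguments in your generated file may differ:

set timeout=5
menuentry 'Assisted Installer discovery (arm64)' {
  linux /vmlinuz ip=dhcp coreos.live.rootfs_url=http://<HTTP_SERVER>/pxe/rootfs.img ignition.config.url=http://<HTTP_SERVER>/pxe/config.ign ignition.firstboot ignition.platform.id=metal
  initrd /initrd.img
}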

Disabling DNS, the Ingress Controller, and monitoring on the worker nodes

Run the following commands to disable DNS, the Ingress Controller, and monitoring on the infrastructure worker nodes.

  • Install the OpenShift CLI (oc).

  • Log in to the infrastructure cluster as a user with cluster-admin privileges.

  1. Run the following command to patch the dns.operator/default and ensure that the DNS Operator is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:

    $ oc patch dns.operator/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"node-role.kubernetes.io/master":""}}}}'
    
  2. Run the following command to patch the ingresscontroller/default and ensure that the Ingress Operator is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:

    $ oc patch ingresscontroller/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/master":""}}}}}' -n openshift-ingress-operator
    
  3. Disable monitoring on the worker nodes by following these steps:

    1. Create the following YAML and save it in file named monitor-patch-cm.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusOperator:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          prometheusK8s:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          alertmanagerMain:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          kubeStateMetrics:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          grafana:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          telemeterClient:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          k8sPrometheusAdapter:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          openshiftStateMetrics:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          thanosQuerier:
            nodeSelector:
              node-role.kubernetes.io/master: ""
      
    2. Run the following command:

      $ oc create -f monitor-patch-cm.yaml
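
Optionally, you can confirm that the monitoring components are rescheduled onto the control plane nodes by checking where the pods in the openshift-monitoring namespace are running:

$ oc get pods -n openshift-monitoring -o wide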
      

Installing the tenant cluster (Control plane nodes only)

Use the Assisted Installer to install the tenant cluster.

  • Shared VLAN: The Assisted Installer requires all nodes to be present on the same VLAN. This is a requirement of utilizing a virtual IP for both ingress and API.

  • DHCP: Much like an IPI installation, DHCP is required in the VLAN.

  • DNS: Records for ingress and API are required for accessing the cluster.

Generating the discovery ISO with the Assisted Installer

Installing the OpenShift Container Platform x86_64 tenant cluster (3 control plane nodes and 3 worker nodes) requires a discovery ISO, which the Assisted Installer can generate with the cluster name, base domain, Secure Shell (SSH) public key, and pull secret.

In the initial install you are only installing the control plane nodes.

Use the Assisted Installer to create a discovery ISO for the x86_64 tenant cluster. See the Assisted Installer for OpenShift Container Platform documentation for details on using the Assisted Installer.

Once you have downloaded the discovery ISO, boot the tenant cluster control plane nodes with the discovery image. See Booting hosts with the discovery image for additional details.

Installing the BlueField-2 DPU

The general steps to follow for installing the BlueField-2 DPU are:

  1. Ensure that your server hardware is compatible with the BlueField-2 DPU. You can check the compatibility list on the NVIDIA website.

  2. Install the BlueField-2 DPU onto your server hardware. The DPU will need to be installed in a PCI Express (PCIe) slot on your server.

  3. Connect the BlueField-2 DPU to your server’s power supply and network interface.

  4. Install the BlueField-2 DPU software on your server. The software includes the BlueField-2 driver, firmware, and management software.

  5. Configure the BlueField-2 DPU using the management software. You can use the BlueField-2 Manager to configure and manage the DPU.

  6. Verify that the BlueField-2 DPU is installed and working correctly. You can use the BlueField-2 Manager to check the status of the DPU.

These are general steps, and the specific installation process might change depending on your server hardware and software environment. Consult the installation guide for the BlueField-2 DPU for detailed instructions.
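
As a quick check that the card is visible to the host after installation, you can, for example, list the Mellanox devices on the PCIe bus:

$ lspci | grep -i mellanox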

Installing and enabling offloading with the DPU Network Operator

The DPU Network Operator on the infrastructure cluster side is responsible for the life-cycle management of the ovn-kube components and the necessary host network initialization on DPU cards.

  • An infrastructure cluster composed of x86 control plane and ARM worker nodes is up and running.

  • A tenant cluster composed of x86 control plane nodes is installed.

  • DPU cards are installed on the worker nodes of the infrastructure cluster where hardware offloading needs to be enabled.

  • Pods in the infrastructure cluster can reach the API server of the tenant cluster.

  • Network configuration:

    • The infrastructure cluster and the tenant cluster share the same VLAN as the API network.

    • The two clusters must use different virtual IPs (VIPs) as the cluster API VIP.

    • The DHCP server must be configured to assign IP addresses to hosts from both clusters from the same subnet.

    • The DNS server must be able to resolve the API URLs for both clusters.

As a cluster administrator, you can install the Operator using the CLI.

  • Install the OpenShift CLI (oc).

  • Log in to infrastructure cluster as a user with cluster-admin privileges.

  1. Create a namespace for the DPU Network Operator by completing the following actions:

    1. Create the following Namespace Custom Resource (CR) that defines the openshift-dpu-network-operator namespace, and then save the YAML in the dpuno-namespace.yaml file

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-dpu-network-operator
        annotations:
          workload.openshift.io/allowed: management
      
    2. Create the namespace by running the following command:

      $ oc create -f dpuno-namespace.yaml
      
  2. Install the DPU Network Operator in the namespace you created in the previous step by creating the following objects:

    1. Create the following OperatorGroup CR and save the YAML in the dpuno-operatorgroup.yaml file:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: dpu-network-operators
        namespace: openshift-dpu-network-operator
      
    2. Create the OperatorGroup CR by running the following command:

      $ oc create -f dpuno-operatorgroup.yaml
      
  3. Create the following Subscription CR and save the YAML in the dpu-sub.yaml file:

    Example Subscription

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: openshift-dpu-operator-subscription
      namespace: openshift-dpu-network-operator
    spec:
      channel: "stable"
      name: dpu-network-operator
      source: redhat-operators 
      sourceNamespace: openshift-marketplace
    
    • You must specify the redhat-operators value.
  4. Create the Subscription object by running the following command:

    $ oc create -f dpu-sub.yaml
    
  5. Change to the openshift-dpu-network-operator project:

    $ oc project openshift-dpu-network-operator
    
  6. Verify the DPU Network Operator is running:

    $ oc get pods -n openshift-dpu-network-operator
    

    Example output

    NAME                                                      READY   STATUS    RESTARTS   AGE
    dpu-network-operator-controller-manager-cc9ccc4bd-9vqcg   2/2     Running   0          62s
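
    Optionally, you can also confirm that the Operator subscription resolved to a ClusterServiceVersion in the Succeeded phase. The exact CSV name depends on the installed Operator version:

    $ oc get csv -n openshift-dpu-network-operator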
    

Creating a dedicated namespace

You need to install the OVNKubeConfig custom resource (CR), the ovnkube-node overrides config map (CM) and the secret containing the tenant kubeconfig into the same namespace. Follow this procedure to create this namespace.

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  1. Create a YAML file named for example dpu_namespace.yaml that contains the following YAML:

    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        pod-security.kubernetes.io/enforce: privileged
        pod-security.kubernetes.io/audit: privileged
        pod-security.kubernetes.io/warn: privileged
        security.openshift.io/scc.podSecurityLabelSync: "false"
        openshift.io/run-level: "0"
      name: tenantcluster-dpu
    
  2. Create the namespace by running the following command:

    $ oc create -f dpu_namespace.yaml
    

Configuring support for hardware offloading in the infrastructure cluster

Configure support for hardware offloading in the infrastructure cluster by using this procedure.

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  1. Create the following OVNKubeConfig custom resource (CR) with the poolName dpu, leaving the kubeConfigFile blank. Save the YAML in the ovnkubeconfig.yaml file:

    apiVersion: dpu.openshift.io/v1alpha1
    kind: OVNKubeConfig
    metadata:
      name: ovnkubeconfig-sample
      namespace: tenantcluster-dpu
    spec:
      poolName: dpu
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/dpu-worker: ""
    
    1. Run the following command:

      $ oc create -f ovnkubeconfig.yaml
      

      The DPU Network Operator creates a custom MachineConfigPool and a custom MachineConfig.

  2. Get the node names by using the following command:

    $ oc get nodes
    
  3. Label the DPU nodes in the infrastructure cluster. The Machine Config Operator applies the new MachineConfig to the DPU nodes, therefore enabling switchdev mode on them:

    $ oc label node <NODENAME> node-role.kubernetes.io/dpu-worker=
    
  4. Create a Cluster Network Operator (CNO) ConfigMap in the infrastructure cluster setting the mode to DPU. Save the YAML in the ovnkubeconfigmap.yaml file:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: dpu-mode-config
      namespace: openshift-network-operator
    data:
      mode: "dpu"
    immutable: true
    
    1. Run the following command:

      $ oc create -f ovnkubeconfigmap.yaml
      
  5. Label the DPU nodes in the infrastructure cluster where you want to enable hardware offloading.

    $ oc label node <NODENAME> network.operator.openshift.io/dpu=
    

Configuring support for hardware offloading in the tenant cluster

Configure support for hardware offloading in the tenant cluster by using this procedure.

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  1. Create a MachineConfigPool for all the DPU workers. Save the YAML in the dputenantmachineconfig.yaml file:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: dpu-host
    spec:
      machineConfigSelector:
        matchExpressions:
        - key: machineconfiguration.openshift.io/role
          operator: In
          values:
          - worker
          - dpu-host
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/dpu-host: ""
    
    1. Run the following command:

      $ oc create -f dputenantmachineconfig.yaml
      
  2. Label the DPU nodes:

    $ oc label node <NODENAME> node-role.kubernetes.io/dpu-host=
    
  3. Install and configure the SR-IOV Network Operator:

    This procedure describes how to install the SR-IOV Network Operator by using the web console. However, if the console is not reachable because the ingress is down, follow the guidance in "CLI: Installing the SR-IOV Network Operator".

    1. In the OpenShift Container Platform web console, click Administration → Namespaces.

    2. Click Create Namespace.

    3. In the Name field, enter openshift-sriov-network-operator, and click Create.

    4. In the OpenShift Container Platform web console, click Operators → OperatorHub.

    5. Select SR-IOV Network Operator from the list of available Operators, and click Install.

    6. On the Install Operator page, under A specific namespace on the cluster, select openshift-sriov-network-operator.

    7. Click Install.

    8. Verify that the SR-IOV Network Operator is installed successfully:

      1. Navigate to the Operators → Installed Operators page.

      2. Ensure that SR-IOV Network Operator is listed in the openshift-sriov-network-operator project with a Status of InstallSucceeded.

        During installation an Operator might display a Failed status.
        If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

        If the operator does not appear as installed, to troubleshoot further:

        • Inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.

        • Navigate to the Workloads → Pods page and check the logs for pods in the openshift-sriov-network-operator project.

  4. Add this machine config pool to the SriovNetworkPoolConfig custom resource.

    1. Create a file, such as sriov-pool-config.yaml, with the following content:

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkPoolConfig
      metadata:
        name: default
        namespace: openshift-sriov-network-operator
      spec:
        ovsHardwareOffloadConfig:
          name: dpu-host 
      
      • The name here is the same as the machine config pool (MCP) name created in step 1.
    2. Apply the configuration:

      $ oc create -f sriov-pool-config.yaml
      

      After applying the sriov-pool-config.yaml, the nodes reboot and you need to wait until the dpu-host machine config pool (MCP) is updated again.
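
      For example, one way to watch the machine config pool until it finishes updating is the following command, which assumes the dpu-host pool name created in step 1:

      $ oc get mcp dpu-host -w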

  5. Create a SriovNetworkNodePolicy to configure the virtual functions (VFs) on the hosts.

    1. Save the YAML in the SriovNetworkNodePolicy.yaml file:

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkNodePolicy
      metadata:
        name: policy-mlnx-bf 
        namespace: openshift-sriov-network-operator
      spec:
        resourceName: mlnx_bf 
        nodeSelector:
          node-role.kubernetes.io/dpu-host: "" 
        priority: 99 
        numVfs: 4 
        nicSelector: 
          vendor: "15b3" 
          deviceId: "a2d6" 
          pfNames: ['ens1f0#1-3'] 
          rootDevices: ['0000:3b:00.0'] 
      
      • The name for the custom resource object.

      • The resource name of the SR-IOV network device plug-in. You can create multiple SR-IOV network node policies for a resource name.

      • The node selector specifies the nodes to configure. Ensure this is consistent with the nodeSelector of the MCP created in step 1.

      • Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99.

      • The number of the virtual functions (VF) to create for the SR-IOV physical network device. For a Mellanox NIC, the number of VFs cannot be larger than 128.

      • The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally.

      • The vendor hexadecimal code of the SR-IOV network device. Vendor id 15b3 is for Mellanox devices.

      • The device hexadecimal code of the SR-IOV network device. For example, a2d6 is the device ID for a BlueField-2 DPU device.

      • An array of one or more physical function (PF) names for the device. The setting ens1f0#1-3 in this example ensures 1 virtual function is reserved for the management port.

      • An array of one or more PCI bus addresses for the PF of the device. Provide the address in the following format: 0000:02:00.1.

    2. Create the SriovNetworkNodePolicy object:

      $ oc create -f SriovNetworkNodePolicy.yaml
      

      After applying SriovNetworkNodePolicy.yaml, the nodes reboot and you need to wait until the dpu-host machine config pool is updated again.

  6. Optional: Follow these steps if virtual functions are not being created on the tenant cluster.

    1. Create the following Machine Config:

      $ cat <<EOF > realloc.yaml
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: dpu-host
        name: pci-realloc
      spec:
        config:
          ignition:
            version: 3.2.0
        kernelArguments:
          - pci=realloc
      EOF
      
    2. Apply the Machine Config and wait until all the nodes are rebooted:

      $ oc create -f realloc.yaml
      
  7. Create a Cluster Network Operator (CNO) ConfigMap in the tenant cluster setting the mode to dpu-host.

    1. Save the YAML in the sriovdpuconfigmap.yaml file:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: dpu-mode-config
        namespace: openshift-network-operator
      data:
        mode: "dpu-host"
      immutable: true
      
    2. Run the following command:

      $ oc create -f sriovdpuconfigmap.yaml
      
  8. Create a machine config to disable Open vSwitch (OVS).

    1. Create a YAML file for example disable-ovs.yaml:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: dpu-host
        name: disable-ovs
      spec:
        config:
          ignition:
            version: 3.1.0
          systemd:
            units:
            - mask: true
              name: ovs-vswitchd.service
            - enabled: false
              name: ovs-configuration.service
      
    2. Add this machine config to the cluster by running the following command:

      $ oc create -f disable-ovs.yaml
      
  9. Set the environment variable OVNKUBE_NODE_MGMT_PORT_NETDEV for each DPU host.

    1. Save the YAML in the setenvovnkube.yaml file:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: env-overrides
        namespace: openshift-ovn-kubernetes
      data:
        x86-worker-node0: |
          OVNKUBE_NODE_MGMT_PORT_NETDEV=ens1f0v0 
      
      • ens1f0v0 is the virtual function (VF) name that is assigned to the ovnkube node management port on the host.
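
      If you are unsure of the VF netdev name on a host, one way to look it up (assuming the physical function is ens1f0, as in the earlier SriovNetworkNodePolicy example) is to read the VF network device name from sysfs on that host:

      $ ls /sys/class/net/ens1f0/device/virtfn0/net/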
    2. Run the following command:

      $ oc create -f setenvovnkube.yaml
      
  10. Label the DPU nodes in the tenant cluster. Run the following command :

    $ oc label node <NODENAME> network.operator.openshift.io/dpu-host=
    

Adding worker nodes to the tenant cluster

Use this procedure to add worker nodes to the tenant cluster by generating a new discovery ISO.

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  1. Navigate to console.redhat.com/openshift.

  2. From the list of created clusters, find your tenant cluster.

  3. Select your tenant cluster.

  4. Click the Add hosts button and select the installation media.

    1. Select Minimal image file: Provision with virtual media to download a smaller image that will fetch the data needed to boot. The nodes must have virtual media capability. This is the recommended method.

    2. Select Full image file: Provision with physical media to download the larger full image.

    3. Select iPXE: Provision from your network server. Use this when you have an iPXE server that has already been set up.

  5. Add an SSH public key so that you can connect to the cluster nodes as the core user. Having a login to the cluster nodes can provide you with debugging information during the installation.

  6. Optional: If the cluster hosts are behind a firewall that requires the use of a proxy, select Configure cluster-wide proxy settings. Enter the username, password, IP address and port for the HTTP and HTTPS URLs of the proxy server.

  7. Optional: Configure a cluster-wide trusted certificate if needed, for example if the cluster hosts are in a network with a re-encrypting (MITM) proxy or the cluster needs to trust certificates for other purposes, such as container image registries.

  8. Click Generate Discovery ISO.

  9. Download the discovery ISO.

  10. Boot the tenant cluster worker host(s) with the discovery image. See Booting hosts with the discovery image for additional details.

Disabling DNS, the Ingress Controller, and monitoring on the worker nodes

Run the following commands to disable DNS, the Ingress Controller, and monitoring on the tenant worker nodes.

  • Install the OpenShift CLI (oc).

  • Log in to the infrastructure cluster as a user with cluster-admin privileges.

  1. Run the following command to patch the dns.operator/default and ensure that the DNS Operator is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:

    $ oc patch dns.operator/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"node-role.kubernetes.io/master":""}}}}'
    
  2. Run the following command to patch the ingresscontroller/default and ensure that the Ingress Operator is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:

    $ oc patch ingresscontroller/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/master":""}}}}}' -n openshift-ingress-operator
    
  3. Disable monitoring on the worker nodes by following these steps:

    1. Create the following YAML and save it in a file named monitor-patch-cm.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusOperator:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          prometheusK8s:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          alertmanagerMain:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          kubeStateMetrics:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          grafana:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          telemeterClient:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          k8sPrometheusAdapter:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          openshiftStateMetrics:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          thanosQuerier:
            nodeSelector:
              node-role.kubernetes.io/master: ""
      
    2. Run the following command:

      $ oc create -f monitor-patch-cm.yaml
      

Creating a subscription to the Node Maintenance Operator in the tenant cluster

Create a subscription to the Node Maintenance Operator following this procedure.

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  1. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, node-maintenance-subscription.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: node-maintenance-operator
        namespace: openshift-operators
      spec:
        channel: stable
        installPlanApproval: Automatic
        name: node-maintenance-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
        startingCSV: node-maintenance-operator.v4.12.0
      
    2. To create the Subscription CR, run the following command:

      $ oc create -f node-maintenance-subscription.yaml
      
  1. Verify that the installation succeeded by inspecting the CSV resource:

    $ oc get csv -n openshift-operators
    

    Example output

    NAME                               DISPLAY                     VERSION   REPLACES  PHASE
    node-maintenance-operator.v4.12    Node Maintenance Operator   4.12                Succeeded
    
  2. Verify that the Node Maintenance Operator is running:

    $ oc get deploy -n openshift-operators
    

    Example output

    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
    node-maintenance-operator-controller-manager   1/1     1            1           10d
    

Final infrastructure cluster configuration

Run the following steps against the infrastructure cluster.

  1. Add the kubeconfig of the tenant cluster as a secret.

    $ oc create secret generic tenant-cluster-1-kubeconf -n tenantcluster-dpu --from-file=config=/root/kubeconfig.tenant
    
  2. Add the per-node configuration override for the ovnkube-node by listing all the DPU nodes under data:

    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: env-overrides
      namespace: tenantcluster-dpu
    data:
      worker-bf: |
        TENANT_K8S_NODE=x86-worker-node0
        DPU_IP=192.168.111.29
        MGMT_IFNAME=pf0vf0
    
  3. Update the OVNKubeConfig CR by adding the kubeConfigFile field, as shown in the following example. Wait until the ovnkube pods are created.

    apiVersion: dpu.openshift.io/v1alpha1
    kind: OVNKubeConfig
    metadata:
      name: ovnkubeconfig-sample
    spec:
      kubeConfigFile: tenant-cluster-1-kubeconf
      poolName: dpu
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/dpu-worker: ''
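
    After updating the CR, you can apply it and watch for the ovnkube pods, for example as follows. This assumes you edited the ovnkubeconfig.yaml file created earlier and that the CR lives in the tenantcluster-dpu namespace used throughout this document:

    $ oc apply -f ovnkubeconfig.yaml
    $ oc get pods -n tenantcluster-dpu -w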
    

The DPU Network Operator reports several status messages that you can use to verify the Operator’s working status:

  • McpReady indicates that the MachineConfig and MachineConfigPool are ready.

  • TenantObjsSynced indicates that the tenant ConfigMap and secrets are synced to the infrastructure nodes.

  • OvnKubeReady indicates that the ovnkube-node DaemonSet is ready.
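
For example, one way to inspect these conditions is to read the status of the OVNKubeConfig CR directly. The CR name and namespace below are the ones used in the earlier examples:

$ oc get ovnkubeconfig ovnkubeconfig-sample -n tenantcluster-dpu -o yaml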

Modifying log rotation of CRI-O

The BlueField-2 DPU has 16 GB of eMMC storage, and when OpenShift Container Platform runs over time the logs can fill up the disk. In OpenShift Container Platform, log rotation is carried out by default at approximately 50 Mi per file and 5 files. The pod log file size and count are managed by the kubelet on each node, which then passes the settings on to CRI-O. The default settings currently used in OpenShift Container Platform are as follows:

--container-log-max-size 50Mi (default 50Mi)
--container-log-max-files 5 (default 5)

The current kubelet configuration used by a node is located at /etc/kubernetes/kubelet.conf and contains the corresponding defaults:

containerLogMaxSize: 50Mi
containerLogMaxFiles: 5

Pod and container logs are already rotated automatically by default. However, the kubelet.conf file is managed by the Machine Config Operator, so additional steps are necessary to change these settings.

Limit the container logs on the infrastructure cluster worker nodes by following this procedure.

  1. Obtain the MachineConfig for the kubelet:

    $ oc get machineconfig | grep -i kubelet
    

    Example output

    01-master-kubelet                                       46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0             47m
    01-worker-kubelet                                       46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0             47m
    
  2. Set up a custom kubelet for the worker nodes (nodes with DPUs) on the infrastructure cluster.

    1. Label the worker machine config pool with the custom-kubelet: logrotation label:

      $ oc label machineconfigpool worker custom-kubelet=logrotation
      
    2. Create a YAML file logrotation.yaml defining a kubeletconfig custom resource (CR) for your configuration change:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: cr-logrotation
      spec:
        machineConfigPoolSelector:
          matchLabels:
            custom-kubelet: logrotation 
        kubeletConfig:
          containerLogMaxFiles: 3 
          containerLogMaxSize: 10Mi 
      
      • Set this value to the same as the custom-kubelet value configured in step 2a.

      • Set the desired maximum number (for example 3) of container log files that can be present for a container.

      • Set the desired maximum size (for example 10Mi) of container log file before it is rotated.

    3. Create the CR object:

      $ oc create -f logrotation.yaml
      
  1. Verify the kubeletconfig:

    $ oc get kubeletconfig
    
  2. Verify the kubeletconfig:

    $ oc get kubeletconfig -o yaml
    
  3. Verify that three machineconfigs now exist:

    $ oc get machineconfig | grep -i kubelet
    

    Example output

    01-master-kubelet                                       46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0             74m
    01-worker-kubelet                                       46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0             74m
    99-worker-generated-kubelet                             46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0             11m
    
  4. Enter into a debug session on one of the worker nodes. This step instantiates a debug pod called <node_name>-debug:

    $ oc debug node/<node_name>
    
  5. Set /host as the root directory within the debug shell. The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths:

    # chroot /host
    
  6. Verify the settings in kubelet.conf:

    # grep -e containerLogMaxSize -e containerLogMaxFiles /etc/kubernetes/kubelet.conf
    

    Example output

     "containerLogMaxSize": "10Mi",
     "containerLogMaxFiles": 3,
    
