OVN/OVS offloading with OpenShift on NVIDIA BlueField-2 DPUs


OpenShift Container Platform installed on DPUs facilitates OVN/OVS offloading.

Note
OVN/OVS offloading with OpenShift on NVIDIA BlueField-2 DPUs is a Developer Preview feature in OpenShift Container Platform 4.12. It is not available on previous versions of OpenShift Container Platform.

About Developer Preview features
Developer Preview features are not supported with Red Hat production service level agreements (SLAs) and are not functionally complete. Red Hat does not advise using them in a production setting. Developer Preview features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. These releases may not have any documentation, and testing is limited. Red Hat may provide ways to submit feedback on Developer Preview releases without an associated SLA.

The features described in this document are for Developer Preview purposes and are not supported by Red Hat at this time.

About OVN/OVS offloading with OpenShift on NVIDIA BlueField-2 DPUs

Learn how to accelerate and offload software subsystems using OpenShift Container Platform and the Data Processing Unit (DPU). DPUs are a class of reprogrammable high-performance processors combined with high-performance network interfaces optimized to perform and accelerate network and storage functions carried out by data center servers.
The Data Processing Unit (DPU) is a complete compute system with an independent software stack, network identity, and provisioning capabilities. The DPU is fully capable of hosting its own applications using either embedded or orchestrated deployment models.

The unique capabilities of the DPU are disruptive because they allow key infrastructure functions, and their associated software stacks, to be completely removed from the host node’s CPU cores and relocated onto the DPU.

Installing and configuring an accelerated infrastructure with OpenShift and DPUs

Installing OpenShift Container Platform on a DPU makes it possible to offload packet processing from the host x86 to the DPU. Offloading resource-intensive computational tasks, such as packet processing, from the server’s CPU to the DPU frees up cycles on the OpenShift Container Platform worker nodes to run the applications.

OpenShift and DPU deployment architecture

The proposed deployment architecture is a two-cluster design. In this architecture, DPU cards are provisioned as worker nodes of the ARM-based infrastructure cluster. The tenant cluster, composed of the x86 servers, is where the normal user applications run. The following diagram illustrates the deployment architecture.

The installation of this infrastructure cluster can be on ARM hardware or a mixed environment of x86 control plane and ARM worker nodes.

OpenShift Container Platform on DPU

The steps involved in deploying and configuring are:

  1. Install the infrastructure cluster by using the Assisted Installer.

    1. Install 3 control plane nodes and at least 2 DPU worker nodes.
  2. Install the control plane nodes of the tenant cluster by using the Assisted Installer.

  3. Install the DPU Network Operator on the infrastructure cluster.

    1. Partially configure the DPU Network Operator, effectively making it aware of the attached DPU.
  4. Configure support for hardware offloading in the infrastructure cluster.

  5. Configure support for hardware offloading in the tenant cluster.

    1. Enable the SR-IOV Network Operator on the hosts with DPU mode.
  6. Add worker nodes to the tenant cluster.

  7. Complete the final infrastructure cluster configuration.

Installing the infrastructure cluster

Use the Assisted Installer to install OpenShift Container Platform on a multi-architecture infrastructure cluster, with the control plane nodes running on x86_64 and the worker nodes running on ARM architecture. Alternatively, both the control plane and worker nodes of the infrastructure cluster can run on ARM architecture.

The infrastructure cluster should use the DPU’s uplink network as the primary network and not the out-of-band management network.

  • Shared VLAN: The Assisted Installer requires all nodes to be present on the same VLAN. This is a requirement of utilizing a virtual IP for both ingress and API.

  • DHCP: Much like an installer-provisioned infrastructure (IPI) installation, DHCP is required in the VLAN.

  • DNS: Records for ingress and API are required for accessing the cluster.
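
For example, the DNS records for a cluster might look similar to the following zone-file sketch. The record names follow the standard OpenShift Container Platform conventions; the cluster name, base domain, and VIP addresses shown here are placeholders for your environment:

api.<cluster_name>.<base_domain>.     IN  A  <API_VIP>
*.apps.<cluster_name>.<base_domain>.  IN  A  <INGRESS_VIP>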

Generating the discovery ISOs with the Assisted Installer for the infrastructure nodes

Installing the OpenShift Container Platform infrastructure cluster (3 control plane nodes and 3 worker nodes) requires the creation of two discovery ISOs. You can use the Assisted Installer UI or the API to create discovery ISOs for both x86_64 and arm64.

See the Assisted Installer for OpenShift Container Platform documentation for details on using the Assisted Installer.

When installing by using the UI and setting the cluster details, in the dropdown menu associated with the OpenShift version, select an OpenShift Container Platform version with the -multi extension, for example OpenShift Container Platform 4.12.15-multi. Follow the installation instructions, and for the control plane nodes, choose x86_64 as the CPU architecture from the dropdown menu, and proceed to download the discovery ISO. For the worker nodes, choose arm64 as the CPU architecture and follow the instructions to download a second discovery ISO.

A summary of the steps for installing a multi-architecture cluster by using the Assisted Installer API is as follows (an example request is sketched after these steps):

  1. Register a new cluster setting the openshift_version to install OpenShift Container Platform 4.12.15-multi.

  2. Register a new infrastructure environment setting the cpu_architecture to x86_64 and proceed to download the discovery ISO. For the worker nodes, set arm64 as the cpu_architecture and follow the instructions to download a second discovery ISO.

    You need to create two infrastructure environments.
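
For example, the two API calls might look similar to the following sketch. The endpoint paths, the openshift_version value, and the cpu_architecture value follow the steps above; the API token, cluster name, and infrastructure environment name are placeholders, and other required payload fields, such as the pull secret and base domain, are omitted for brevity. See the Assisted Installer API documentation for the full request format.

$ curl -s -X POST "https://api.openshift.com/api/assisted-install/v2/clusters" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"name": "infra-cluster", "openshift_version": "4.12.15-multi"}'

$ curl -s -X POST "https://api.openshift.com/api/assisted-install/v2/infra-envs" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"name": "infra-cluster-arm", "cluster_id": "<CLUSTER UUID>", "cpu_architecture": "arm64"}'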

After completing this stage you will have two discovery ISOs. Boot the x86_64 control plane nodes following the standard boot procedure appropriate for your environment. See Booting hosts with the discovery image for additional details.

Downloading and extracting the discovery ISO by using Ansible

The DPU network card does not currently allow you to install by using virtual media, so you need to use the PXE boot method. An Ansible repository is available that makes it easy to download and extract the discovery ISO needed to PXE boot these additional hosts. Use the playbooks in this repository to:

  • Add worker nodes to the existing infrastructure cluster.

  • Set up the PXE services putting the files in the correct location to boot the added hosts.

The Ansible Assisted Installer Playbooks includes two different roles:

  • setup-pxe: This role installs the basic services required for PXE booting namely:

    • HTTP server

    • TFTP server

    • Optional: DHCP server. Add the configuration for DHCP to the playbook.yaml to install the DHCP service.

      DHCP is optional as you may already have a DHCP server.

  • download-iso-pxe: This role is for Assisted Installer day 2 deployments. Given an existing cluster, it adds the worker nodes by using the API. This role also transforms the cluster to day 2, downloads and extracts the ISO into the HTTP and TFTP folders, and generates the grub.cfg.

This procedure describes how to download and extract the discovery ISO. The hosts have already been added to the existing cluster.

  • You have Ansible version 2.9 or greater installed.

  • You have installed jq. See the jq Manual for detailed information about using jq.

  1. Clone the Ansible Assisted Installer Playbooks repo.

    Clone the repository on the installer node or on your laptop that has access to the installer node.

    $ git clone https://github.com/rh-ecosystem-edge/ansible-ai-playbook.git
    
  2. Edit the playbook.yaml.

    ---
    - hosts: all
      roles:
        - setup-pxe 
        - download-iso-pxe
      vars:
        ARCH: arm
        URL: http://<URL>:<PORT> 
        CLUSTER_ID: "<CLUSTER UUID>" 
    
    • This field sets up the basic services required for PXE booting

    • Enter the Assisted Installer URL and port number

    • Enter the cluster UUID from the Assisted Installer UI.

    The option exists to download the ISO manually and extract what is needed to support PXE booting the hosts. To do this, add two variables to the playbook.yaml, as shown in the example after this list. The variables are:

    • ISO_NAME: "discovery_image.iso" - This variable is the name of the downloaded ISO.

    • WORKDIR: "/tmp/download-iso-pxe" - This variable specifies the directory where the ISO is downloaded to.
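
    For example, with these two optional variables added, playbook.yaml might look like the following. The URL, port, and cluster UUID values remain placeholders, as above:

    ---
    - hosts: all
      roles:
        - setup-pxe
        - download-iso-pxe
      vars:
        ARCH: arm
        URL: http://<URL>:<PORT>
        CLUSTER_ID: "<CLUSTER UUID>"
        ISO_NAME: "discovery_image.iso"
        WORKDIR: "/tmp/download-iso-pxe"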

  3. Download and extract the discovery ISO by using the following command.

    $ ansible-playbook -i localhost, playbook.yaml --tags extract,download
    

    Use localhost if running the command on the installer node. Replace localhost with the FQDN of the installer node if running the command on your laptop. By default the name of the downloaded ISO is discovery_image.iso and it is downloaded to /tmp/download-iso-pxe.
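
    For example, if the FQDN of the installer node is installer.example.com (a placeholder host name), run the following command from your laptop:

    $ ansible-playbook -i installer.example.com, playbook.yaml --tags extract,download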

  4. Verify the discovery image exists in /tmp/download-iso-pxe.

    1. List the contents of the download directory.

      $ ll /tmp/download-iso-pxe
      

      Expected output

      discovery_image.iso
      
  5. Verify the correct files are extracted to the correct locations.

    1. Verify the ignition file and rootfs image exists in /var/www/html/pxe.

      $ ll /var/www/html/pxe
      

      Expected output

      total 839720
      -rw-r--r--. 1 apache apache     10484 Mar 23 06:15 config.ign
      -rw-r--r--. 1 apache apache 859858944 Mar 23 06:15 rootfs.img
      
    2. Verify the grub.cfg and the files needed to boot the VM exist in /var/lib/tftpboot.

      $ ll /var/lib/tftpboot
      

      Expected output

      total 89956
      -rw-r--r--. 1 root root   857984 Jan 28 01:13 BOOTAA64.EFI
      -rw-r--r--. 1 root root  2446280 Jan 28 01:13 grubaa64.efi
      -rw-r--r--. 1 root root      567 Mar 23 06:16 grub.cfg
      -rw-r--r--. 1 root root 79229244 Mar 23 06:06 initrd.img
      -rw-r--r--. 1 root root  9565104 Mar 23 06:06 vmlinuz
      

At this stage you have all the files required to PXE boot the DPU hosts.
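
The download-iso-pxe role generates the grub.cfg for you, so the following is only a representative sketch of what a discovery boot entry can look like. The <HTTP_SERVER> address is a placeholder, and the exact kernel arguments in your generated file may differ:

set timeout=5
menuentry 'Assisted Installer discovery (arm64)' {
  linux /vmlinuz ip=dhcp coreos.live.rootfs_url=http://<HTTP_SERVER>/pxe/rootfs.img ignition.config.url=http://<HTTP_SERVER>/pxe/config.ign ignition.firstboot ignition.platform.id=metal
  initrd /initrd.img
}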

Disabling DNS, the Ingress Controller, and monitoring on the worker nodes

Run the following commands to disable DNS, the Ingress Controller, and monitoring on the infrastructure worker nodes.

  • Install the OpenShift CLI (oc).

  • Log in to the infrastructure cluster as a user with cluster-admin privileges.

  1. Run the following command to patch the dns.operator/default and ensure that the DNS Operator is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:

    $ oc patch dns.operator/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"node-role.kubernetes.io/master":""}}}}'
    
  2. Run the following command to patch the ingresscontroller/default and ensure that the Ingress Operator is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:

    $ oc patch ingresscontroller/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/master":""}}}}}' -n openshift-ingress-operator
    
  3. Disable monitoring on the worker nodes by following these steps:

    1. Create the following YAML and save it in file named monitor-patch-cm.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusOperator:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          prometheusK8s:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          alertmanagerMain:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          kubeStateMetrics:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          grafana:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          telemeterClient:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          k8sPrometheusAdapter:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          openshiftStateMetrics:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          thanosQuerier:
            nodeSelector:
              node-role.kubernetes.io/master: ""
      
    2. Run the following command:

      $ oc create -f monitor-patch-cm.yaml
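
Optionally, you can confirm that the monitoring components are rescheduled onto the control plane nodes by checking where the pods in the openshift-monitoring namespace are running:

$ oc get pods -n openshift-monitoring -o wide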
      

Installing the tenant cluster (Control plane nodes only)

Use the Assisted Installer to install the tenant cluster.

  • Shared VLAN: The Assisted Installer requires all nodes to be present on the same VLAN. This is a requirement of utilizing a virtual IP for both ingress and API.

  • DHCP: Much like an IPI installation, DHCP is required in the VLAN.

  • DNS: Records for ingress and API are required for accessing the cluster.

Generating the discovery ISO with the Assisted Installer

Installing the OpenShift Container Platform x86_64 tenant cluster (3 control plane nodes and 3 worker nodes) requires a discovery ISO, which the Assisted Installer can generate with the cluster name, base domain, Secure Shell (SSH) public key, and pull secret.

In the initial install you are only installing the control plane nodes.

Use the Assisted Installer to create a discovery ISO for the x86_64 tenant cluster. See the Assisted Installer for OpenShift Container Platform documentation for details on using the Assisted Installer.

Once you have downloaded the discovery ISO, boot the tenant cluster control plane nodes with the discovery image. See Booting hosts with the discovery image for additional details.

Installing the BlueField-2 DPU

The general steps to follow for installing the BlueField-2 DPU are:

  1. Ensure that your server hardware is compatible with the BlueField-2 DPU. You can check the compatibility list on the NVIDIA website.

  2. Install the BlueField-2 DPU onto your server hardware. The DPU will need to be installed in a PCI Express (PCIe) slot on your server.

  3. Connect the BlueField-2 DPU to your server’s power supply and network interface.

  4. Install the BlueField-2 DPU software on your server. The software includes the BlueField-2 driver, firmware, and management software.

  5. Configure the BlueField-2 DPU using the management software. You can use the BlueField-2 Manager to configure and manage the DPU.

  6. Verify that the BlueField-2 DPU is installed and working correctly. You can use the BlueField-2 Manager to check the status of the DPU.

These are general steps, and the specific installation process might change depending on your server hardware and software environment. Consult the installation guide for the BlueField-2 DPU for detailed instructions.
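
As a quick check that the card is visible to the host after installation, you can, for example, list the Mellanox devices on the PCIe bus:

$ lspci | grep -i mellanox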

Installing and enabling offloading with the DPU Network Operator

The DPU Network Operator on the infrastructure cluster side is responsible for the life-cycle management of the ovn-kube components and the necessary host network initialization on DPU cards.

  • An infrastructure cluster composed of x86 control plane and ARM worker nodes is up and running.

  • A tenant cluster composed of x86 control plane nodes is installed.

  • DPU cards are installed on the worker nodes of the infrastructure cluster where hardware offloading needs to be enabled.

  • Pods in the infrastructure cluster can reach the API server of the tenant cluster.

  • Network configuration:

    • The infrastructure cluster and the tenant cluster share the same VLAN as the API network.

    • The two clusters must use different virtual IPs (VIPs) as the cluster API VIP.

    • The DHCP server must be configured to assign IP addresses to hosts from both clusters from the same subnet.

    • The DNS server must be able to resolve the API URLs for both clusters.

As a cluster administrator, you can install the Operator using the CLI.

  • Install the OpenShift CLI (oc).

  • Log in to infrastructure cluster as a user with cluster-admin privileges.

  1. Create a namespace for the DPU Network Operator by completing the following actions:

    1. Create the following Namespace Custom Resource (CR) that defines the openshift-dpu-network-operator namespace, and then save the YAML in the dpuno-namespace.yaml file

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-dpu-network-operator
        annotations:
          workload.openshift.io/allowed: management
      
    2. Create the namespace by running the following command:

      $ oc create -f dpuno-namespace.yaml
      
  2. Install the DPU Network Operator in the namespace you created in the previous step by creating the following objects:

    1. Create the following OperatorGroup CR and save the YAML in the dpuno-operatorgroup.yaml file:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: dpu-network-operators
        namespace: openshift-dpu-network-operator
      
    2. Create the OperatorGroup CR by running the following command:

      $ oc create -f dpuno-operatorgroup.yaml
      
  3. Create the following Subscription CR and save the YAML in the dpu-sub.yaml file:

    Example Subscription

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: openshift-dpu-operator-subscription
      namespace: openshift-dpu-network-operator
    spec:
      channel: "stable"
      name: dpu-network-operator
      source: redhat-operators 
      sourceNamespace: openshift-marketplace
    
    • You must specify the redhat-operators value.
  4. Create the Subscription object by running the following command:

    $ oc create -f dpu-sub.yaml
    
  5. Change to the openshift-dpu-network-operator project:

    $ oc project openshift-dpu-network-operator
    
  6. Verify the DPU Network Operator is running:

    $ oc get pods -n openshift-dpu-network-operator
    

    Example output

    NAME                                                      READY   STATUS    RESTARTS   AGE
    dpu-network-operator-controller-manager-cc9ccc4bd-9vqcg   2/2     Running   0          62s
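
    Optionally, you can also confirm that the Operator subscription resolved to a ClusterServiceVersion in the Succeeded phase. The exact CSV name depends on the installed Operator version:

    $ oc get csv -n openshift-dpu-network-operator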
    

Creating a dedicated namespace

You need to install the OVNKubeConfig custom resource (CR), the ovnkube-node overrides config map (CM) and the secret containing the tenant kubeconfig into the same namespace. Follow this procedure to create this namespace.

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  1. Create a YAML file named for example dpu_namespace.yaml that contains the following YAML:

    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        pod-security.kubernetes.io/enforce: privileged
        pod-security.kubernetes.io/audit: privileged
        pod-security.kubernetes.io/warn: privileged
        security.openshift.io/scc.podSecurityLabelSync: "false"
        openshift.io/run-level: "0"
      name: tenantcluster-dpu
    
  2. Create the namespace by running the following command:

    $ oc create -f dpu_namespace.yaml
    

Configuring support for hardware offloading in the infrastructure cluster

Configure support for hardware offloading in the infrastructure cluster by using this procedure.

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  1. Create the following OVNKubeConfig custom resource (CR) with the poolName dpu, leaving the kubeConfigFile blank. Save the YAML in the ovnkubeconfig.yaml file:

    apiVersion: dpu.openshift.io/v1alpha1
    kind: OVNKubeConfig
    metadata:
      name: ovnkubeconfig-sample
      namespace: tenantcluster-dpu
    spec:
      poolName: dpu
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/dpu-worker: ""
    
    1. Run the following command:

      $ oc create -f ovnkubeconfig.yaml
      

      The DPU Network Operator creates a custom MachineConfigPool and a custom MachineConfig.

  2. Get the node names by using the following command:

    $ oc get nodes
    
  3. Label the DPU nodes in the infrastructure cluster. The Machine Config Operator applies the new MachineConfig to the DPU nodes, therefore enabling switchdev mode on them:

    $ oc label node <NODENAME> node-role.kubernetes.io/dpu-worker=
    
  4. Create a Cluster Network Operator (CNO) ConfigMap in the infrastructure cluster setting the mode to DPU. Save the YAML in the ovnkubeconfigmap.yaml file:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: dpu-mode-config
      namespace: openshift-network-operator
    data:
      mode: "dpu"
    immutable: true
    
    1. Run the following command:

      $ oc create -f ovnkubeconfigmap.yaml
      
  5. Label the DPU nodes in the infrastructure cluster where you want to enable hardware offloading.

    $ oc label node <NODENAME> network.operator.openshift.io/dpu=
    

Configuring support for hardware offloading in the tenant cluster

Configure support for hardware offloading in the tenant cluster by using this procedure.

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  1. Create a MachineConfigPool for all the DPU workers. Save the YAML in the dputenantmachineconfig.yaml file:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: dpu-host
    spec:
      machineConfigSelector:
        matchExpressions:
        - key: machineconfiguration.openshift.io/role
          operator: In
          values:
          - worker
          - dpu-host
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/dpu-host: ""
    
    1. Run the following command:

      $ oc create -f dputenantmachineconfig.yaml
      
  2. Label the DPU nodes:

    $ oc label node <NODENAME> node-role.kubernetes.io/dpu-host=
    
  3. Install and configure the SR-IOV Network Operator:

    This procedure describes how to install the SR-IOV Network Operator by using the web console. However, if the console is not reachable because the ingress is down, follow the guidance in "CLI: Installing the SR-IOV Network Operator".

    1. In the OpenShift Container Platform web console, click Administration → Namespaces.

    2. Click Create Namespace.

    3. In the Name field, enter openshift-sriov-network-operator, and click Create.

    4. In the OpenShift Container Platform web console, click Operators → OperatorHub.

    5. Select SR-IOV Network Operator from the list of available Operators, and click Install.

    6. On the Install Operator page, under A specific namespace on the cluster, select openshift-sriov-network-operator.

    7. Click Install.

    8. Verify that the SR-IOV Network Operator is installed successfully:

      1. Navigate to the Operators → Installed Operators page.

      2. Ensure that SR-IOV Network Operator is listed in the openshift-sriov-network-operator project with a Status of InstallSucceeded.

        During installation an Operator might display a Failed status.
        If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

        If the operator does not appear as installed, to troubleshoot further:

        • Inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.

        • Navigate to the Workloads → Pods page and check the logs for pods in the openshift-sriov-network-operator project.

  4. Add this machine config pool to the SriovNetworkPoolConfig custom resource.

    1. Create a file, such as sriov-pool-config.yaml, with the following content:

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkPoolConfig
      metadata:
        name: default
        namespace: openshift-sriov-network-operator
      spec:
        ovsHardwareOffloadConfig:
          name: dpu-host 
      
      • The name here is the same as the machine config pool (MCP) name created in step 1.
    2. Apply the configuration:

      $ oc create -f sriov-pool-config.yaml
      

      After applying the sriov-pool-config.yaml, the nodes reboot and you need to wait until the dpu-host machine config pool (MCP) is updated again.
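
      For example, one way to watch the machine config pool until it finishes updating is the following command, which assumes the dpu-host pool name created in step 1:

      $ oc get mcp dpu-host -w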

  5. Create a SriovNetworkNodePolicy to configure the virtual functions (VFs) on the hosts.

    1. Save the YAML in the SriovNetworkNodePolicy.yaml file:

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkNodePolicy
      metadata:
        name: policy-mlnx-bf 
        namespace: openshift-sriov-network-operator
      spec:
        resourceName: mlnx_bf 
        nodeSelector:
          node-role.kubernetes.io/dpu-host: "" 
        priority: 99 
        numVfs: 4 
        nicSelector: 
          vendor: "15b3" 
          deviceId: "a2d6" 
          pfNames: ['ens1f0#1-3'] 
          rootDevices: ['0000:3b:00.0'] 
      
      • The name for the custom resource object.

      • The resource name of the SR-IOV network device plug-in. You can create multiple SR-IOV network node policies for a resource name.

      • The node selector specifies the nodes to configure. Ensure this is consistent with the nodeSelector of the MCP created in step 1.

      • Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99.

      • The number of the virtual functions (VF) to create for the SR-IOV physical network device. For a Mellanox NIC, the number of VFs cannot be larger than 128.

      • The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally.

      • The vendor hexadecimal code of the SR-IOV network device. Vendor id 15b3 is for Mellanox devices.

      • The device hexadecimal code of the SR-IOV network device. For example, a2d6 is the device ID for a BlueField-2 DPU device.

      • An array of one or more physical function (PF) names for the device. The setting ens1f0#1-3 in this example ensures 1 virtual function is reserved for the management port.

      • An array of one or more PCI bus addresses for the PF of the device. Provide the address in the following format: 0000:02:00.1.

    2. Create the SriovNetworkNodePolicy object:

      $ oc create -f SriovNetworkNodePolicy.yaml
      

      After applying SriovNetworkNodePolicy.yaml, the nodes reboot and you need to wait until the dpu-host machine config pool is updated again.

  6. Optional: Follow these steps if virtual functions are not being created on the tenant cluster.

    1. Create the following Machine Config:

      $ cat <<EOF > realloc.yaml
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: dpu-host
        name: pci-realloc
      spec:
        config:
          ignition:
            version: 3.2.0
        kernelArguments:
          - pci=realloc
      EOF
      
    2. Apply the Machine Config and wait until all the nodes are rebooted:

      $ oc create -f realloc.yaml
      
  7. Create a Cluster Network Operator (CNO) ConfigMap in the tenant cluster setting the mode to dpu-host.

    1. Save the YAML in the sriovdpuconfigmap.yaml file:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: dpu-mode-config
        namespace: openshift-network-operator
      data:
        mode: "dpu-host"
      immutable: true
      
    2. Run the following command:

      $ oc create -f sriovdpuconfigmap.yaml
      
  8. Create a machine config to disable Open vSwitch (OVS).

    1. Create a YAML file for example disable-ovs.yaml:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: dpu-host
        name: disable-ovs
      spec:
        config:
          ignition:
            version: 3.1.0
          systemd:
            units:
            - mask: true
              name: ovs-vswitchd.service
            - enabled: false
              name: ovs-configuration.service
      
    2. Add this machine config to the cluster by running the following command:

      $ oc create -f disable-ovs.yaml
      
  9. Set the environment variable OVNKUBE_NODE_MGMT_PORT_NETDEV for each DPU host.

    1. Save the YAML in the setenvovnkube.yaml file:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: env-overrides
        namespace: openshift-ovn-kubernetes
      data:
        x86-worker-node0: |
          OVNKUBE_NODE_MGMT_PORT_NETDEV=ens1f0v0 
      
      • ens1f0v0 is the virtual function (VF) name that is assigned to the ovnkube node management port on the host.
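
      If you are unsure of the VF netdev name on a host, one way to look it up (assuming the physical function is ens1f0, as in the earlier SriovNetworkNodePolicy example) is to read the VF network device name from sysfs on that host:

      $ ls /sys/class/net/ens1f0/device/virtfn0/net/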
    2. Run the following command:

      $ oc create -f setenvovnkube.yaml
      
  10. Label the DPU nodes in the tenant cluster. Run the following command :

    $ oc label node <NODENAME> network.operator.openshift.io/dpu-host=
    

Adding worker nodes to the tenant cluster

Use this procedure to add worker nodes to the tenant cluster by generating a new discovery ISO.

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  1. Navigate to console.redhat.com/openshift.

  2. From the list of created clusters, find your tenant cluster.

  3. Select your tenant cluster.

  4. Click the Add hosts button and select the installation media.

    1. Select Minimal image file: Provision with virtual media to download a smaller image that will fetch the data needed to boot. The nodes must have virtual media capability. This is the recommended method.

    2. Select Full image file: Provision with physical media to download the larger full image.

    3. Select iPXE: Provision from your network server. Use this when you have an iPXE server that has already been set up.

  5. Add an SSH public key so that you can connect to the cluster nodes as the core user. Having a login to the cluster nodes can provide you with debugging information during the installation.

  6. Optional: If the cluster hosts are behind a firewall that requires the use of a proxy, select Configure cluster-wide proxy settings. Enter the username, password, IP address and port for the HTTP and HTTPS URLs of the proxy server.

  7. Optional: Configure a cluster-wide trusted certificate if needed, for example if the cluster hosts are in a network with a re-encrypting (MITM) proxy or the cluster needs to trust certificates for other purposes, such as container image registries.

  8. Click Generate Discovery ISO.

  9. Download the discovery ISO.

  10. Boot the tenant cluster worker host(s) with the discovery image. See Booting hosts with the discovery image for additional details.

Disabling DNS, the Ingress Controller, and monitoring on the worker nodes

Run the following commands to disable DNS, the Ingress Controller, and monitoring on the tenant worker nodes.

  • Install the OpenShift CLI (oc).

  • Log in to the infrastructure cluster as a user with cluster-admin privileges.

  1. Run the following command to patch the dns.operator/default and ensure that the DNS Operator is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:

    $ oc patch dns.operator/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"node-role.kubernetes.io/master":""}}}}'
    
  2. Run the following command to patch the ingresscontroller/default and ensure that the Ingress Operator is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:

    $ oc patch ingresscontroller/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/master":""}}}}}' -n openshift-ingress-operator
    
  3. Disable monitoring on the worker nodes by following these steps:

    1. Create the following YAML and save it in a file named monitor-patch-cm.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusOperator:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          prometheusK8s:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          alertmanagerMain:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          kubeStateMetrics:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          grafana:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          telemeterClient:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          k8sPrometheusAdapter:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          openshiftStateMetrics:
            nodeSelector:
              node-role.kubernetes.io/master: ""
          thanosQuerier:
            nodeSelector:
              node-role.kubernetes.io/master: ""
      
    2. Run the following command:

      $ oc create -f monitor-patch-cm.yaml
      

Creating a subscription to the Node Maintenance Operator in the tenant cluster

Create a subscription to the Node Maintenance Operator following this procedure.

  • Install the OpenShift CLI (oc).

  • Log in as a user with cluster-admin privileges.

  1. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, node-maintenance-subscription.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: node-maintenance-operator
        namespace: openshift-operators
      spec:
        channel: stable
        installPlanApproval: Automatic
        name: node-maintenance-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
        startingCSV: node-maintenance-operator.v4.12.0
      
    2. To create the Subscription CR, run the following command:

      $ oc create -f node-maintenance-subscription.yaml
      
  1. Verify that the installation succeeded by inspecting the CSV resource:

    $ oc get csv -n openshift-operators
    

    Example output

    NAME                               DISPLAY                     VERSION   REPLACES  PHASE
    node-maintenance-operator.v4.12    Node Maintenance Operator   4.12                Succeeded
    
  2. Verify that the Node Maintenance Operator is running:

    $ oc get deploy -n openshift-operators
    

    Example output

    NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
    node-maintenance-operator-controller-manager   1/1     1            1           10d
    

Final infrastructure cluster configuration

Run the following steps against the infrastructure cluster.

  1. Add the kubeconfig of the tenant cluster as a secret.

    $ oc create secret generic tenant-cluster-1-kubeconf -n tenantcluster-dpu --from-file=config=/root/kubeconfig.tenant
    
  2. Add the per-node configuration override for the ovnkube-node by listing all the DPU nodes under data:

    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: env-overrides
      namespace: tenantcluster-dpu
    data:
      worker-bf: |
        TENANT_K8S_NODE=x86-worker-node0
        DPU_IP=192.168.111.29
        MGMT_IFNAME=pf0vf0
    
  3. Update the OVNKubeConfig CR by adding the kubeConfigFile field, as shown in the following example. Wait until the ovnkube pods are created.

    apiVersion: dpu.openshift.io/v1alpha1
    kind: OVNKubeConfig
    metadata:
      name: ovnkubeconfig-sample
    spec:
      kubeConfigFile: tenant-cluster-1-kubeconf
      poolName: dpu
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/dpu-worker: ''
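
    After updating the CR, you can apply it and watch for the ovnkube pods, for example as follows. This assumes you edited the ovnkubeconfig.yaml file created earlier and that the CR lives in the tenantcluster-dpu namespace used throughout this document:

    $ oc apply -f ovnkubeconfig.yaml
    $ oc get pods -n tenantcluster-dpu -w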
    

The DPU Network Operator reports several status messages that you can use to verify the Operator’s working status:

  • McpReady indicates that the MachineConfig and MachineConfigPool are ready.

  • TenantObjsSynced indicates that the tenant ConfigMap and secrets are synced to the infrastructure nodes.

  • OvnKubeReady indicates that the ovnkube-node DaemonSet is ready.
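
For example, one way to inspect these conditions is to read the status of the OVNKubeConfig CR directly. The CR name and namespace below are the ones used in the earlier examples:

$ oc get ovnkubeconfig ovnkubeconfig-sample -n tenantcluster-dpu -o yaml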

Modifying log rotation of CRI-O

The BlueField-2 DPU has 16 GB of eMMC storage, and when OpenShift Container Platform runs over time the logs can fill up the disk. In OpenShift Container Platform, log rotation is carried out by default at approximately 50 Mi per file and 5 files. The pod log file size and count are managed by the kubelet on each node, which then passes the settings on to CRI-O. The default settings currently used in OpenShift Container Platform are as follows:

--container-log-max-size 50Mi (default 50Mi)
--container-log-max-files 5 (default 5)

The current kubelet configuration used by a node is located at /etc/kubernetes/kubelet.conf and contains the corresponding defaults:

containerLogMaxSize: 50Mi
containerLogMaxFiles: 5

Pod and container logs are already rotated automatically by default. However, the kubelet.conf file is managed by the Machine Config Operator, so additional steps are necessary to change these settings.

Limit the container logs on the infrastructure cluster worker nodes by following this procedure.

  1. Obtain the MachineConfig for the kubelet:

    $ oc get machineconfig | grep -i kubelet
    

    Example output

    01-master-kubelet                                       46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0             47m
    01-worker-kubelet                                       46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0             47m
    
  2. Set up a custom kubelet for the worker nodes (nodes with DPUs) on the infrastructure cluster.

    1. Label the worker machine config pool with the custom-kubelet: logrotation label:

      $ oc label machineconfigpool worker custom-kubelet=logrotation
      
    2. Create a YAML file logrotation.yaml defining a kubeletconfig custom resource (CR) for your configuration change:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: cr-logrotation
      spec:
        machineConfigPoolSelector:
          matchLabels:
            custom-kubelet: logrotation 
        kubeletConfig:
          containerLogMaxFiles: 3 
          containerLogMaxSize: 10Mi 
      
      • Set this value to the same as the custom-kubelet value configured in step 2a.

      • Set the desired maximum number (for example 3) of container log files that can be present for a container.

      • Set the desired maximum size (for example 10Mi) of container log file before it is rotated.

    3. Create the CR object:

      $ oc create -f logrotation.yaml
      
  1. Verify the kubeletconfig:

    $ oc get kubeletconfig
    
  2. Verify the kubeletconfig:

    $ oc get kubeletconfig -o yaml
    
  3. Verify that three machineconfigs now exist:

    $ oc get machineconfig | grep -i kubelet
    

    Example output

    01-master-kubelet                                       46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0             74m
    01-worker-kubelet                                       46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0             74m
    99-worker-generated-kubelet                             46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0             11m
    
  4. Enter into a debug session on one of the worker nodes. This step instantiates a debug pod called <node_name>-debug:

    $ oc debug node/<node_name>
    
  5. Set /host as the root directory within the debug shell. The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths:

    # chroot /host
    
  6. Verify the settings in kubelet.conf:

    # grep -e containerLogMaxSize -e containerLogMaxFiles /etc/kubernetes/kubelet.conf
    

    Example output

     "containerLogMaxSize": "10Mi",
     "containerLogMaxFiles": 3,
    
