OVN/OVS offloading with OpenShift on NVIDIA BlueField-2 DPUs
Table of Contents
- About OVN/OVS offloading with OpenShift on NVIDIA BlueField-2 DPUs
- Installing and configuring an accelerated infrastructure with OpenShift and DPUs
- OpenShift and DPU deployment architecture
- Installing the infrastructure cluster
- Generating the discovery ISOs with the Assisted Installer for the infrastructure nodes
- Download and extract the discovery ISO using Ansible
- Disabling DNS, the Ingress Controller, and monitoring on the worker nodes
- Installing the tenant cluster (Control plane nodes only)
- Generating the discovery ISO with the Assisted Installer
- Installing the BlueField-2 DPU
- Installing and enabling offloading with the DPU Network Operator
- Creating a dedicated namespace
- Configuring support for hardware offloading in the infrastructure cluster
- Configuring support for hardware offloading in the tenant cluster
- Adding worker nodes to the tenant cluster
- Disabling DNS, the Ingress Controller, and monitoring on the worker nodes
- Creating a subscription to the Node Maintenance Operator in the tenant cluster
- Final infrastructure cluster configuration
- Modify log rotation of CRI-O
OpenShift Container Platform installed on DPUs facilitates OVN/OVS offloading.
Note
OVN/OVS offloading with OpenShift on NVIDIA BlueField-2 DPUs is a Developer Preview feature in OpenShift Container Platform 4.10 only. It is not available on previous versions of OpenShift Container Platform.
About Developer Preview features
Developer Preview features are not supported with Red Hat production service level agreements (SLAs) and are not functionally complete. Red Hat does not advise using them in a production setting. Developer Preview features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. These releases may not have any documentation, and testing is limited. Red Hat may provide ways to submit feedback on Developer Preview releases without an associated SLA.
The features described in this document are for Developer Preview purposes and are not supported by Red Hat at this time.
About OVN/OVS offloading with OpenShift on NVIDIA BlueField-2 DPUs
Learn how to accelerate and offload software subsystems using OpenShift Container Platform and the Data Processing Unit (DPU). DPUs are a class of reprogrammable high-performance processors combined with high-performance network interfaces optimized to perform and accelerate network and storage functions carried out by data center servers.
The Data Processing Unit (DPU) is a complete compute system with an independent software stack, network identity, and provisioning capabilities. The DPU is fully capable of hosting its own applications using either embedded or orchestrated deployment models.
The unique capabilities of the DPU are disruptive because they allow key infrastructure functions and their associated software stacks to be completely removed from the host node’s CPU cores and relocated onto the DPU.
Installing and configuring an accelerated infrastructure with OpenShift and DPUs
Installing OpenShift Container Platform on a DPU makes it possible to offload packet processing from the host x86 to the DPU. Offloading resource-intensive computational tasks, such as packet processing, from the server’s CPU to the DPU frees up cycles on the OpenShift Container Platform worker nodes to run the applications.
OpenShift and DPU deployment architecture
The proposed deployment architecture is a two-cluster design. In this architecture, DPU cards are provisioned as worker nodes of the ARM-based infrastructure cluster. The tenant cluster, composed of the x86 servers, is where the normal user applications run. The following diagram illustrates the deployment architecture.
The installation of this infrastructure cluster can be on ARM hardware or a mixed environment of x86 control plane and ARM worker nodes.
The steps involved in deploying and configuring are:
-
Install the infrastructure cluster by using the assisted installer.
- Install 3 control plane nodes and at least 2 DPU worker nodes.
-
Install the control plane nodes of the tenant cluster by using the assisted installer.
-
Install the DPU Network Operator on the infrastructure cluster.
- Partially configure the DPU Network Operator, effectively making it aware of the attached DPU.
-
Configure support for hardware offloading in the infrastructure cluster.
-
Configure support for hardware offloading in the tenant cluster.
- Enable the SR-IOV Network Operator on the hosts with DPU mode.
-
Add worker nodes to the tenant cluster.
-
Complete the final infrastructure cluster configuration.
Installing the infrastructure cluster
Use the Assisted Installer to install OpenShift Container Platform on the infrastructure cluster as a multi-architecture cluster, with the control plane nodes running on x86_64 and the worker nodes running on ARM architecture. Alternatively, both the control plane and worker nodes of the infrastructure cluster can run on ARM architecture.
The infrastructure cluster should use the DPU’s uplink network as the primary network and not the out-of-band management network.
-
Shared VLAN: The Assisted Installer requires all nodes to be present on the same VLAN. This is a requirement of utilizing a virtual IP for both ingress and API.
-
DHCP: Much like an installer-provisioned infrastructure (IPI) installation, DHCP is required in the VLAN.
-
DNS: Records for ingress and API are required for accessing the cluster.
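For the DNS requirement, the records typically take the form api.<cluster_name>.<base_domain> for the API and *.apps.<cluster_name>.<base_domain> for ingress, each resolving to the corresponding VIP. The following is a minimal pre-installation check; the cluster name infra-cluster and the domain example.com are placeholder assumptions, not values from this document.
$ dig +short api.infra-cluster.example.com          # should return the API VIP
$ dig +short test.apps.infra-cluster.example.com    # any name under *.apps should return the Ingress VIP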
Generating the discovery ISOs with the Assisted Installer for the infrastructure nodes
Installing the OpenShift Container Platform infrastructure cluster (3 control plane nodes and 3 worker nodes) requires the creation of two discovery ISOs. You can use the Assisted Installer UI or the API to create discovery ISOs for both x86_64 and arm64.
See the Assisted Installer for OpenShift Container Platform documentation for details on using the Assisted Installer.
When installing using the UI, in the cluster details step select an OpenShift Container Platform version with the -multi extension from the OpenShift version dropdown menu, for example OpenShift Container Platform 4.12.15-multi. Follow the installation instructions and, for the control plane nodes, choose x86_64 as the CPU architecture from the dropdown menu, then proceed to download the discovery ISO. For the worker nodes, choose arm64 as the CPU architecture and follow the instructions to download a second discovery ISO.
A summary of the steps for installing a multi-architecture cluster by using the Assisted Installer API is as follows:
-
Register a new cluster, setting the openshift_version to install OpenShift Container Platform 4.12.15-multi.
-
Register a new infrastructure environment, setting the cpu_architecture to x86_64, and proceed to download the discovery ISO. For the worker nodes, set arm64 as the cpu_architecture and follow the instructions to download a second discovery ISO. You need to create two infrastructure environments. An example of these API calls follows this list.
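The following is a rough sketch of the two registration calls against the Assisted Installer v2 REST API. The endpoint, token handling, cluster name, base domain, and pull secret shown here are placeholder assumptions, not values from this document; consult the Assisted Installer API documentation for the authoritative payloads.
$ API_URL=https://api.openshift.com/api/assisted-install/v2   # SaaS endpoint; adjust for a local assisted-service
$ curl -s -X POST "$API_URL/clusters" \
    -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" \
    -d '{"name": "infra-cluster", "openshift_version": "4.12.15-multi", "base_dns_domain": "example.com", "pull_secret": "<PULL_SECRET_JSON>"}'
$ # Repeat the infra-env registration twice: once with cpu_architecture x86_64 and once with arm64
$ curl -s -X POST "$API_URL/infra-envs" \
    -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" \
    -d '{"name": "infra-x86", "cluster_id": "<CLUSTER UUID>", "cpu_architecture": "x86_64", "pull_secret": "<PULL_SECRET_JSON>"}'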
After completing this stage you will have two discovery ISOs. Boot the x86_64 control plane nodes following the standard boot procedure appropriate for your environment. See Booting hosts with the discovery image for additional details.
Download and extract the discovery ISO using Ansible
The DPU network card does not currently allow you to install using virtual media, so you need to use the PXE boot method. An Ansible repo is available that makes it easy to download and extract the discovery ISO needed to PXE boot these additional hosts. Use the playbooks in this repo to:
-
Add worker nodes to the existing infrastructure cluster.
-
Set up the PXE services putting the files in the correct location to boot the added hosts.
The Ansible Assisted Installer Playbooks includes two different roles:
-
setup-pxe: This role installs the basic services required for PXE booting, namely:
-
HTTP server
-
TFTP server
-
Optional: DHCP server. Add the configuration for DHCP to the playbook.yaml to install the DHCP service. DHCP is optional because you might already have a DHCP server.
-
-
download-iso-pxe: This role is for Assisted Installer Day 2 deployments. Given an existing cluster, it adds the worker node by using the API. This role also transforms the cluster to Day 2, and downloads and extracts the ISO into the HTTP and TFTP folders. In addition, it generates the grub.cfg.
This procedure describes how to download and extract the discovery ISO. The hosts have already been added to the existing cluster.
-
You have ansible version 2.9 or greater installed.
-
You have installed jq. See the jq Manual for detailed information about using jq.
-
Clone the Ansible Assisted Installer Playbooks repo.
Clone the repo on the installer node or on your laptop (that has access to the installer node).
$ git clone https://github.com/rh-ecosystem-edge/ansible-ai-playbook.git
-
Edit the playbook.yaml:
---
- hosts: all
  roles:
    - setup-pxe
    - download-iso-pxe
  vars:
    ARCH: arm
    URL: http://<URL>:<PORT>
    CLUSTER_ID: "<CLUSTER UUID>"
-
This field sets up the basic services required for PXE booting.
-
Enter the Assisted Installer URL and port number.
-
Enter the cluster UUID from the Assisted Installer UI.
The option exists to download the ISO manually and extract what is needed to support PXE booting the hosts. To do this, add two variables to the playbook.yaml. The variables are:
-
ISO_NAME: "discovery_image.iso" - This variable is the name of the downloaded ISO.
-
WORKDIR: "/tmp/download-iso-pxe" - This variable specifies the directory where the ISO is downloaded to.
-
-
Download and extract the discovery ISO by using the following command.
$ ansible-playbook -i localhost, playbook.yaml --tags extract,download
Use localhost if running the command on the installer node. Replace localhost with the FQDN of the installer node if running the command on your laptop. By default, the name of the downloaded ISO is discovery_image.iso and it is downloaded to /tmp/download-iso-pxe.
-
Verify that the discovery image exists in /tmp/download-iso-pxe.
-
List the contents of the download directory.
$ ll /tmp/download-iso-pxe
Expected output
discovery_image.iso
-
-
Verify the correct files are extracted to the correct locations.
-
Verify that the ignition file and rootfs image exist in /var/www/html/pxe:
$ ll /var/www/html/pxe
Expected output
total 839720
-rw-r--r--. 1 apache apache     10484 Mar 23 06:15 config.ign
-rw-r--r--. 1 apache apache 859858944 Mar 23 06:15 rootfs.img
-
Verify that the grub.cfg and the files needed to boot the hosts exist in /var/lib/tftpboot:
$ ll /var/lib/tftpboot
Expected output
total 89956
-rw-r--r--. 1 root root   857984 Jan 28 01:13 BOOTAA64.EFI
-rw-r--r--. 1 root root  2446280 Jan 28 01:13 grubaa64.efi
-rw-r--r--. 1 root root      567 Mar 23 06:16 grub.cfg
-rw-r--r--. 1 root root 79229244 Mar 23 06:06 initrd.img
-rw-r--r--. 1 root root  9565104 Mar 23 06:06 vmlinuz
-
At this stage you have all the files required to PXE boot the DPU hosts.
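For reference, the generated grub.cfg boots the extracted kernel and initramfs and points at the rootfs and ignition files served over HTTP. The following is only a rough sketch of its shape; the exact kernel arguments, paths, and the <HTTP_SERVER> placeholder are assumptions, and the authoritative file is the one generated by the download-iso-pxe role.
set timeout=5
menuentry 'Assisted Installer discovery (arm64)' {
    linux /vmlinuz ignition.firstboot ignition.platform.id=metal coreos.live.rootfs_url=http://<HTTP_SERVER>/pxe/rootfs.img ignition.config.url=http://<HTTP_SERVER>/pxe/config.ign
    initrd /initrd.img
}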
Disabling DNS, the Ingress Controller, and monitoring on the worker nodes
Apply the following commands to disable DNS, the Ingress Controller, and monitoring on the infrastructure worker nodes.
-
Install the OpenShift CLI (
oc
). -
Log in to the infrastructure cluster as a user with cluster-admin privileges.
-
Run the following command to patch the dns.operator/default and ensure that the DNS Operator is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:
$ oc patch dns.operator/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"node-role.kubernetes.io/master":""}}}}'
-
Run the following command to patch the ingresscontroller/default and ensure that the Ingress Controller is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:
$ oc patch ingresscontroller/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/master":""}}}}}' -n openshift-ingress-operator
-
Disable monitoring on the worker nodes by following these steps:
-
Create the following YAML and save it in a file named monitor-patch-cm.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusOperator:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    prometheusK8s:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    alertmanagerMain:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    kubeStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    grafana:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    telemeterClient:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    k8sPrometheusAdapter:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    openshiftStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    thanosQuerier:
      nodeSelector:
        node-role.kubernetes.io/master: ""
-
Run the following command:
$ oc create -f monitor-patch-cm.yaml
Installing the tenant cluster (Control plane nodes only)
Use the Assisted Installer to install the tenant cluster.
-
Shared VLAN: The Assisted Installer requires all nodes to be present on the same VLAN. This is a requirement of utilizing a virtual IP for both ingress and API.
-
DHCP: Much like an IPI installation, DHCP is required in the VLAN.
-
DNS: Records for ingress and API are required for accessing the cluster.
Generating the discovery ISO with the Assisted Installer
Installing the OpenShift Container Platform x86_64 tenant cluster (3 control plane nodes and 3 worker nodes) requires a discovery ISO, which the Assisted Installer (AI) can generate with the cluster name, base domain, Secure Shell (SSH) public key, and pull secret.
In the initial install you are only installing the control plane nodes.
Use the Assisted Installer to create a discovery ISO for the x86_64
tenant cluster. See the Assisted Installer for OpenShift Container Platform documentation for details on using the Assisted Installer.
Once you have downloaded the discovery ISO, boot the tenant cluster control plane nodes with the discovery image. See Booting hosts with the discovery image for additional details.
Installing the BlueField-2 DPU
The general steps to follow for installing the BlueField-2 DPU are:
-
Ensure that your server hardware is compatible with the BlueField-2 DPU. You can check the compatibility list on the NVIDIA website.
-
Install the BlueField-2 DPU onto your server hardware. The DPU must be installed in a PCI Express (PCIe) slot on your server.
-
Connect the BlueField-2 DPU to your server’s power supply and network interface.
-
Install the BlueField-2 DPU software on your server. The software includes the BlueField-2 driver, firmware, and management software.
-
Configure the BlueField-2 DPU using the management software. You can use the BlueField-2 Manager to configure and manage the DPU.
-
Verify that the BlueField-2 DPU is installed and working correctly. You can use the BlueField-2 Manager to check the status of the DPU.
These are general steps, and the specific installation process might change depending on your server hardware and software environment. Consult the installation guide for the BlueField-2 DPU for detailed instructions.
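Before proceeding, you can confirm that the server detects the card by listing Mellanox PCI devices. This is a minimal check; the a2d6 device ID is the BlueField-2 ID referenced later in this document, and the exact output depends on your hardware.
$ lspci -nn -d 15b3:            # list all Mellanox (vendor ID 15b3) devices
$ lspci -nn | grep -i a2d6      # BlueField-2 DPU device ID used later in this document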
Installing and enabling offloading with the DPU Network Operator
The DPU Network Operator on the infrastructure cluster side is responsible for the life-cycle management of the ovn-kube
components and the necessary host network initialization on DPU cards.
-
An infrastructure cluster composed of x86 control plane and ARM worker nodes is up and running.
-
A tenant cluster composed of x86 control plane nodes.
-
DPU cards are installed on the worker nodes of the infrastructure cluster where hardware offloading needs to be enabled.
-
Pods in the infrastructure cluster can reach the API server of the tenant cluster.
-
Network configuration:
-
The infrastructure cluster and the tenant cluster share the same VLAN as the API network.
-
The two clusters must use different VIPs for the cluster API.
-
The DHCP server must be configured to assign IP addresses to hosts from both clusters from the same subnet.
-
The DNS server must be able to resolve the API URL for both clusters.
-
As a cluster administrator, you can install the Operator using the CLI.
-
Install the OpenShift CLI (
oc
). -
Log in to the infrastructure cluster as a user with
cluster-admin
privileges.
-
Create a namespace for the DPU Network Operator by completing the following actions:
-
Create the following Namespace custom resource (CR) that defines the openshift-dpu-network-operator namespace, and then save the YAML in the dpuno-namespace.yaml file:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-dpu-network-operator
  annotations:
    workload.openshift.io/allowed: management
-
Create the namespace by running the following command:
$ oc create -f dpuno-namespace.yaml
-
-
Install the DPU Network Operator in the namespace you created in the previous step by creating the following objects:
-
Create the following OperatorGroup CR and save the YAML in the dpuno-operatorgroup.yaml file:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: dpu-network-operators
  namespace: openshift-dpu-network-operator
-
Create the OperatorGroup CR by running the following command:
$ oc create -f dpuno-operatorgroup.yaml
-
-
Create the following Subscription CR and save the YAML in the dpu-sub.yaml file:
Example Subscription
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-dpu-operator-subscription
  namespace: openshift-dpu-network-operator
spec:
  channel: "stable"
  name: dpu-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
- You must specify the redhat-operators value.
-
Create the Subscription object by running the following command:
$ oc create -f dpu-sub.yaml
-
Change to the openshift-dpu-network-operator project:
$ oc project openshift-dpu-network-operator
-
Verify the DPU Network Operator is running:
$ oc get pods -n openshift-dpu-network-operator
Example output
NAME                                                       READY   STATUS    RESTARTS   AGE
dpu-network-operator-controller-manager-cc9ccc4bd-9vqcg    2/2     Running   0          62s
Creating a dedicated namespace
You need to install the OVNKubeConfig
custom resource (CR), the ovnkube-node
overrides config map (CM) and the secret containing the tenant kubeconfig
into the same namespace. Follow this procedure to create this namespace.
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
-
Create a YAML file named, for example, dpu_namespace.yaml that contains the following YAML:
apiVersion: v1
kind: Namespace
metadata:
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
    security.openshift.io/scc.podSecurityLabelSync: "false"
    openshift.io/run-level: "0"
  name: tenantcluster-dpu
-
Create the namespace by running the following command:
$ oc create -f dpu_namespace.yaml
Configuring support for hardware offloading in the infrastructure cluster
Configure support for hardware offloading in the infrastructure cluster by using this procedure.
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
-
Create the following OVNKubeConfig custom resource (CR) with the poolName dpu, leaving the kubeConfigFile blank. Save the YAML in the ovnkubeconfig.yaml file:
apiVersion: dpu.openshift.io/v1alpha1
kind: OVNKubeConfig
metadata:
  name: ovnkubeconfig-sample
  namespace: tenantcluster-dpu
spec:
  poolName: dpu
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/dpu-worker: ""
-
Run the following command:
$ oc create -f ovnkubeconfig.yaml
The DPU Network Operator creates a custom MachineConfigPool and a custom MachineConfig.
-
-
Get the node names by using the following command:
$ oc get nodes
-
Label the DPU nodes in the infrastructure cluster. The Machine Config Operator applies the new MachineConfig to the DPU nodes, thereby enabling switchdev mode on them:
$ oc label node <NODENAME> node-role.kubernetes.io/dpu-worker=
-
Create a Cluster Network Operator (CNO) ConfigMap in the infrastructure cluster, setting the mode to dpu. Save the YAML in the ovnkubeconfigmap.yaml file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: dpu-mode-config
  namespace: openshift-network-operator
data:
  mode: "dpu"
immutable: true
-
Run the following command:
$ oc create -f ovnkubeconfigmap.yaml
-
-
Label the DPU nodes in the infrastructure cluster where you want to enable hardware offloading.
$ oc label node <NODENAME> network.operator.openshift.io/dpu=
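To verify this stage, you can check that the MachineConfigPool created by the DPU Network Operator has rolled out and that a labeled DPU node is in switchdev mode. This is a minimal sketch; the pool name dpu follows the poolName in the example CR, and the PCI address is a placeholder for your DPU.
$ oc get mcp dpu                                          # wait until UPDATED is True
$ oc get nodes -l node-role.kubernetes.io/dpu-worker=
$ oc debug node/<NODENAME> -- chroot /host devlink dev eswitch show pci/<PCI_ADDRESS>   # expect "mode switchdev"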
Configuring support for hardware offloading in the tenant cluster
Configure support for hardware offloading in the tenant cluster by using this procedure.
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
-
Create a MachineConfigPool for all the DPU workers. Save the YAML in the dputenantmachineconfig.yaml file:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: dpu-host
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values:
          - worker
          - dpu-host
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/dpu-host: ""
-
Run the following command:
$ oc create -f dputenantmachineconfig.yaml
-
-
Label the DPU nodes:
$ oc label node <NODENAME> node-role.kubernetes.io/dpu-host=
-
Install and configure the SR-IOV Network Operator:
This procedure describes how to install the SR-IOV Network Operator using the web console. However, if the console is not reachable due to the ingress being down, follow the guidance in "CLI: Installing the SR-IOV Network Operator".
-
In the OpenShift Container Platform web console, click Administration → Namespaces.
-
Click Create Namespace.
-
In the Name field, enter
openshift-sriov-network-operator
, and click Create. -
In the OpenShift Container Platform web console, click Operators → OperatorHub.
-
Select SR-IOV Network Operator from the list of available Operators, and click Install.
-
On the Install Operator page, under A specific namespace on the cluster, select openshift-sriov-network-operator.
-
Click Install.
-
Verify that the SR-IOV Network Operator is installed successfully:
-
Navigate to the Operators → Installed Operators page.
-
Ensure that SR-IOV Network Operator is listed in the openshift-sriov-network-operator project with a Status of InstallSucceeded.
During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
If the Operator does not appear as installed, troubleshoot further as follows:
-
Inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
-
Navigate to the Workloads → Pods page and check the logs for pods in the
openshift-sriov-network-operator
project.
-
Add this machine config pool to the SriovNetworkPoolConfig custom resource.
-
Create a file, such as sriov-pool-config.yaml, with the following content:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkPoolConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  ovsHardwareOffloadConfig:
    name: dpu-host
- The name here is the same as the machine config pool (MCP) name created in step 1.
-
Apply the configuration:
$ oc create -f sriov-pool-config.yaml
After applying the sriov-pool-config.yaml, the nodes reboot and you need to wait until the MCP on the dpu-host is up to date again, as shown in the sketch that follows.
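One way to wait for the rollout is to watch the MachineConfigPool conditions. This is a minimal sketch; the 30-minute timeout is an arbitrary placeholder.
$ oc get mcp dpu-host
$ oc wait mcp/dpu-host --for=condition=Updated --timeout=30m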
-
-
Create a SriovNetworkNodePolicy to configure the virtual functions (VFs) on the hosts.
-
Save the YAML in the SriovNetworkNodePolicy.yaml file:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-mlnx-bf
  namespace: openshift-sriov-network-operator
spec:
  resourceName: mlnx_bf
  nodeSelector:
    node-role.kubernetes.io/dpu-host: ""
  priority: 99
  numVfs: 4
  nicSelector:
    vendor: "15b3"
    deviceId: "a2d6"
    pfNames: ['ens1f0#1-3']
    rootDevices: ['0000:3b:00.0']
-
The name for the custom resource object.
-
The resource name of the SR-IOV network device plug-in. You can create multiple SR-IOV network node policies for a resource name.
-
The node selector specifies the nodes to configure. Ensure this is consistent with the
nodeSelector
of the MCP created in step 1. -
Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99.
-
The number of the virtual functions (VF) to create for the SR-IOV physical network device. For a Mellanox NIC, the number of VFs cannot be larger than 128.
-
The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally.
-
The vendor hexadecimal code of the SR-IOV network device. Vendor id
15b3
is for Mellanox devices. -
The device hexadecimal code of the SR-IOV network device. For example,
a2d6
is the device ID for a Bluefield-2 DPU device. -
An array of one or more physical function (PF) names for the device. The setting
ens1f0#1-3
in this example ensures 1 virtual function is reserved for the management port. -
An array of one or more PCI bus addresses for the PF of the device. Provide the address in the following format:
0000:02:00.1
.
-
-
Create the SriovNetworkNodePolicy object:
$ oc create -f SriovNetworkNodePolicy.yaml
After applying SriovNetworkNodePolicy.yaml, the nodes reboot and you need to wait until the dpu-host machine config pool is up to date again. The sketch that follows shows one way to confirm that the virtual functions were created.
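This is a minimal sketch for confirming the VF allocation; the node and interface names are the placeholders used elsewhere in this document, and reading syncStatus from the SriovNetworkNodeState resource is an assumption about how the SR-IOV Network Operator reports progress.
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator
$ oc get sriovnetworknodestate <NODENAME> -n openshift-sriov-network-operator -o jsonpath='{.status.syncStatus}'   # expect Succeeded
$ oc debug node/<NODENAME> -- chroot /host ip link show ens1f0   # the created VFs appear under the physical function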
-
-
Optional: Follow these optional steps if virtual functions are not being created on the tenant cluster.
-
Create the following Machine Config:
$ cat <<EOF > realloc.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: dpu-host
  name: pci-realloc
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - pci=realloc
EOF
-
Apply the Machine Config and wait until all the nodes are rebooted:
$ oc create -f realloc.yaml
-
-
Create a Cluster Network Operator (CNO) ConfigMap in the tenant cluster setting the mode to
dpu-host
.-
Save the YAML in the sriovdpuconfigmap.yaml file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: dpu-mode-config
  namespace: openshift-network-operator
data:
  mode: "dpu-host"
immutable: true
-
Run the following command:
$ oc create -f sriovdpuconfigmap.yaml
-
-
Create a machine config to disable Open vSwitch (OVS).
-
Create a YAML file, for example disable-ovs.yaml:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: dpu-host
  name: disable-ovs
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - mask: true
          name: ovs-vswitchd.service
        - enabled: false
          name: ovs-configuration.service
-
Add this machine config to the cluster by running the following command:
$ oc create -f disable-ovs.yaml
-
-
Set the environment variable
OVNKUBE_NODE_MGMT_PORT_NETDEV
for each DPU host.-
Save the YAML in the setenvovnkube.yaml file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: env-overrides
  namespace: openshift-ovn-kubernetes
data:
  x86-worker-node0: |
    OVNKUBE_NODE_MGMT_PORT_NETDEV=ens1f0v0
ens1f0v0 is the virtual function (VF) name that is assigned to the ovnkube node management port on the host. A sketch for finding the VF name on the host follows this step.
-
Run the following command:
$ oc create -f setenvovnkube.yaml
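One way to discover the VF netdev name on the host is to read it from sysfs under the physical function. This is a minimal sketch; the node name x86-worker-node0 and the PF name ens1f0 are the placeholders used in the example ConfigMap above.
$ oc debug node/x86-worker-node0 -- chroot /host ls /sys/class/net/ens1f0/device/virtfn0/net/   # prints the VF netdev name, for example ens1f0v0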
-
-
Label the DPU nodes in the tenant cluster. Run the following command:
$ oc label node <NODENAME> network.operator.openshift.io/dpu-host=
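As a quick check that the tenant-side configuration is in place, you can list the labeled nodes and confirm that the dpu-host machine config pool has picked them up; the pool name comes from the MachineConfigPool created earlier in this procedure.
$ oc get nodes -l network.operator.openshift.io/dpu-host= -o name
$ oc get mcp dpu-host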
Adding worker nodes to the tenant cluster
Use this procedure to add worker nodes to the tenant cluster by generating a new discovery ISO.
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
-
Navigate to console.redhat.com/openshift.
-
From the list of created clusters, find your tenant cluster.
-
Select your tenant cluster.
-
Click the
Add hosts
button and select the installation media.-
Select
Minimal image file: Provision with virtual media
to download a smaller image that will fetch the data needed to boot. The nodes must have virtual media capability. This is the recommended method. -
Select
Full image file: Provision with physical
media to download the larger full image. -
Select
iPXE: Provision from your network server
. Use this when you have an iPXE server that has already been setup.
-
-
Add an SSH public key so that you can connect to the cluster nodes as the
core
user. Having a login to the cluster nodes can provide you with debugging information during the installation. -
Optional: If the cluster hosts are behind a firewall that requires the use of a proxy, select Configure cluster-wide proxy settings. Enter the username, password, IP address and port for the HTTP and HTTPS URLs of the proxy server.
-
Optional: Configure a cluster-wide trusted certificate if the cluster hosts are in a network with a re-encrypting (MITM) proxy, or if the cluster needs to trust certificates for other purposes, such as container image registries.
-
Click
Generate Discovery ISO
. -
Download the discovery ISO.
-
Boot the tenant cluster worker host(s) with the discovery image. See Booting hosts with the discovery image for additional details.
Disabling DNS, the Ingress Controller, and monitoring on the worker nodes
Apply the following commands to disable DNS, the Ingress Controller, and monitoring on the tenant worker nodes.
-
Install the OpenShift CLI (
oc
). -
Log in to the tenant cluster as a user with cluster-admin privileges.
-
Run the following command to patch the dns.operator/default and ensure that the DNS Operator is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:
$ oc patch dns.operator/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"node-role.kubernetes.io/master":""}}}}'
-
Run the following command to patch the ingresscontroller/default and ensure that the Ingress Controller is scheduled on an OpenShift Container Platform node with the label node-role.kubernetes.io/master:
$ oc patch ingresscontroller/default --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/master":""}}}}}' -n openshift-ingress-operator
-
Disable monitoring on the worker nodes by following these steps:
-
Create the following YAML and save it in a file named monitor-patch-cm.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusOperator:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    prometheusK8s:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    alertmanagerMain:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    kubeStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    grafana:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    telemeterClient:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    k8sPrometheusAdapter:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    openshiftStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/master: ""
    thanosQuerier:
      nodeSelector:
        node-role.kubernetes.io/master: ""
-
Run the following command:
$ oc create -f monitor-patch-cm.yaml
Creating a subscription to the Node Maintenance Operator in the tenant cluster
Create a subscription to the Node Maintenance Operator following this procedure.
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
-
Create a
Subscription
CR:-
Define the Subscription CR and save the YAML file, for example, node-maintenance-subscription.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-maintenance-operator
  namespace: openshift-operators
spec:
  channel: stable
  installPlanApproval: Automatic
  name: node-maintenance-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: node-maintenance-operator.v4.12.0
-
To create the Subscription CR, run the following command:
$ oc create -f node-maintenance-subscription.yaml
-
-
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-operators
Example output
NAME                              DISPLAY                     VERSION   REPLACES   PHASE
node-maintenance-operator.v4.12   Node Maintenance Operator   4.12                 Succeeded
-
Verify that the Node Maintenance Operator is running:
$ oc get deploy -n openshift-operators
Example output
NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
node-maintenance-operator-controller-manager   1/1     1            1           10d
Final infrastructure cluster configuration
Run the following steps against the infrastructure cluster.
-
Add the kubeconfig of the tenant cluster as a secret:
$ oc create secret generic tenant-cluster-1-kubeconf -n tenantcluster-dpu --from-file=config=/root/kubeconfig.tenant
-
Add the per-node configuration override for the ovnkube-node by listing all the DPU nodes under data:
kind: ConfigMap
apiVersion: v1
metadata:
  name: env-overrides
  namespace: tenantcluster-dpu
data:
  worker-bf: |
    TENANT_K8S_NODE=x86-worker-node0
    DPU_IP=192.168.111.29
    MGMT_IFNAME=pf0vf0
-
Update the OVNKubeConfig CR by adding the kubeConfigFile field. Wait until the ovnkube pods are created.
apiVersion: dpu.openshift.io/v1alpha1
kind: OVNKubeConfig
metadata:
  name: ovnkubeconfig-sample
spec:
  kubeConfigFile: tenant-cluster-1-kubeconf
  poolName: dpu
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/dpu-worker: ''
In the DPU Network Operator there are several status messages that can be used to verify the Operator’s working status:
-
McpReady
indicates that the MachineConfig and MachineConfigPool are ready. -
TenantObjsSynced
indicates that the tenant ConfigMap and secrets are synced to the infrastructure nodes. -
OvnKubeReady
indicates that theovnkube-node
DaemonSet is ready.
Modify log rotation of CRI-O
The BlueField-2 DPU has 16 GB of eMMC storage, and when OpenShift Container Platform runs over time, the logs can fill up the disk. In OpenShift Container Platform, rotation is carried out by default at approximately 50 MB and 5 files. The pod log file size and number are managed by the kubelet on each node, which then passes the settings on to CRI-O. The default settings currently used in OpenShift Container Platform are as follows:
--container-log-max-size 50Mi (default 50Mi)
--container-log-max-files 5 (default 5)
The current kubelet configuration used by a node is located at /etc/kubernetes/kubelet.conf. The default parameters passed on to CRI-O are:
containerLogMaxSize: 50Mi
containerLogMaxFiles: 5
Pod and container logs are already rotated automatically by default. The kubelet.conf file is managed by the Machine Config Operator, so additional steps are necessary to change these settings.
Limit the container logs on the infrastructure cluster worker nodes following this procedure.
-
Obtain the MachineConfig for the kubelet:
$ oc get machineconfig | grep -i kubelet
Example output
01-master-kubelet   46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0   47m
01-worker-kubelet   46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0   47m
-
Set up a custom kubelet for the worker nodes (nodes with DPUs) on the infrastructure cluster.
-
Label the worker machineconfigpool with the custom label custom-kubelet=logrotation:
$ oc label machineconfigpool worker custom-kubelet=logrotation
-
Create a YAML file logrotation.yaml defining a kubeletconfig custom resource (CR) for your configuration change:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cr-logrotation
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: logrotation
  kubeletConfig:
    containerLogMaxFiles: 3
    containerLogMaxSize: 10Mi
-
Set this value to the same as the
custom-kubelet
value configured in step 2a. -
Set the desired maximum number (for example 3) of container log files that can be present for a container.
-
Set the desired maximum size (for example 10Mi) of container log file before it is rotated.
-
-
Create the CR object:
$ oc create -f logrotation.yaml
-
-
Verify the kubeletconfig:
$ oc get kubeletconfig
-
Verify the kubeletconfig:
$ oc get kubeletconfig -o yaml
-
Verify that three machineconfigs now exist:
$ oc get machineconfig | grep -i kubelet
Example output
01-master-kubelet             46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0   74m
01-worker-kubelet             46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0   74m
99-worker-generated-kubelet   46327eb10a201b48be7fb3da96a2ffe322539af8   3.2.0   11m
-
Enter into a debug session on one of the worker nodes. This step instantiates a debug pod called <node_name>-debug:
$ oc debug node/<node_name>
-
Set /host as the root directory within the debug shell. The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths:
# chroot /host
-
Verify the settings in kubelet.conf:
# grep -e containerLogMaxSize -e containerLogMaxFiles /etc/kubernetes/kubelet.conf
Example output
"containerLogMaxSize": "10Mi", "containerLogMaxFiles": 3,