How to install the NVIDIA GPU Operator with OpenShift

Solution Verified - Updated -

Issue

With the general availability of the NVIDIA driver container and the NVIDIA GPU Operator, NVIDIA GPUs are now enabled on OpenShift 4 RHEL CoreOS worker nodes.

Enabling an NVIDIA GPU on an OpenShift RHEL CoreOS worker node requires deploying the NVIDIA stack, which includes the NVIDIA driver. The NVIDIA GPU Operator manages the installation, configuration, and lifecycle of that stack. Before the operator existed, OpenShift 4 had no GPU support on RHEL CoreOS nodes.

The NVIDIA GPU stack comprises the NVIDIA (CUDA) driver container, the NVIDIA runtime plugin for CRI-O, and the NVIDIA device plugin for Kubernetes.

The NVIDIA GPU Operator builds upon the work of the Special Resource Operator (SRO) and Node Feature Discovery (NFD).
The SRO is a community operator contributed and maintained by Red Hat. NFD ships as part of OpenShift starting with version 4.2. The NVIDIA GPU Operator and the NVIDIA GPU stack are shipped and maintained by NVIDIA.
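
To illustrate the role NFD plays, the following is a minimal sketch of how GPU nodes could be picked out from NFD node labels. It assumes NFD publishes PCI labels of the form feature.node.kubernetes.io/pci-<fields>.present, where <fields> contains the PCI vendor ID (10de for NVIDIA). The exact label format depends on how NFD is configured in the cluster. The function names and the sample node data are hypothetical and used for illustration only.

```python
import re

# NVIDIA's PCI vendor ID.
NVIDIA_PCI_VENDOR = "10de"

# Assumed NFD label shape: "feature.node.kubernetes.io/pci-<fields>.present",
# where <fields> is an underscore-separated list such as "10de" or "0302_10de".
# The fields that appear depend on the NFD configuration.
_PCI_LABEL = re.compile(r"^feature\.node\.kubernetes\.io/pci-([0-9a-f_]+)\.present$")


def has_nvidia_pci_label(labels):
    """Return True if any NFD PCI label names the NVIDIA vendor ID."""
    for key, value in labels.items():
        match = _PCI_LABEL.match(key)
        if match and value == "true" and NVIDIA_PCI_VENDOR in match.group(1).split("_"):
            return True
    return False


def gpu_node_names(nodes):
    """Filter a {node_name: labels} mapping down to nodes with an NVIDIA PCI label."""
    return [name for name, labels in nodes.items() if has_nvidia_pci_label(labels)]


# Hypothetical sample data; real labels would come from the cluster's node objects.
sample_nodes = {
    "worker-0": {"feature.node.kubernetes.io/pci-10de.present": "true"},
    "worker-1": {"feature.node.kubernetes.io/pci-1d0f.present": "true"},
    "worker-2": {"feature.node.kubernetes.io/pci-0302_10de.present": "true"},
}

if __name__ == "__main__":
    print(gpu_node_names(sample_nodes))
```

In practice the label mapping would be read from the node objects in the cluster rather than hard-coded. The sketch only shows the matching logic.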

Note that, starting with OpenShift 3.11, NVIDIA GPUs have been enabled on RHEL 7 worker nodes. That configuration requires manual deployment of the aforementioned NVIDIA stack.

Environment

  • Red Hat OpenShift Container Platform 4.2
  • Red Hat OpenShift Container Platform 4.3
