Red Hat Training

A Red Hat training course is available for OpenShift Container Platform

Chapter 19. Using Device Plug-ins

19.1. What Device Plug-ins Do

Device plug-ins allow you to use a particular device type (GPU, InfiniBand, or other similar computing resources that require vendor-specific initialization and setup) in your OpenShift Container Platform pod without needing to write custom code. The device plug-in provides a consistent and portable solution to consume hardware devices across clusters. The device plug-in provides support for these devices through an extension mechanism, which makes these devices available to containers, provides health checks of these devices, and securely shares them.

Important

OpenShift Container Platform supports the device plug-in API, but the device plug-in containers are supported by individual vendors.

A device plug-in is a gRPC service running on the nodes (external to atomic-openshift-node.service) that is responsible for managing specific hardware resources. Any device plug-in must support following remote procedure calls (RPCs):

service DevicePlugin {
      // GetDevicePluginOptions returns options to be communicated with Device
      // Manager
      rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}

      // ListAndWatch returns a stream of List of Devices
      // Whenever a Device state change or a Device disappears, ListAndWatch
      // returns the new list
      rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}

      // Allocate is called during container creation so that the Device
      // Plug-in can run device specific operations and instruct Kubelet
      // of the steps to make the Device available in the container
      rpc Allocate(AllocateRequest) returns (AllocateResponse) {}

      // PreStartContainer is called, if indicated by Device Plug-in during
      // registration phase, before each container start. Device plug-in
      // can run device specific operations such as reseting the device
      // before making devices available to the container
      rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
}

19.1.1. Example Device Plug-ins

Note

For easy device plug-in reference implementation, there is a stub device plug-in in the Device Manager code: vendor/k8s.io/kubernetes/pkg/kubelet/cm/deviceplugin/device_plugin_stub.go.

19.2. Methods for Deploying a Device Plug-in

  • Daemonsets are the recommended approach for device plug-in deployments.
  • Upon start, the device plug-in will try to create a UNIX domain socket at /var/lib/kubelet/device-plugin/ on the node to serve RPCs from Device Manager.
  • Since device plug-ins need to manage hardware resources, access to the host file system, as well as socket creation, they must be run in a privileged security context.
  • More specific details regarding deployment steps can be found with each device plug-in implementation.