libcuda.so.1 not found when installing NVIDIA gpu-operator
Issue
- NVIDIA driver and runtime pods in the GPU operator namespace fail. In the pod listing below, nvidia-driver-daemonset is in CrashLoopBackOff with 310 restarts:
NAME                                       READY   STATUS             RESTARTS   AGE
nvidia-container-toolkit-daemonset-8xq45   1/1     Running            0          26h
nvidia-dcgm-exporter-2w44v                 1/1     Running            1          26h
nvidia-device-plugin-daemonset-8vzrx       1/1     Running            0          26h
nvidia-device-plugin-validation            0/1     Completed          0          26h
nvidia-driver-daemonset-r962k              0/1     CrashLoopBackOff   310        26h
nvidia-driver-validation                   0/1     Completed          0          26h
- The logs of the nvidia-dcgm-exporter and nvidia-driver-validation pods may contain driver errors such as:
./vectorAdd_nvrtc: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory
Error: Failed to initialize NVML
time="2020-05-13T22:45:32Z" level=fatal msg="Error starting nv-hostengine: DCGM initialization error"
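
The symptoms above can be checked mechanically. The following is a minimal sketch, not part of the original article: it assumes the pod listing has the five-column layout shown above and that the log text has already been saved to a string. The helper names `unhealthy_pods` and `matching_signatures` and the set of "healthy" statuses are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Statuses treated as healthy in this sketch (an assumption, not an official list).
HEALTHY_STATUSES = {"Running", "Completed"}

# Error signatures copied from the log excerpts in this article.
ERROR_SIGNATURES = [
    "libcuda.so.1: cannot open shared object file",
    "Failed to initialize NVML",
    "DCGM initialization error",
]


def unhealthy_pods(listing: str) -> List[Tuple[str, str, int]]:
    """Return (name, status, restarts) for pods whose STATUS is not healthy.

    Assumes each pod row has exactly five whitespace-separated columns:
    NAME READY STATUS RESTARTS AGE. Rows in any other shape are skipped.
    """
    results = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a five-column pod row.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, _ready, status, restarts, _age = fields
        if status not in HEALTHY_STATUSES:
            results.append((name, status, int(restarts)))
    return results


def matching_signatures(log_text: str) -> List[str]:
    """Return the known error signatures that occur in the given log text."""
    return [sig for sig in ERROR_SIGNATURES if sig in log_text]
```

The input strings could come from saved output of `oc get pods -n <namespace>` and `oc logs <pod>`, where the namespace and pod name are placeholders for your own cluster. A pod reported by `unhealthy_pods` together with one or more hits from `matching_signatures` suggests the cluster is showing the issue described here.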
Environment
- OpenShift Container Platform 4