RHEL AI: "ilab config init" command failing with error "Unexpected error from cudaGetDeviceCount"
Issue
The ilab config init command fails with the following error:
Generating config file and profiles:
/var/home/instruct/.config/instructlab/config.yaml
/var/home/instruct/.local/share/instructlab/internal/system_profiles
/opt/app-root/lib64/python3.11/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at /mount/work-dir/torch-2.4.1/torch-2.4.1/c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
- After upgrading to RHEL AI 1.3, the ilab config init command fails.
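The last line of the output above (return torch._C._cuda_getDeviceCount() > 0) comes from PyTorch's CUDA availability check, so the warning can probably be reproduced without ilab. The following is a minimal sketch, not part of the original article: probe_cuda is a hypothetical helper name, and it assumes the script is run with the same Python environment that appears in the warning path.

```python
import importlib.util
import warnings


def probe_cuda():
    """Report whether PyTorch can initialize CUDA (hypothetical helper)."""
    report = {"torch_installed": False, "cuda_available": False, "warnings": []}

    # Skip the check cleanly if PyTorch is not importable in this environment.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True

    # Record any warnings raised during the check instead of letting them
    # scroll past on stderr.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        report["cuda_available"] = bool(torch.cuda.is_available())

    report["warnings"] = [str(w.message) for w in caught]
    return report


if __name__ == "__main__":
    result = probe_cuda()
    print("torch installed:", result["torch_installed"])
    print("CUDA available: ", result["cuda_available"])
    for message in result["warnings"]:
        print("warning:", message)
```

If the problem is in CUDA initialization rather than in ilab itself, this check would be expected to report that CUDA is unavailable and to capture the same "Error 802: system not yet initialized" text. That outcome is an assumption and has not been verified here.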
Environment
- Red Hat Enterprise Linux AI 1.3