RHEL AI: "ilab config init" command failing with error "Unexpected error from cudaGetDeviceCount"

Solution Verified

Issue

  • The ilab config init command fails with the following error:
Generating config file and profiles:
    /var/home/instruct/.config/instructlab/config.yaml
    /var/home/instruct/.local/share/instructlab/internal/system_profiles
/opt/app-root/lib64/python3.11/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at /mount/work-dir/torch-2.4.1/torch-2.4.1/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
  • After upgrading to RHEL AI 1.3, the ilab config init command fails.
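
The warning in the output above comes from PyTorch's CUDA availability check (torch/cuda/__init__.py), not from ilab itself. As a minimal sketch for reproducing that check outside of ilab, the following script calls torch.cuda.is_available() and records any warnings it emits. The helper name cuda_status is hypothetical, and the script is only a diagnostic aid under the assumption that it runs in the same Python environment as ilab; it is not the resolution for this issue.

```python
import warnings


def cuda_status():
    """Report whether PyTorch can see a CUDA device, capturing any warnings.

    Hypothetical helper for illustration; not part of ilab or PyTorch.
    """
    status = {"torch_installed": False, "cuda_available": False, "warnings": []}
    try:
        import torch
    except ImportError:
        # PyTorch is not importable in this environment, so nothing to check.
        return status
    status["torch_installed"] = True

    # Record warnings instead of printing them, so the CUDA initialization
    # message (if any) can be inspected programmatically.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        status["cuda_available"] = bool(torch.cuda.is_available())
    status["warnings"] = [str(w.message) for w in caught]
    return status


if __name__ == "__main__":
    result = cuda_status()
    print("torch installed:", result["torch_installed"])
    print("CUDA available: ", result["cuda_available"])
    for message in result["warnings"]:
        print("warning:", message)
```

If the script reports that CUDA is unavailable and prints the same "Error 802: system not yet initialized" message, the failure can be reproduced independently of ilab config init.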

Environment

  • Red Hat Enterprise Linux AI 1.3
