RHEL AI: ilab train command failing with error "Provided path to model does not exist"
Issue
- On RHEL AI, the ilab model train command fails with the error "Provided path to model does not exist":
$ ilab model train --data-path ~/.local/share/instructlab/datasets/2025-02-12_110827/knowledge_train_msgs_2025-02-12T11_12_30.jsonl --num-epochs 1 --device=cuda
LoRA is disabled (rank=0), ignoring all additional LoRA args
[2025-02-12 11:33:20,000] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
INFO 2025-02-12 11:33:23,335 numexpr.utils:149: Note: NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2025-02-12 11:33:23,335 numexpr.utils:162: NumExpr defaulting to 16 threads.
INFO 2025-02-12 11:33:25,022 datasets:59: PyTorch version 2.5.1 available.
--- Logging error ---
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/accelerated_train.py", line 233, in accelerated_train
run_training(train_args=train_args, torch_args=torch_args)
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/training/__init__.py", line 36, in run_training
return run_training(torch_args=torch_args, train_args=train_args)
.
.
FileNotFoundError: Provided path to model does not exist. Please make sure that you've passed a valid model and that it has appropriate permissions: /var/home/instruct/.cache/instructlab/models/granite-3.1-8b-starter-v1
.
.
Message: 'Failed during training loop: '
Arguments: (FileNotFoundError("Provided path to model does not exist. Please make sure that you've passed a valid model and that it has appropriate permissions: /var/home/instruct/.cache/instructlab/models/granite-3.1-8b-starter-v1"),)
Accelerated Training failed with 1
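The FileNotFoundError above names two things to verify: that the model path exists and that it has appropriate permissions. A minimal sketch of that check follows. The helper name check_model_dir is hypothetical and not part of ilab. It only inspects the path quoted in the error and does not change anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ilab): reports whether a model
# directory exists and is readable by the current user.
check_model_dir() {
    local model_dir="$1"
    if [ ! -e "$model_dir" ]; then
        echo "missing: $model_dir"
        return 1
    fi
    if [ ! -d "$model_dir" ]; then
        echo "not a directory: $model_dir"
        return 2
    fi
    # A directory needs both read and execute permission to be listed and entered.
    if [ ! -r "$model_dir" ] || [ ! -x "$model_dir" ]; then
        echo "not readable by $(id -un): $model_dir"
        return 3
    fi
    echo "ok: $model_dir"
    ls -ld "$model_dir"
}

# Path copied verbatim from the FileNotFoundError in the log above.
check_model_dir "/var/home/instruct/.cache/instructlab/models/granite-3.1-8b-starter-v1" || true
```

A "missing" result would match the error in the traceback. An "ok" result would suggest that the path is being resolved differently inside the training process, which this sketch does not cover.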
Environment
- Red Hat Enterprise Linux AI 1.4