RHEL AI: ilab model serve command failing with error "CUDA out of memory."
Issue
The ilab model serve command fails with the following error:
$ ilab model serve --model-path ~/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1
.
.
[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU
- Using the Mixtral LLM for inference serving causes a memory error.
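As context for the error above, a rough back-of-envelope estimate shows why Mixtral 8x7B can exhaust a single GPU: its roughly 46.7B total parameters in fp16/bf16 already exceed typical accelerator VRAM before any KV-cache or activation memory is counted. The 80 GiB GPU size below is an assumption for illustration, not taken from this article:

```python
# Back-of-envelope check of why Mixtral 8x7B can trigger CUDA OOM.
# Parameter count (~46.7B total) is the model's published size; the
# 80 GiB GPU capacity is an assumed example, not from this article.

GiB = 1024 ** 3

total_params = 46.7e9        # Mixtral 8x7B total parameter count
bytes_per_param = 2          # fp16/bf16 weights, 2 bytes each

weights_gib = total_params * bytes_per_param / GiB
print(f"Weights alone: ~{weights_gib:.0f} GiB")

gpu_vram_gib = 80            # assumption: one 80 GiB accelerator
print("Weights fit on one such GPU:", weights_gib < gpu_vram_gib)
```

The estimate covers weights only; serving also needs memory for the KV cache and activations, so the real footprint is higher still.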
Environment
- Red Hat Enterprise Linux AI 1.1