RHEL AI: ilab model serve command failing with error "CUDA out of memory."

Solution Verified

Issue

  • The ilab model serve command fails with a CUDA out-of-memory error:
$ ilab model serve --model-path ~/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1
.
.
[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 
  • Serving the Mixtral LLM for inference triggers the memory error.
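The error is expected whenever the model's weights exceed the memory of the available GPU(s). A back-of-the-envelope estimate for Mixtral-8x7B is sketched below: the ~46.7B total parameter count is the model's published figure, while 16-bit weights and the 20% runtime overhead factor (KV cache, activations) are illustrative assumptions.

```python
# Rough memory estimate for serving Mixtral-8x7B with 16-bit weights.
# PARAMS_B is the model's published total parameter count; the 1.2x
# overhead factor for KV cache/activations is an illustrative assumption.
PARAMS_B = 46.7          # total parameters, in billions
BYTES_PER_PARAM = 2      # fp16/bf16

weights_gib = PARAMS_B * 1e9 * BYTES_PER_PARAM / 2**30
total_gib = weights_gib * 1.2

print(f"weights: {weights_gib:.0f} GiB, with overhead: {total_gib:.0f} GiB")
```

Since the weights alone come to roughly 87 GiB, the model cannot fit on a single 80 GB accelerator, which is why a single-GPU serve attempt fails with torch.cuda.OutOfMemoryError.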

Environment

  • Red Hat Enterprise Linux AI 1.1
