RHEL AI: ilab model serve command failing with error "CUDA out of memory."

Solution Verified

Issue

  • The ilab model serve command fails with a CUDA out-of-memory error:
$ ilab model serve --model-path ~/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1
.
.
[rank0]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 
  • Serving the Mixtral LLM for inference triggers the memory error.
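The error is expected whenever the model's weights exceed the memory of the available GPU(s). A back-of-the-envelope estimate for Mixtral-8x7B is sketched below: the ~46.7B total parameter count is the model's published figure, while 16-bit weights and the 20% runtime overhead factor (KV cache, activations) are illustrative assumptions.

```python
# Rough memory estimate for serving Mixtral-8x7B with 16-bit weights.
# PARAMS_B is the model's published total parameter count; the 1.2x
# overhead factor for KV cache/activations is an illustrative assumption.
PARAMS_B = 46.7          # total parameters, in billions
BYTES_PER_PARAM = 2      # fp16/bf16

weights_gib = PARAMS_B * 1e9 * BYTES_PER_PARAM / 2**30
total_gib = weights_gib * 1.2

print(f"weights: {weights_gib:.0f} GiB, with overhead: {total_gib:.0f} GiB")
```

Since the weights alone come to roughly 87 GiB, the model cannot fit on a single 80 GB accelerator, which is why a single-GPU serve attempt fails with torch.cuda.OutOfMemoryError.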

Environment

  • Red Hat Enterprise Linux AI 1.1
