Deploying a model by using the Distributed Inference Server with llm-d [Developer preview]
Distributed Inference Server with llm-d is a Kubernetes-native, open-source framework for serving large language models (LLMs) at scale. It simplifies the deployment of generative AI workloads while emphasizing high performance and cost-effectiveness across a variety of hardware accelerators.
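Because llm-d builds on vLLM, a deployed model can typically be queried over an OpenAI-compatible HTTP API. The following minimal sketch illustrates this under stated assumptions: the endpoint URL, port, and model ID are hypothetical placeholders, not values from this article, and assume the inference service has been port-forwarded or exposed to the client.

# Minimal sketch: querying a model served through Distributed Inference
# Server with llm-d over an OpenAI-compatible API. The endpoint URL and
# model ID below are assumptions for illustration; substitute the values
# from your own deployment.
import requests

BASE_URL = "http://localhost:8000"  # assumed port-forwarded inference endpoint
MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model ID

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Summarize what llm-d does in one sentence."}
    ],
    "max_tokens": 128,
}

# vLLM-based servers expose the standard /v1/chat/completions route.
response = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])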
Key features of Distributed Inference Server with llm-d ...