Deploying a model by using the Distributed Inference Server with llm-d [Developer preview]
Distributed Inference Server with llm-d is a Kubernetes-native, open-source framework for serving large language models (LLMs) at scale. It simplifies the deployment of generative AI workloads while emphasizing high performance and cost-effectiveness across a variety of hardware accelerators.
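Because llm-d builds on vLLM, a deployed model can typically be queried over an OpenAI-compatible HTTP API. The following minimal sketch illustrates this under stated assumptions: the endpoint URL, port, and model ID are hypothetical placeholders, not values from this article, and assume the inference service has been port-forwarded or exposed to the client.

# Minimal sketch: querying a model served through Distributed Inference
# Server with llm-d over an OpenAI-compatible API. The endpoint URL and
# model ID below are assumptions for illustration; substitute the values
# from your own deployment.
import requests

BASE_URL = "http://localhost:8000"  # assumed port-forwarded inference endpoint
MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model ID

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Summarize what llm-d does in one sentence."}
    ],
    "max_tokens": 128,
}

# vLLM-based servers expose the standard /v1/chat/completions route.
response = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])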
Key features of Distributed Inference Server with llm-d ...