RHSB-2025-001 vLLM Distributed KV cache features (CVE-2025-47277)
Executive summary
The vLLM project provides a high-performance, easy-to-use library for Large Language Model (LLM) inference and serving. Certain vLLM capabilities require additional planning to ensure secure deployment and usage. In particular, Inter-Node Communication in vLLM is insecure by default and can only be protected by restricting nodes to an isolated network.
A vulnerability was discovered in the vLLM project. The issue is assigned CVE-2025-47277 and rated with a severity impact of Moderate. By default, Red Hat products are configured to restrict vLLM nodes to an isolated network. However, this vulnerability becomes relevant if customers change those configurations, and Red Hat products are therefore considered affected.
The following Red Hat products include vLLM:
- Red Hat AI Inference Server
- Red Hat Enterprise Linux AI (RHEL AI)
- Red Hat OpenShift AI (RHOAI)
Technical details
Impacted configuration
Users are affected if their vLLM deployment meets all of the following conditions:
- Using the vLLM V0 engine, which was the default prior to 0.8.0. In 0.8.0 and later, the V0 engine must be selected by running vLLM with the VLLM_USE_V1 environment variable set to 0. This is not a default configuration in Red Hat products.
- Using the PyNcclPipe feature with the V0 engine in an affected vLLM version, 0.6.5 through 0.8.4 inclusive. This is not a default configuration in Red Hat products.
- The node is connected to networks other than the isolated vLLM Inter-Node Communication network. Red Hat products do not enable multi-node features in their default configuration.
- Relying on the --kv-ip parameter of the KVTransferConfig to prevent the vLLM server from receiving data from untrusted networks. This is not a default configuration in Red Hat products.
No other configurations, including the default configurations in the aforementioned products, are affected.
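For illustration only, the sketch below shows what an affected launch might look like. The kv-transfer-config field names (kv_connector, kv_role, kv_rank, kv_parallel_size, kv_ip, kv_port) follow the experimental disaggregated-prefill examples from the affected vLLM series; the model name, address, and port are placeholders. Treat this as a sketch of the vulnerable configuration, not a supported invocation.

```python
# Hypothetical sketch of an affected deployment: V0 engine plus the
# experimental PyNcclPipe-based KV transfer. Model name, address, and
# port are placeholders, not recommended values.
import os
import subprocess

# On vLLM 0.8.0-0.8.4 the V0 engine must be selected explicitly;
# before 0.8.0 it was the default.
env = dict(os.environ, VLLM_USE_V1="0")

subprocess.run(
    [
        "vllm", "serve", "example-org/example-model",
        "--kv-transfer-config",
        # kv_ip is intended to restrict where the KV cache service binds,
        # but the PyNcclPipe implementation ignores it and binds to 0.0.0.0.
        '{"kv_connector": "PyNcclConnector", "kv_role": "kv_producer",'
        ' "kv_rank": 0, "kv_parallel_size": 2,'
        ' "kv_ip": "10.0.0.1", "kv_port": 14579}',
    ],
    env=env,
    check=True,
)
```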
Background
vLLM can be configured to run in a multi-node scenario. This distributes model execution across GPUs on multiple hosts. Another optional mechanism that can be used in a multi-node scenario is the distribution and sharing of the KV cache.
The V0 engine has multiple experimental options for transferring the KV cache between hosts. One of these options, the PyNcclPipe implementation, is the subject of this vulnerability.
Red Hat products do not enable multi-node vLLM deployments by default.
Associated risks
vLLM is a community-driven project of the PyTorch Foundation. The security philosophy of vLLM, which inherits some key attributes from PyTorch, is described in the project's security guide. The guide states that multi-node communications must be isolated on a single-purpose network. This philosophy heavily emphasizes performance optimization, leaving system administrators responsible for ensuring a secure environment.
PyNcclPipe vulnerability
The KVTransferConfig includes a --kv-ip CLI parameter intended to let operators specify the address to which the KV cache service should bind. However, when PyNcclPipe is used, this parameter is ignored and the service binds to 0.0.0.0, exposing it on all network interfaces. In environments where the vLLM node is connected to multiple networks, including untrusted ones, this behavior could allow an attacker on an untrusted network to execute arbitrary commands on the server.
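To make the failure mode concrete, here is a minimal sketch of the bug class using plain Python sockets. It is not vLLM's actual code; it only shows how a service that accepts a configured bind address but ignores it ends up reachable on every interface.

```python
# Minimal illustration of the bug class (not vLLM's actual code): a
# service accepts a configured bind address but silently ignores it.
import socket

def start_kv_listener(configured_ip: str, port: int) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Intended behavior: bind only to the isolated inter-node interface.
    #   sock.bind((configured_ip, port))
    # Vulnerable behavior: the configured address is ignored, so the
    # service is reachable from every network the host is attached to.
    sock.bind(("0.0.0.0", port))
    sock.listen()
    return sock
```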
Red Hat products deploy vLLM in containers in pods with a single network interface, so the --kv-ip parameter cannot serve as an effective method of restricting access. Access to network applications in a pod should instead be restricted using NetworkPolicy objects in OpenShift environments, or by creating dedicated isolated networks with podman.
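As an illustration of the NetworkPolicy approach, the sketch below generates a manifest that allows ingress to vLLM pods only from explicitly labeled peer pods. The namespace and labels (vllm, app: vllm, role: vllm-peer) are hypothetical; adapt the selectors, and add port restrictions, to match your deployment.

```python
# Sketch: generate a NetworkPolicy manifest restricting ingress to vLLM
# pods. Namespace and labels are hypothetical placeholders.
import yaml  # PyYAML

policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "restrict-vllm-ingress", "namespace": "vllm"},
    "spec": {
        # Select the vLLM pods this policy protects.
        "podSelector": {"matchLabels": {"app": "vllm"}},
        "policyTypes": ["Ingress"],
        # Allow ingress only from pods explicitly labeled as peers;
        # all other ingress to the selected pods is denied.
        "ingress": [
            {"from": [{"podSelector": {"matchLabels": {"role": "vllm-peer"}}}]}
        ],
    },
}

print(yaml.safe_dump(policy, sort_keys=False))
```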
Recommendations
KV cache transfer in the V0 engine is experimental and is not recommended for use in any production environment. If you are interested in this type of functionality, we encourage you to follow developments within the recently announced llm-d project.
Acknowledgements
This issue was reported independently by three different parties:
- @kikayli (Zhuque Lab, Tencent)
- Russell Bryant (Red Hat)