Understanding OpenShift `externalTrafficPolicy: Local` and Source IP Preservation

Solution Verified - Updated -

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
  • Red Hat OpenShift on AWS (ROSA)
  • Red Hat OpenShift Dedicated (OSD)
  • Azure Red Hat OpenShift (ARO)

Issue

  • What is externalTrafficPolicy?
  • What are externalTrafficPolicy settings?
  • How does the Local policy help preserve the client's source IP?

Resolution

In summary:
- Use externalTrafficPolicy: Local when you need to preserve the source IP of external clients.
- Use internalTrafficPolicy: Local to optimize internal traffic routing to reduce hops and latency.

Root Cause

externalTrafficPolicy is a setting on OpenShift/Kubernetes Services, specifically the LoadBalancer and NodePort service types. It determines how external traffic is routed to the backend Pods and whether the client's source IP is preserved.

externalTrafficPolicy:
- Used to control the behavior of external traffic, that is, traffic originating from outside the cluster.
- Options:
  - Cluster (default): Traffic can arrive at any node in the cluster, whether or not that node runs a Pod for the service. The receiving node balances the traffic across all of the service's Pods, so it may be forwarded to a Pod on another node. That forwarding involves source NAT, which masks the client's source IP with the node's IP, but it gives an even load distribution across all Pods.
  - Local: Traffic is delivered only to Pods on the node that received it. If the receiving node does not have a Pod for the service, the traffic is dropped. This preserves the client's source IP but can lead to uneven load distribution when Pods are not spread evenly across nodes.
- When to use:
  - Cluster: When source IP preservation is not needed and even load distribution is preferred.
  - Local: When source IP preservation is crucial, such as for logging or security purposes.
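
As a minimal sketch of where this setting lives, the manifest below shows a Service with the Local policy. The Service name, namespace, label selector, and port numbers are hypothetical placeholders, not values taken from any particular cluster.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app              # hypothetical Service name
  namespace: my-project     # hypothetical namespace
spec:
  type: LoadBalancer        # NodePort is the other type this setting applies to
  externalTrafficPolicy: Local   # value is case-sensitive: "Local" or "Cluster"
  selector:
    app: my-app             # hypothetical Pod label
  ports:
    - name: http
      protocol: TCP
      port: 80              # port exposed by the Service (assumed)
      targetPort: 8080      # container port on the Pods (assumed)
```

Omitting the externalTrafficPolicy field, or setting it to Cluster, gives the default behavior described above.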

For example, suppose you have an OpenShift/Kubernetes cluster with 3 nodes: Node A, Node B, and Node C. You have a Service of type LoadBalancer or NodePort that exposes your application to the outside world. External traffic (from a client or user outside the cluster) comes in, and the LoadBalancer or NodePort routes this traffic to one of the nodes. Let's say it routes the traffic to Node A.

Now, Node A has to decide what to do with this traffic. If externalTrafficPolicy is set to Cluster (the default), Node A will send the traffic to any node that has a running Pod for the Service, even if Node A itself doesn't have one. This might involve an extra hop, and during this process, the original source IP of the client can get masked or replaced by Node A's IP. If externalTrafficPolicy is set to Local, Node A will check if it has a running Pod for the Service.

If Node A has a running Pod for the Service, it will send the traffic directly to its local Pod. Since there's no need for the traffic to be forwarded to another node, the original source IP of the client is preserved. If Node A does not have a running Pod for the Service, it will drop the traffic. This means the client's request will not be fulfilled, and they might experience this as a timeout or failed request.

The key advantage of Local policy is the preservation of the client's source IP. This can be crucial for applications or systems that rely on the original source IP for logging, security filtering, or other purposes. However, the trade-off is that if a node does not have a Pod for the Service and it receives traffic, that traffic will be dropped.

In simpler terms: With Local policy, external traffic is only served by local Pods on the node it initially reaches. If there's no local Pod, the request fails. This ensures the client's original IP address remains unchanged, but at the potential cost of some failed requests if traffic is not evenly distributed or if some nodes don't have Pods for the Service.

How the Local policy helps in preserving the client's source IP:

When a packet (a unit of data) is sent over a network, it has both a source IP (indicating where it came from) and a destination IP (indicating where it's going). As this packet travels through various network devices and systems, certain operations can change the original source IP. One such operation is NAT (Network Address Translation).

In the context of OpenShift/Kubernetes:

  1. Without NAT (Local Policy):

    • When externalTrafficPolicy is set to Local, the idea is to avoid any additional hops that would require NAT operations.
    • If the traffic reaches a node that has a running Pod for the service, the traffic is directly forwarded to the Pod. Since there's no need for NAT, the source IP remains unchanged.
    • If the traffic reaches a node that does NOT have a running Pod for the service, performing NAT to send the traffic to another node would change the source IP. To avoid this and preserve the source IP, OpenShift/Kubernetes simply drops the traffic.
  2. With NAT (Cluster Policy):

    • When externalTrafficPolicy is set to Cluster, OpenShift/Kubernetes is less strict about where the traffic goes. If a node receives traffic but doesn't have a Pod for the service, it will forward the traffic to a node that does, applying NAT along the way. This NAT operation changes the source IP to the IP of the node that did the forwarding.

So, the act of dropping the traffic in the Local policy is not what preserves the source IP. Instead, it's the decision to avoid NAT altogether. By dropping traffic that would require NAT (i.e., traffic that arrives at a node without a local Pod), OpenShift/Kubernetes ensures that only traffic that can be handled without changing the source IP is processed. This guarantees that when a packet is received by a Pod, its source IP is the original IP of the client, not the IP of a node that forwarded it.

PS: internalTrafficPolicy (introduced in Kubernetes v1.21): This setting is similar to externalTrafficPolicy but used to control the behavior of internal traffic, which originates from inside the cluster. While externalTrafficPolicy deals with traffic coming from outside the cluster, internalTrafficPolicy deals with traffic originating from within the cluster itself.
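
For comparison, a sketch of the internal counterpart is shown below. It assumes a ClusterIP Service; the name, namespace, selector, and ports are again hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-internal-app     # hypothetical Service name
  namespace: my-project     # hypothetical namespace
spec:
  type: ClusterIP
  internalTrafficPolicy: Local   # in-cluster clients only reach Pods on their own node
  selector:
    app: my-internal-app    # hypothetical Pod label
  ports:
    - protocol: TCP
      port: 8080            # assumed Service port
      targetPort: 8080      # assumed container port
```

The two fields are independent, so a single Service can set either one or both.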

PPS: Prior to OpenShift 4.16, externalTrafficPolicy: Local does not work for an OpenShift Service when the application's namespace has an EgressIP: traffic from the external source reaches the Pod, but the reply is not routed back. Changing the Service to externalTrafficPolicy: Cluster allows communication back to the external source. Removing the EgressIP label from the namespace, so that the EgressIP no longer applies, also allows it to work. In one example, a customer running OCP 4.14.27 with OVN-Kubernetes and MetalLB in Layer 2 mode experienced this scenario.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
