OpenShift 4: Rate limiting and kubelet health check failures during advanced load testing, and conntrack tuning

Solution Verified - Updated -

Issue

  • On OpenShift 4.14 using OVN-Kubernetes, traffic from a load test was observed to be rate limited to approximately 35k requests per second when sent to a single host accepting UDP traffic on the node's primary interface.
  • At this threshold, traffic from the kubelet on the host to pods began to degrade, leading to lost SYN packets and failed health probes for all pods on the affected hosts receiving the traffic.
  • Non-zero chaintoolong counts appear in the conntrack statistics on the host:
sudo conntrack -S | grep -v chaintoolong=0
cpu=0           found=9466 invalid=5 insert=0 insert_failed=18776 drop=2 early_drop=0 error=0 search_restart=0 clash_resolve=0 chaintoolong=18774
  • Pods in all projects/namespaces that are hosted on the node receiving the load test flap between READY and not READY.
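The chaintoolong symptom can be checked more systematically on an affected node with a small helper that parses conntrack -S output. This is a minimal sketch, assuming the per-CPU "key=value" line format shown in the sample output above; the function name summarize_conntrack_stats is hypothetical and not part of any Red Hat or conntrack-tools utility. It prints only the CPUs with a non-zero chaintoolong count, followed by totals for chaintoolong and insert_failed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of conntrack-tools): summarize `conntrack -S`
# output read from stdin. Assumes one line per CPU made of whitespace-separated
# key=value fields, as in the sample output in this article.
# Prints CPUs whose chaintoolong counter is non-zero, then totals.
summarize_conntrack_stats() {
  awk '
    {
      cpu = ""; ctl = 0; ifail = 0
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")            # kv[1] = key, kv[2] = value
        if (kv[1] == "cpu")                cpu = kv[2]
        else if (kv[1] == "chaintoolong")  ctl = kv[2] + 0
        else if (kv[1] == "insert_failed") ifail = kv[2] + 0
      }
      # Fields missing on this line (e.g. older kernels) count as 0.
      total_ctl += ctl
      total_ifail += ifail
      if (ctl > 0)
        printf "cpu=%s chaintoolong=%d insert_failed=%d\n", cpu, ctl, ifail
    }
    END { printf "total chaintoolong=%d insert_failed=%d\n", total_ctl, total_ifail }
  '
}
```

Usage on the node: sudo conntrack -S | summarize_conntrack_stats. For the sample line above this reports cpu=0 with chaintoolong=18774 and insert_failed=18776. If the kernel does not report a chaintoolong field, the sketch simply reports zero for it rather than failing.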

Environment

  • Red Hat OpenShift Container Platform (RHOCP) 4.14.14 and later (the tuning has been observed to still be required in 4.16 and later)
  • OpenShift OVN-Kubernetes CNI
  • Tigera/Calico CNI
