Liveness and readiness probes fail across multiple pods on the same node due to Twistlock Defender nfqueue packet inspection delays
Issue
-
Pods across multiple namespaces intermittently fail liveness and readiness probes on the same worker node. The failures are not limited to a single application; platform components such as openshift-dns, network-check-target, and application pods all fail probes simultaneously. Containers killed after liveness probe failure exit with code 137 (SIGKILL from kubelet).
-
Kubelet logs show the following probe failure pattern:
"Probe failed" probeType="Liveness" probeResult="failure" output="Get \"http://<pod-ip>:<port>/health\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
-
The "awaiting headers" message shows that the probe timed out before any HTTP response headers arrived, rather than failing immediately with an error such as "connection refused". This points to traffic being delayed or stalled rather than to a hard network connectivity failure, where the connection attempt itself would be rejected.
-
There are no OOM kill events in dmesg for the Twistlock Defender. The Defender pod continues running throughout the failure window.
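To check whether the probe failures on a node follow the pattern described in this section, the kubelet log lines can be grouped by failure type and pod IP. The following is a minimal sketch, not a supported tool: it assumes log lines shaped like the kubelet entry quoted above, and the sample IP addresses, ports, paths, and the function names `classify` and `summarize` are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Matches the kubelet "Probe failed" entry format quoted in this article.
# Other kubelet fields (pod name, container name) are ignored in this sketch.
PROBE_RE = re.compile(
    r'"Probe failed".*?probeType="(?P<type>\w+)".*?'
    r'output="Get \\"http://(?P<host>[^:/\\]+):(?P<port>\d+)'
)

def classify(line):
    """Classify a probe failure by the error text in its output field."""
    if "awaiting headers" in line:
        return "timeout"   # no response headers before the deadline
    if "connection refused" in line:
        return "refused"   # connection attempt rejected outright
    return "other"

def summarize(lines):
    """Map each failure class to the set of pod IPs that showed it."""
    by_class = defaultdict(set)
    for line in lines:
        match = PROBE_RE.search(line)
        if match:
            by_class[classify(line)].add(match["host"])
    return by_class

# Hypothetical sample lines; real input would come from the node's kubelet journal.
sample = [
    r'"Probe failed" probeType="Liveness" probeResult="failure" output="Get \"http://10.128.2.15:8080/health\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"',
    r'"Probe failed" probeType="Readiness" probeResult="failure" output="Get \"http://10.128.2.22:8181/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"',
    r'"Probe failed" probeType="Readiness" probeResult="failure" output="Get \"http://10.128.2.30:8080/ready\": dial tcp 10.128.2.30:8080: connect: connection refused"',
]

summary = summarize(sample)
for failure_class, pod_ips in sorted(summary.items()):
    print(failure_class, sorted(pod_ips))
```

Many distinct pod IPs in the "timeout" class within the same time window, and few or none in "refused", would be consistent with the symptom described here. A single pod IP dominating the output suggests an application-specific problem instead.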
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4.x
- Twistlock / Prisma Cloud Defender (DaemonSet)