Troubleshooting Throttled Errors in RHACS Caused by Zombie Processes in Network Namespace

Solution Verified - Updated -

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.15
  • Red Hat Advanced Cluster Security (RHACS)
    • 4.5

Issue

  • The Collector logs repeatedly show the following error message every few minutes:

    [ERROR ...] [Throttled] Could not determine network namespace: No such file or directory
    

Resolution

  • This is a known issue tracked in ROX-26763, and the Red Hat engineering team is working on a fix.

  • If you need a fix for this bug, open a support case on the Red Hat Customer Portal and refer to this solution.

  • A fix is being developed to:

    • Reduce the severity level of messages related to zombie process detection.

    • Potentially implement a threshold-based warning system for abnormal zombie process accumulation.

Root Cause

This message appears when the Collector encounters zombie processes in the cluster. While the presence of a few zombie processes is normal in container environments, the ERROR-level logging can be misleading and make it difficult to identify actual problems.

Diagnostic Steps

  • Check for zombie processes on the host system:

    ps aux | awk '$8 ~ /^[Zz]/'
    
  • Identify the parent processes of the zombies (the third column is the parent PID):

    ps -A -ostat,pid,ppid | grep -e '[Zz]'
    
  • Monitor the number of zombie processes over time. A normal situation includes:

    • A small number of zombies (1-3).

    • A zombie count that stays stable or clears periodically.

    • Parent processes actively running.
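
The checks above can be combined into a small helper script. The following is a minimal sketch, not an official Red Hat tool: the function names, the default sample count, and the default interval are illustrative assumptions, and it assumes a Linux host with the procps `ps` command. On RHOCP nodes, a host shell is typically obtained with `oc debug node/<node-name>` followed by `chroot /host`.

```shell
#!/bin/sh
# Sketch only: function names and defaults are hypothetical, not part of RHACS.

# Count the lines on stdin whose process state starts with Z (zombie).
# Expects one state string per line, as printed by `ps -A -o stat=`.
count_zombies() {
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Print "PID PPID PARENT_COMMAND" for every zombie currently on the host.
list_zombie_parents() {
    ps -A -o stat=,pid=,ppid= | awk '$1 ~ /^Z/ { print $2, $3 }' |
    while read -r pid ppid; do
        # The parent command is empty if the parent has exited in the meantime.
        printf '%s %s %s\n' "$pid" "$ppid" "$(ps -o comm= -p "$ppid" 2>/dev/null)"
    done
}

# Print a timestamped zombie count SAMPLES times, INTERVAL seconds apart.
# Defaults (5 samples, 60 seconds) are arbitrary illustration values.
monitor_zombies() {
    samples=${1:-5}
    interval=${2:-60}
    i=0
    while [ "$i" -lt "$samples" ]; do
        printf '%s zombies=%s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" \
            "$(ps -A -o stat= | count_zombies)"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

After loading the functions into a shell on the host, `monitor_zombies 10 30` takes ten readings thirty seconds apart, and `list_zombie_parents` shows which parent processes own the zombies. A count that keeps growing across readings, or zombies that all share one parent, does not match the normal pattern described above and is worth mentioning in a support case.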

