High read I/O on the dedicated docker storage device: help needed with root cause analysis (RCA)
Environment
- OpenShift Container Platform 3.4.0
- various applications including kibana-proxy
Issue
We're seeing incidents of high read I/O (150MB/sec+) on the dedicated docker storage device across different nodes in different environments. Troubleshooting points to a number of different pods, so no single pod is consistently responsible. The incidents occur frequently, but not at specific times.
Resolution
In later OpenShift versions, the kibana-proxy memory limit was increased from 96M to 256M to avoid OOM conditions.
Other applications should be evaluated on their memory usage and requirements.
Root Cause
This appears to be a symptom of containers reaching their memory limit.
The following comes from the analysis of a VMCore captured when kibana-proxy reached this state:
The process was gradually allocating 'anonymous memory' (data and stack). As the memory allocation got close to the cgroup's limit, the process began to reclaim pages that contain cached file data. Those were the only kind of pages that could be reclaimed because - due to lack of a swap device - it was impossible to swap out and reclaim modified (dirty) 'anonymous memory' pages. After a while the process got to a point where it was only possible to allocate more pages to data and stack if cached pages from the '/usr/bin/node' binary image were reclaimed aggressively. Eventually the number of pages from the '/usr/bin/node' binary image that could be kept cached in memory got so small that the likelihood of a page fault during instruction fetch increased significantly. In order to handle the page fault it was necessary to reclaim pages that contain cached file data - probably from the '/usr/bin/node' binary image again - so the process ended up thrashing while trying to fault in pages from the binary image.
This would explain the simultaneous increase in page faults and disk read I/O.
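The condition described above (anonymous memory close to the cgroup limit, with very little file cache left) can be checked roughly from a container's cgroup v1 `memory.stat` counters. The following is a minimal sketch, not part of the original analysis: the function names, the sample values, and the 90% / 5% thresholds are illustrative assumptions, and the 96M limit is assumed to mean 96 MiB.

```python
def parse_memory_stat(text):
    """Parse the 'key value' lines of a cgroup v1 memory.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def looks_like_cache_starvation(stat, limit_bytes, rss_ratio=0.90, cache_ratio=0.05):
    """Return True when anonymous memory (rss) is close to the limit and the
    file cache is small. Both ratios are hypothetical thresholds chosen for
    illustration, not values taken from the analysis."""
    rss = stat.get("rss", 0)
    cache = stat.get("cache", 0)
    return rss >= rss_ratio * limit_bytes and cache <= cache_ratio * limit_bytes


# Made-up sample resembling a container that is nearly out of reclaimable cache.
SAMPLE = """\
cache 2097152
rss 96468992
mapped_file 1048576
pgfault 1200000
pgmajfault 45000
"""

if __name__ == "__main__":
    limit = 96 * 1024 * 1024  # assumes the 96M limit means 96 MiB
    stat = parse_memory_stat(SAMPLE)
    print("cache starvation suspected:", looks_like_cache_starvation(stat, limit))
```

A container that matches this check is not necessarily thrashing, but it is a reasonable candidate for a closer look at its major page fault rate and its memory limit.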
Diagnostic Steps
A VMCore was captured while the kibana-proxy container was at its memory limit and before it was OOM-killed.
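Before going as far as capturing a VMCore, a lighter first check on other suspect containers is to sample the major page fault counter of the container's memory cgroup over a short interval. The sketch below assumes a cgroup v1 layout in which the container's cgroup directory contains a `memory.stat` file with a `pgmajfault` line. The directory path has to be supplied by the caller, and the helper names and the 5-second interval are hypothetical.

```python
import os
import sys
import time


def read_pgmajfault(cgroup_dir):
    """Return the pgmajfault counter from <cgroup_dir>/memory.stat (cgroup v1 layout)."""
    with open(os.path.join(cgroup_dir, "memory.stat")) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2 and parts[0] == "pgmajfault":
                return int(parts[1])
    raise KeyError("pgmajfault not found in memory.stat")


def major_fault_rate(before, after, interval_seconds):
    """Major page faults per second between two counter samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (after - before) / interval_seconds


def sample_major_fault_rate(cgroup_dir, interval_seconds=5.0):
    """Read the counter twice, interval_seconds apart, and return the rate."""
    before = read_pgmajfault(cgroup_dir)
    time.sleep(interval_seconds)
    after = read_pgmajfault(cgroup_dir)
    return major_fault_rate(before, after, interval_seconds)


if __name__ == "__main__":
    # Usage: python majfault_rate.py <path to the container's memory cgroup directory>
    if len(sys.argv) == 2:
        rate = sample_major_fault_rate(sys.argv[1])
        print("major page faults per second: %.1f" % rate)
    else:
        print("usage: majfault_rate.py <cgroup_dir>")
```

A sustained high rate on a container that is also near its memory limit would be consistent with the thrashing pattern described under Root Cause. What counts as "high" depends on the workload and is not defined here.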
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
