OOM-killer is killing the pods before reaching the set memory limits
Issue
- We have enough memory set in limits, but pods are still getting restarted with the following error pattern:
Feb 18 14:51:02 <dummy_host_name> kubenswrapper[3098]: I0218 14:51:02.611147 3098 prober.go:107] "Probe failed" probeType="Readiness" pod="cp-8778096/deployment-ocr-service-56d8485bc5-285sr" podUID=fb7a0eb9-3cf2-4fa4-83ef-92d528a00c8f containerName="ocr-service" probeResult=failure output=""
Feb 18 14:51:03 <dummy_host_name> kubenswrapper[3098]: I0218 14:51:03.510236 3098 kubelet.go:2457] "SyncLoop (PLEG): event for pod" pod="cp-8778096/deployment-ocr-service-56d8485bc5-285sr" event=&{ID:fb7a0eb9-3cf2-4fa4-83ef-92d528a00c8f Type:ContainerDied Data:0a0c10184525fc884caf41926d6ba8b199ad5e6599c493ec8f1824a97d80dd83}
Feb 18 14:51:04 <dummy_host_name> kubenswrapper[3098]: I0218 14:51:04.517220 3098 kubelet.go:2457] "SyncLoop (PLEG): event for pod" pod="cp-8778096/deployment-ocr-service-56d8485bc5-285sr" event=&{ID:fb7a0eb9-3cf2-4fa4-83ef-92d528a00c8f Type:ContainerStarted Data:99f4ae4a1e0a9fb228d6a0c41b70fa2b12a3e8aaf154c8b8ac4810feaa9e96b2}
Feb 18 14:51:04 <dummy_host_name> kubenswrapper[3098]: I0218 14:51:04.517547 3098 kubelet.go:2529] "SyncLoop (probe)" probe="readiness" status="" pod="cp-8778096/deployment-ocr-service-56d8485bc5-285sr"
Feb 18 14:52:12 <dummy_host_name> kubenswrapper[3098]: I0218 14:52:12.829921 3098 kubelet.go:2529] "SyncLoop (probe)" probe="readiness" status="ready" pod="cp-8778096/deployment-ocr-service-56d8485bc5-285sr"
- I have investigated the recent OOMKilled events affecting the deployment-ocr-service-56d8485bc5 pods. Based on my findings, the issue appears to be related to cgroup memory enforcement or kernel-level memory constraints. Below is a summary of my observations, along with the commands used to gather this information.
sh-5.1# journalctl -k | grep -i 'oom'
Feb 18 14:50:20 <dummy_host_name> kernel: grpcpp_sync_ser invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
Feb 18 14:50:20 <dummy_host_name> kernel: oom_kill_process.cold+0xb/0x10
Feb 18 14:50:20 <dummy_host_name> kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Feb 18 14:50:20 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-b17829636736dcb7562ca347ec3941d5b37d3937e3d3398a05ebba950e87402d.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-b17829636736dcb7562ca347ec3941d5b37d3937e3d3398a05ebba950e87402d.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-b17829636736dcb7562ca347ec3941d5b37d3937e3d3398a05ebba950e87402d.scope,task=server,pid=2952361,uid=1007380000
Feb 18 14:50:20 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 2952361 (server) total-vm:3878812kB, anon-rss:2054688kB, file-rss:87732kB, shmem-rss:0kB, UID:1007380000 pgtables:5204kB oom_score_adj:999
Feb 18 16:20:50 <dummy_host_name> kernel: slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
Feb 18 16:20:50 <dummy_host_name> kernel: oom_kill_process.cold+0xb/0x10
Feb 18 16:20:50 <dummy_host_name> kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Feb 18 16:20:50 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=2153011,uid=0
Feb 18 16:20:50 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 2153011 (slkaudit) total-vm:1605916kB, anon-rss:355512kB, file-rss:13020kB, shmem-rss:0kB, UID:0 pgtables:864kB oom_score_adj:999
Feb 18 19:43:57 <dummy_host_name> kernel: slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
Feb 18 19:43:57 <dummy_host_name> kernel: oom_kill_process.cold+0xb/0x10
Feb 18 19:43:57 <dummy_host_name> kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Feb 18 19:43:57 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=1132572,uid=0
Feb 18 19:43:57 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 1132572 (slkaudit) total-vm:1599468kB, anon-rss:355992kB, file-rss:12316kB, shmem-rss:0kB, UID:0 pgtables:868kB oom_score_adj:999
Feb 19 10:42:55 <dummy_host_name> kernel: slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
Feb 19 10:42:55 <dummy_host_name> kernel: oom_kill_process.cold+0xb/0x10
Feb 19 10:42:55 <dummy_host_name> kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Feb 19 10:42:55 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=3695530,uid=0
Feb 19 10:42:55 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 3695530 (slkaudit) total-vm:1598620kB, anon-rss:350092kB, file-rss:12788kB, shmem-rss:0kB, UID:0 pgtables:828kB oom_score_adj:999
Feb 19 14:51:41 <dummy_host_name> kernel: grpcpp_sync_ser invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=999
Feb 19 14:51:41 <dummy_host_name> kernel: oom_kill_process.cold+0xb/0x10
Feb 19 14:51:41 <dummy_host_name> kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Feb 19 14:51:41 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,task=server,pid=4076948,uid=1007380000
Feb 19 14:51:41 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 4076948 (server) total-vm:4113332kB, anon-rss:2048136kB, file-rss:86048kB, shmem-rss:0kB, UID:1007380000 pgtables:5168kB oom_score_adj:999
Feb 19 15:00:15 <dummy_host_name> kernel: slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
Feb 19 15:00:15 <dummy_host_name> kernel: oom_kill_process.cold+0xb/0x10
Feb 19 15:00:15 <dummy_host_name> kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Feb 19 15:00:15 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=2579454,uid=0
Feb 19 15:00:15 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 2579454 (slkaudit) total-vm:1600916kB, anon-rss:346308kB, file-rss:12700kB, shmem-rss:0kB, UID:0 pgtables:828kB oom_score_adj:999
The constraint=CONSTRAINT_MEMCG field shows that each kill was triggered by a cgroup memory limit rather than by node-wide memory exhaustion. The node conditions confirm that there was no memory pressure on the node:
Conditions:
Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
----             ------  -----------------                 ------------------                ------                       -------
MemoryPressure   False   Wed, 19 Feb 2025 12:31:36 -0500   Thu, 21 Nov 2024 21:58:34 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
DiskPressure     False   Wed, 19 Feb 2025 12:31:36 -0500   Thu, 21 Nov 2024 21:58:34 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
PIDPressure      False   Wed, 19 Feb 2025 12:31:36 -0500   Thu, 21 Nov 2024 21:58:34 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
Ready            True    Wed, 19 Feb 2025 12:31:36 -0500   Thu, 21 Nov 2024 21:58:39 -0500   KubeletReady                 kubelet is posting ready status
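To compare what each killed process actually held against the configured limits, the resident set sizes reported in the "Killed process" lines of the journalctl -k output can be summed. The following is a minimal sketch, not a Red Hat tool: the function name oom_rss_summary is hypothetical, and the line format is assumed to match the kernel messages shown above. The result is only a lower bound on what the cgroup was charged, since page cache, kernel memory, and other processes in the same cgroup also count toward the limit.

```shell
# Sketch: read kernel log lines on stdin and, for every
# "Memory cgroup out of memory: Killed process ..." line, print the process
# name and the sum of anon-rss + file-rss + shmem-rss in kB and MiB.
# oom_rss_summary is a hypothetical helper name used for illustration.
oom_rss_summary() {
  # sed keeps only matching lines and emits: anon file shmem name
  sed -n -E 's/.*Killed process [0-9]+ \(([^)]*)\).*anon-rss:([0-9]+)kB, file-rss:([0-9]+)kB, shmem-rss:([0-9]+)kB.*/\2 \3 \4 \1/p' |
  awk '{
    kb = $1 + $2 + $3
    name = $4
    for (i = 5; i <= NF; i++) name = name " " $i   # names may contain spaces
    printf "%s rss_kB=%d rss_MiB=%d\n", name, kb, kb / 1024
  }'
}
```

Usage on the node: journalctl -k | oom_rss_summary. For the "server" process killed on Feb 18 14:50:20, this gives 2054688 + 87732 + 0 = 2142420 kB, or roughly 2092 MiB.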
Below are the current cgroup memory usage and limit for one of the affected pods:
sh-5.1# cd /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/
sh-5.1# cat memory.usage_in_bytes
883216384
sh-5.1# cat memory.limit_in_bytes
1073741824
sh-5.1# dmesg | grep -i 'cgroup'
[7222562.290606] Memory cgroup out of memory: Killed process 170485 (nginx) total-vm:175116kB, anon-rss:7912kB, file-rss:10180kB, shmem-rss:44kB, UID:1007380000 pgtables:108kB oom_score_adj:999
[7224209.636204] mem_cgroup_out_of_memory+0x13a/0x150
[7224209.636224] __mem_cgroup_charge+0x29/0x80
[7224209.636424] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod011c2b90_9ef3_47a2_9a62_431ed65d968a.slice/crio-e1b51ab7fd34877929d4d16b6dd04a22df0654f0d9f9a7a439b021095d90ede6.scope:
[7224209.636601] Memory cgroup out of memory: Killed process 2571113 (nginx) total-vm:174988kB, anon-rss:7900kB, file-rss:10180kB, shmem-rss:44kB, UID:1007380000 pgtables:108kB oom_score_adj:999
[7224342.441710] mem_cgroup_out_of_memory+0x13a/0x150
[7224342.441724] __mem_cgroup_charge+0x29/0x80
[7224342.441939] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod011c2b90_9ef3_47a2_9a62_431ed65d968a.slice/crio-e1b51ab7fd34877929d4d16b6dd04a22df0654f0d9f9a7a439b021095d90ede6.scope:
[7224342.442175] Memory cgroup out of memory: Killed process 382470 (nginx) total-vm:175008kB, anon-rss:7900kB, file-rss:10180kB, shmem-rss:44kB, UID:1007380000 pgtables:108kB oom_score_adj:999
[7225390.508329] mem_cgroup_out_of_memory+0x13a/0x150
[7225390.508342] __mem_cgroup_charge+0x29/0x80
[7225390.508531] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod011c2b90_9ef3_47a2_9a62_431ed65d968a.slice/crio-e1b51ab7fd34877929d4d16b6dd04a22df0654f0d9f9a7a439b021095d90ede6.scope:
[7225390.508782] Memory cgroup out of memory: Killed process 3954356 (nginx) total-vm:175084kB, anon-rss:7904kB, file-rss:10180kB, shmem-rss:44kB, UID:1007380000 pgtables:108kB oom_score_adj:999
- This confirms that the container's memory cgroup reached its limit before the process was killed.
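The usage and limit shown earlier were read at the pod slice level, while the kernel messages name container scopes (crio-<id>.scope) and cgroups nested below them. Each level can carry its own limit, so it may help to read the counters at every level. The following is a minimal sketch assuming cgroup v1, as the /sys/fs/cgroup/memory path above suggests; cgroup_mem_report is a hypothetical helper name.

```shell
# Sketch: report cgroup v1 memory counters for a pod cgroup directory and
# every cgroup nested below it (container scopes and any sub-cgroups).
# cgroup_mem_report is a hypothetical helper name, not an OpenShift tool.
cgroup_mem_report() {
  find "$1" -type d | sort | while read -r d; do
    # Skip directories that do not expose memory cgroup files.
    [ -f "$d/memory.limit_in_bytes" ] || continue
    limit=$(cat "$d/memory.limit_in_bytes")
    usage=$(cat "$d/memory.usage_in_bytes")
    peak=$(cat "$d/memory.max_usage_in_bytes")
    failcnt=$(cat "$d/memory.failcnt")
    # Integer percentage of the limit currently in use.
    pct=$((usage * 100 / limit))
    printf '%s limit=%s usage=%s peak=%s failcnt=%s pct=%s\n' \
      "$d" "$limit" "$usage" "$peak" "$failcnt" "$pct"
  done
}
```

Called with the pod slice directory used above as its argument, this prints one line per cgroup. In cgroup v1, memory.max_usage_in_bytes records the peak usage and memory.failcnt counts how often the limit was hit, so a nonzero failcnt on a nested cgroup points at the level where the limit was enforced. With the pod-level values shown above (883216384 of 1073741824 bytes), pct is 82.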
- System-wide OOM Investigation (Kernel Ring Buffer):
[7663408.736973] Memory cgroup out of memory: Killed process 1132572 (slkaudit) total-vm:1599468kB, anon-rss:355992kB, file-rss:12316kB, shmem-rss:0kB, UID:0 pgtables:868kB oom_score_adj:999
[7717345.785738] slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
[7717345.785765] oom_kill_process.cold+0xb/0x10
[7717345.785844] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[7717345.785894] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=3695530,uid=0
[7717345.786058] Memory cgroup out of memory: Killed process 3695530 (slkaudit) total-vm:1598620kB, anon-rss:350092kB, file-rss:12788kB, shmem-rss:0kB, UID:0 pgtables:828kB oom_score_adj:999
[7732271.195839] grpcpp_sync_ser invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=999
[7732271.195890] oom_kill_process.cold+0xb/0x10
[7732271.196170] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[7732271.196184] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,task=server,pid=4076948,uid=1007380000
[7732271.196337] Memory cgroup out of memory: Killed process 4076948 (server) total-vm:4113332kB, anon-rss:2048136kB, file-rss:86048kB, shmem-rss:0kB, UID:1007380000 pgtables:5168kB oom_score_adj:999
[7732785.916979] slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
[7732785.917007] oom_kill_process.cold+0xb/0x10
[7732785.917091] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[7732785.917143] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=2579454,uid=0
[7732785.917288] Memory cgroup out of memory: Killed process 2579454 (slkaudit) total-vm:1600916kB, anon-rss:346308kB, file-rss:12700kB, shmem-rss:0kB, UID:0 pgtables:828kB oom_score_adj:999
- This indicates that the Linux OOM killer terminated the processes due to insufficient memory, but in reality the pods never hit the set memory limit.
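To see which cgroup's limit triggered each kill, the oom_memcg= field of the oom-kill: lines can be tallied. The following is a minimal sketch; oom_memcg_counts is a hypothetical helper name, and it assumes the comma-separated oom-kill: format shown in the logs above.

```shell
# Sketch: read kernel log lines on stdin and count OOM kills per oom_memcg,
# the cgroup whose limit was exceeded. Most frequent cgroup is printed first.
# oom_memcg_counts is a hypothetical helper name used for illustration.
oom_memcg_counts() {
  grep -o 'oom_memcg=[^,]*' | sort | uniq -c | sort -rn
}
```

Usage on the node: journalctl -k | oom_memcg_counts. In the journalctl excerpt above, four of the six kills name the same nested cgroup ending in crio-2a9ba945...scope/aqua-general, which sits below the container scope rather than at the pod slice level where memory.limit_in_bytes was read.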
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4