OOM-killer is killing the pods before reaching the set memory limits

Solution Verified - Updated -

Issue

  • The memory limits are set high enough, but pods are still being restarted with the following error pattern:
Feb 18 14:51:02 <dummy_host_name> kubenswrapper[3098]: I0218 14:51:02.611147 3098 prober.go:107] "Probe failed" probeType="Readiness" pod="cp-8778096/deployment-ocr-service-56d8485bc5-285sr" podUID=fb7a0eb9-3cf2-4fa4-83ef-92d528a00c8f containerName="ocr-service" probeResult=failure output=""
Feb 18 14:51:03 <dummy_host_name> kubenswrapper[3098]: I0218 14:51:03.510236 3098 kubelet.go:2457] "SyncLoop (PLEG): event for pod" pod="cp-8778096/deployment-ocr-service-56d8485bc5-285sr" event=&{ID:fb7a0eb9-3cf2-4fa4-83ef-92d528a00c8f Type:ContainerDied Data:0a0c10184525fc884caf41926d6ba8b199ad5e6599c493ec8f1824a97d80dd83}
Feb 18 14:51:04 <dummy_host_name> kubenswrapper[3098]: I0218 14:51:04.517220 3098 kubelet.go:2457] "SyncLoop (PLEG): event for pod" pod="cp-8778096/deployment-ocr-service-56d8485bc5-285sr" event=&{ID:fb7a0eb9-3cf2-4fa4-83ef-92d528a00c8f Type:ContainerStarted Data:99f4ae4a1e0a9fb228d6a0c41b70fa2b12a3e8aaf154c8b8ac4810feaa9e96b2}
Feb 18 14:51:04 <dummy_host_name> kubenswrapper[3098]: I0218 14:51:04.517547 3098 kubelet.go:2529] "SyncLoop (probe)" probe="readiness" status="" pod="cp-8778096/deployment-ocr-service-56d8485bc5-285sr"
Feb 18 14:52:12 <dummy_host_name> kubenswrapper[3098]: I0218 14:52:12.829921 3098 kubelet.go:2529] "SyncLoop (probe)" probe="readiness" status="ready" pod="cp-8778096/deployment-ocr-service-56d8485bc5-285sr"  
  • I have investigated the recent OOMKilled events affecting the deployment-ocr-service-56d8485bc5 pods. Based on my findings, the issue appears to be related to cgroup memory enforcement or kernel-level memory constraints. Below is a summary of my observations, along with the commands used to gather them.
sh-5.1# journalctl -k | grep -i 'oom'

Feb 18 14:50:20 <dummy_host_name> kernel: grpcpp_sync_ser invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
Feb 18 14:50:20 <dummy_host_name> kernel:  oom_kill_process.cold+0xb/0x10
Feb 18 14:50:20 <dummy_host_name> kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Feb 18 14:50:20 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-b17829636736dcb7562ca347ec3941d5b37d3937e3d3398a05ebba950e87402d.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-b17829636736dcb7562ca347ec3941d5b37d3937e3d3398a05ebba950e87402d.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-b17829636736dcb7562ca347ec3941d5b37d3937e3d3398a05ebba950e87402d.scope,task=server,pid=2952361,uid=1007380000
Feb 18 14:50:20 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 2952361 (server) total-vm:3878812kB, anon-rss:2054688kB, file-rss:87732kB, shmem-rss:0kB, UID:1007380000 pgtables:5204kB oom_score_adj:999
Feb 18 16:20:50 <dummy_host_name> kernel: slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
Feb 18 16:20:50 <dummy_host_name> kernel:  oom_kill_process.cold+0xb/0x10
Feb 18 16:20:50 <dummy_host_name> kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Feb 18 16:20:50 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=2153011,uid=0
Feb 18 16:20:50 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 2153011 (slkaudit) total-vm:1605916kB, anon-rss:355512kB, file-rss:13020kB, shmem-rss:0kB, UID:0 pgtables:864kB oom_score_adj:999
Feb 18 19:43:57 <dummy_host_name> kernel: slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
Feb 18 19:43:57 <dummy_host_name> kernel:  oom_kill_process.cold+0xb/0x10
Feb 18 19:43:57 <dummy_host_name> kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Feb 18 19:43:57 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=1132572,uid=0
Feb 18 19:43:57 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 1132572 (slkaudit) total-vm:1599468kB, anon-rss:355992kB, file-rss:12316kB, shmem-rss:0kB, UID:0 pgtables:868kB oom_score_adj:999
Feb 19 10:42:55 <dummy_host_name> kernel: slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
Feb 19 10:42:55 <dummy_host_name> kernel:  oom_kill_process.cold+0xb/0x10
Feb 19 10:42:55 <dummy_host_name> kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Feb 19 10:42:55 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=3695530,uid=0
Feb 19 10:42:55 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 3695530 (slkaudit) total-vm:1598620kB, anon-rss:350092kB, file-rss:12788kB, shmem-rss:0kB, UID:0 pgtables:828kB oom_score_adj:999
Feb 19 14:51:41 <dummy_host_name> kernel: grpcpp_sync_ser invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=999
Feb 19 14:51:41 <dummy_host_name> kernel:  oom_kill_process.cold+0xb/0x10
Feb 19 14:51:41 <dummy_host_name> kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Feb 19 14:51:41 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,task=server,pid=4076948,uid=1007380000
Feb 19 14:51:41 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 4076948 (server) total-vm:4113332kB, anon-rss:2048136kB, file-rss:86048kB, shmem-rss:0kB, UID:1007380000 pgtables:5168kB oom_score_adj:999
Feb 19 15:00:15 <dummy_host_name> kernel: slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
Feb 19 15:00:15 <dummy_host_name> kernel:  oom_kill_process.cold+0xb/0x10
Feb 19 15:00:15 <dummy_host_name> kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Feb 19 15:00:15 <dummy_host_name> kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=2579454,uid=0
Feb 19 15:00:15 <dummy_host_name> kernel: Memory cgroup out of memory: Killed process 2579454 (slkaudit) total-vm:1600916kB, anon-rss:346308kB, file-rss:12700kB, shmem-rss:0kB, UID:0 pgtables:828kB oom_score_adj:999

These entries (constraint=CONSTRAINT_MEMCG) suggest that the kernel enforced a cgroup memory limit on the pod even though the node itself reported no memory pressure:

 Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 19 Feb 2025 12:31:36 -0500   Thu, 21 Nov 2024 21:58:34 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 19 Feb 2025 12:31:36 -0500   Thu, 21 Nov 2024 21:58:34 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 19 Feb 2025 12:31:36 -0500   Thu, 21 Nov 2024 21:58:34 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 19 Feb 2025 12:31:36 -0500   Thu, 21 Nov 2024 21:58:39 -0500   KubeletReady                 kubelet is posting ready status
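
To compare the kill records in the kernel journal above against the configured limits, the resident memory of each killed process can be totalled from the "Memory cgroup out of memory" lines. The following is a minimal sketch, not a supported tool: the function name `parse_kill_record` and the regular expression are assumptions made for illustration, and the two sample lines are copied from the journal output above with the timestamp and host name dropped.

```python
import re

# Matches the kernel's per-kill summary line. The pattern is an assumption
# based on the log lines in this article, not a stable kernel interface.
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon>\d+)kB, "
    r"file-rss:(?P<file>\d+)kB, shmem-rss:(?P<shmem>\d+)kB"
)


def parse_kill_record(line):
    """Return (name, pid, resident_kib) for a kill line, or None if no match."""
    m = KILL_RE.search(line)
    if m is None:
        return None
    # Resident memory = anonymous + file-backed + shared-memory pages.
    resident_kib = int(m["anon"]) + int(m["file"]) + int(m["shmem"])
    return m["name"], int(m["pid"]), resident_kib


SAMPLE = [
    "kernel: Memory cgroup out of memory: Killed process 2952361 (server) "
    "total-vm:3878812kB, anon-rss:2054688kB, file-rss:87732kB, shmem-rss:0kB, "
    "UID:1007380000 pgtables:5204kB oom_score_adj:999",
    "kernel: Memory cgroup out of memory: Killed process 2153011 (slkaudit) "
    "total-vm:1605916kB, anon-rss:355512kB, file-rss:13020kB, shmem-rss:0kB, "
    "UID:0 pgtables:864kB oom_score_adj:999",
]

records = [parse_kill_record(line) for line in SAMPLE]
for name, pid, kib in records:
    print(f"{name} (pid {pid}): {kib / 1024:.1f} MiB resident")
```

The resident size of a single process is only part of what a cgroup is charged for, since page cache and kernel memory charged to the same cgroup also count. These figures are therefore an indication rather than the exact cgroup usage at kill time.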

The current memory usage and limit of the pod-level cgroup are shown below:

sh-5.1# cd /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/
sh-5.1# cat memory.usage_in_bytes
883216384
sh-5.1# cat memory.limit_in_bytes
1073741824
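
The two readings above can be put side by side with a little arithmetic. This is a minimal sketch: the helper name `cgroup_headroom` is hypothetical, and the values are the ones printed by the `cat` commands above, so they describe a single reading taken during the investigation, not the usage at the moment of a kill.

```python
def cgroup_headroom(usage_bytes, limit_bytes):
    """Return (fraction_used, headroom_bytes) for one cgroup reading."""
    if limit_bytes <= 0:
        raise ValueError("limit must be positive")
    return usage_bytes / limit_bytes, limit_bytes - usage_bytes


# Values copied from the cat output above.
usage = 883216384   # memory.usage_in_bytes
limit = 1073741824  # memory.limit_in_bytes (exactly 1 GiB)

fraction, headroom = cgroup_headroom(usage, limit)
print(f"used {fraction:.1%} of limit, headroom {headroom / 2**20:.1f} MiB")
```

At the time of this reading the pod-level cgroup was at roughly 82% of its 1 GiB limit.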

sh-5.1# dmesg | grep -i 'cgroup'

[7222562.290606] Memory cgroup out of memory: Killed process 170485 (nginx) total-vm:175116kB, anon-rss:7912kB, file-rss:10180kB, shmem-rss:44kB, UID:1007380000 pgtables:108kB oom_score_adj:999
[7224209.636204]  mem_cgroup_out_of_memory+0x13a/0x150
[7224209.636224]  __mem_cgroup_charge+0x29/0x80
[7224209.636424] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod011c2b90_9ef3_47a2_9a62_431ed65d968a.slice/crio-e1b51ab7fd34877929d4d16b6dd04a22df0654f0d9f9a7a439b021095d90ede6.scope:
[7224209.636601] Memory cgroup out of memory: Killed process 2571113 (nginx) total-vm:174988kB, anon-rss:7900kB, file-rss:10180kB, shmem-rss:44kB, UID:1007380000 pgtables:108kB oom_score_adj:999
[7224342.441710]  mem_cgroup_out_of_memory+0x13a/0x150
[7224342.441724]  __mem_cgroup_charge+0x29/0x80
[7224342.441939] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod011c2b90_9ef3_47a2_9a62_431ed65d968a.slice/crio-e1b51ab7fd34877929d4d16b6dd04a22df0654f0d9f9a7a439b021095d90ede6.scope:
[7224342.442175] Memory cgroup out of memory: Killed process 382470 (nginx) total-vm:175008kB, anon-rss:7900kB, file-rss:10180kB, shmem-rss:44kB, UID:1007380000 pgtables:108kB oom_score_adj:999
[7225390.508329]  mem_cgroup_out_of_memory+0x13a/0x150
[7225390.508342]  __mem_cgroup_charge+0x29/0x80
[7225390.508531] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod011c2b90_9ef3_47a2_9a62_431ed65d968a.slice/crio-e1b51ab7fd34877929d4d16b6dd04a22df0654f0d9f9a7a439b021095d90ede6.scope:
[7225390.508782] Memory cgroup out of memory: Killed process 3954356 (nginx) total-vm:175084kB, anon-rss:7904kB, file-rss:10180kB, shmem-rss:44kB, UID:1007380000 pgtables:108kB oom_score_adj:999
  • This confirms that the cgroup hit its memory limit before the process was killed.
  • System-wide OOM Investigation (Kernel Ring Buffer):
[7663408.736973] Memory cgroup out of memory: Killed process 1132572 (slkaudit) total-vm:1599468kB, anon-rss:355992kB, file-rss:12316kB, shmem-rss:0kB, UID:0 pgtables:868kB oom_score_adj:999
[7717345.785738] slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
[7717345.785765]  oom_kill_process.cold+0xb/0x10
[7717345.785844] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[7717345.785894] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=3695530,uid=0
[7717345.786058] Memory cgroup out of memory: Killed process 3695530 (slkaudit) total-vm:1598620kB, anon-rss:350092kB, file-rss:12788kB, shmem-rss:0kB, UID:0 pgtables:828kB oom_score_adj:999
[7732271.195839] grpcpp_sync_ser invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=999
[7732271.195890]  oom_kill_process.cold+0xb/0x10
[7732271.196170] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[7732271.196184] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/crio-d13fb3d68717cc4629440257bff24b8636d53b3869718ed5e96e66d2608215a0.scope,task=server,pid=4076948,uid=1007380000
[7732271.196337] Memory cgroup out of memory: Killed process 4076948 (server) total-vm:4113332kB, anon-rss:2048136kB, file-rss:86048kB, shmem-rss:0kB, UID:1007380000 pgtables:5168kB oom_score_adj:999
[7732785.916979] slkd-events invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=999
[7732785.917007]  oom_kill_process.cold+0xb/0x10
[7732785.917091] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[7732785.917143] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope,mems_allowed=0-3,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope/aqua-general,task=slkaudit,pid=2579454,uid=0
[7732785.917288] Memory cgroup out of memory: Killed process 2579454 (slkaudit) total-vm:1600916kB, anon-rss:346308kB, file-rss:12700kB, shmem-rss:0kB, UID:0 pgtables:828kB oom_score_adj:999
  • This indicates that the Linux OOM killer terminated the processes due to insufficient memory, but in reality the pods never hit the set memory limit.
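
The `oom_memcg=` field of each `oom-kill:` line names the cgroup whose limit was exceeded, so it shows which limit to inspect for a given kill. The sketch below extracts the pod UID, the container ID and any nested cgroup from that field. It is illustrative only: the function name `parse_oom_memcg` and both regular expressions are assumptions derived from the paths shown in this article, and the sample lines are copied from the logs above with the timestamp, `cpuset=` and `task_memcg=` fields omitted for brevity.

```python
import re

# Patterns are assumptions based on the cgroup paths in this article
# (systemd slices with CRI-O scopes); other setups name them differently.
POD_RE = re.compile(r"kubepods-(?:\w+-)?pod([0-9a-f_]+)\.slice")
SCOPE_RE = re.compile(r"crio-([0-9a-f]+)\.scope(?:/(.+))?$")


def parse_oom_memcg(line):
    """Return a dict describing the oom_memcg path, or None if absent."""
    field = re.search(r"oom_memcg=([^,]+)", line)
    if field is None:
        return None
    path = field.group(1)
    pod = POD_RE.search(path)
    scope = SCOPE_RE.search(path)
    return {
        "path": path,
        # systemd replaces the hyphens of the pod UID with underscores.
        "pod_uid": pod.group(1).replace("_", "-") if pod else None,
        "container_id": scope.group(1) if scope else None,
        # Sub-cgroup created below the container scope, if any.
        "nested": scope.group(2) if scope else None,
    }


SERVER_LINE = (
    "oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),mems_allowed=0-3,"
    "oom_memcg=/kubepods.slice/kubepods-burstable.slice/"
    "kubepods-burstable-podac0670f0_c67f_4e95_bef1_4794f27f2724.slice/"
    "crio-b17829636736dcb7562ca347ec3941d5b37d3937e3d3398a05ebba950e87402d.scope,"
    "task=server,pid=2952361,uid=1007380000"
)
AUDIT_LINE = (
    "oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),mems_allowed=0-3,"
    "oom_memcg=/kubepods.slice/kubepods-burstable.slice/"
    "kubepods-burstable-pod9f2217bb_781a_4435_9c5d_281dd2e7966a.slice/"
    "crio-2a9ba9453e0677014bcec918d96d39211a56cd412e4fd658c5f4899e9e1fc4ea.scope"
    "/aqua-general,task=slkaudit,pid=2153011,uid=0"
)

server = parse_oom_memcg(SERVER_LINE)
audit = parse_oom_memcg(AUDIT_LINE)
for info in (server, audit):
    print(info["pod_uid"], info["container_id"][:12], info["nested"])
```

In the logs above, the `server` kills name a container-level scope, and the `slkaudit` kills name a cgroup called aqua-general nested below the container scope. The usage and limit shown earlier were read from the pod-level slice, so the limit relevant to a given kill is the one set on the cgroup named in `oom_memcg`. The extracted pod UID can also be compared with the `podUID` values in the kubelet log to confirm which pod a kill belongs to.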

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
