OpenShift node becomes unusable when cgroup limits are exceeded and the OOM killer takes action

Solution Verified

Issue

  • The node becomes unusable and the kernel panics, while dmesg shows mem_cgroup_out_of_memory messages like the following:
[70832.855067] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[70832.865451] [2526562]     0 2526562    35869      701   172032        0         -1000 conmon
[70832.876714] [2526563]     0 2526563   383592     5494   249856        0         -1000 runc
[70832.886029] [2526971]     0 2526971     5026     1122    69632        0         -1000 6
[70832.895072] [2659569]     0 2659569    35869      688   184320        0         -1000 conmon
[70832.904535] [2791719]     0 2791719   272658     4591   172032        0         -1000 runc
[70832.913943] [1564010]     0 1564010    35869      635   172032        0         -1000 conmon
[70832.923440] [1610674]     0 1610674   272658     2710   172032        0         -1000 runc
[70832.932833] [613308]     0 613308    17436      177   167936        0         -1000 conmon
[70832.942152] [1216290]     0 1216290    17436     1300   180224        0         -1000 conmon
[70832.951612] Out of memory and no killable processes...
[70846.871930] runc invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=-1000
[70846.883324] runc cpuset=/ mems_allowed=0
[70846.887287] CPU: 52 PID: 4120334 Comm: runc Tainted: P        W  OE    --------- -  - 4.18.0-193.41.1.el8_2.x86_64 #1
[70846.897906] Hardware name: Dell Inc. PowerEdge R6515/0R4CNN, BIOS 1.4.8 05/06/2020
[70846.905486] Call Trace:
[70846.907951]  dump_stack+0x5c/0x80
[70846.911277]  dump_header+0x6e/0x27a
[70846.914775]  out_of_memory.cold.31+0x39/0x87
[70846.919058]  mem_cgroup_out_of_memory+0x49/0x80
[70846.923598]  try_charge+0x58c/0x780
[70846.927097]  __memcg_kmem_charge_memcg+0x33/0x90
[70846.931730]  new_slab+0x96d/0xb80
[70846.935054]  ? finish_task_switch+0xd7/0x2b0
[70846.939336]  ___slab_alloc+0x36b/0x4e0
[70846.943097]  ? vm_area_alloc+0x1a/0x40
[70846.946861]  ? futex_wait_queue_me+0xd3/0x120
[70846.951226]  ? futex_wait+0x18a/0x240
[70846.954900]  ? vm_area_alloc+0x1a/0x40
[70846.958660]  __slab_alloc+0x1c/0x30
[70846.962162]  kmem_cache_alloc+0x183/0x1b0
[70846.966183]  vm_area_alloc+0x1a/0x40
[70846.969771]  mmap_region+0x325/0x630
[70846.973375]  do_mmap+0x38b/0x500
[70846.976649]  vm_mmap_pgoff+0xd2/0x120
[70846.980362]  ksys_mmap_pgoff+0x59/0x270
[70846.984229]  do_syscall_64+0x5b/0x1a0
[70846.987905]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[70846.992966] RIP: 0033:0x7f13453e58c7
[70846.996553] Code: 54 41 89 d4 55 48 89 fd 53 4c 89 cb 48 85 ff 74 52 49 89 d9 45 89 f8 45 89 f2 44 89 e2 4c 89 ee 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 79 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 0f 1f
[70847.015339] RSP: 002b:00007ffd97300c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[70847.022921] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f13453e58c7
[70847.030068] RDX: 0000000000000000 RSI: 0000000000801000 RDI: 0000000000000000
[70847.037210] RBP: 0000000000000000 R08: 00000000ffffffff R09: 0000000000000000
[70847.044352] R10: 0000000000020022 R11: 0000000000000246 R12: 0000000000000000
[70847.051503] R13: 0000000000801000 R14: 0000000000020022 R15: 00000000ffffffff
[70847.058703] Memory limit reached of cgroup /kubepods.slice/kubepods-pod4e711907_b623_4a65_889c_88b0fdfc2c08.slice
[70847.068984] memory: usage 51660kB, limit 51200kB, failcnt 61410
[70847.075074] memory+swap: usage 51660kB, limit 9007199254740988kB, failcnt 0
[70847.082871] kmem: usage 8848kB, limit 9007199254740988kB, failcnt 0
  • The node has more than enough memory available.

  • The node goes into the NotReady state and stays there until kubelet is restarted manually.

  • High CPU and memory usage by kubelet:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
(...)
   4776 root      20   0   74.4g   1.4g  63396 S 167.8   0.2 866:17.30 kubelet
(...)
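
The dmesg excerpt above contains two details worth checking together: the cgroup's memory usage against its limit, and the oom_score_adj of every task listed for that cgroup. A value of -1000 exempts a task from the OOM killer, so if every task carries it, the kernel reports "Out of memory and no killable processes". The following is a minimal sketch for pulling both details out of saved dmesg output. It assumes the line layout shown in the excerpt (other kernel versions may print different columns), and the names `summarize_oom`, `USAGE_RE`, and `TASK_RE` are hypothetical helpers, not part of any Red Hat tooling.

```python
import re

# Matches only the "memory: usage ...kB, limit ...kB, failcnt ..." line,
# not the "memory+swap:" or "kmem:" lines that follow it.
USAGE_RE = re.compile(
    r"\]\s+memory: usage (\d+)kB, limit (\d+)kB, failcnt (\d+)"
)

# Matches task-table rows with the columns from the excerpt:
# [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
TASK_RE = re.compile(
    r"\]\s+\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(-?\d+)\s+(\S+)\s*$"
)

# Tasks with this oom_score_adj are never selected by the OOM killer.
OOM_SCORE_ADJ_MIN = -1000


def summarize_oom(lines):
    """Collect the cgroup usage line and the task table from dmesg lines."""
    usage = None
    tasks = []
    for line in lines:
        match = USAGE_RE.search(line)
        if match:
            usage = {
                "usage_kb": int(match.group(1)),
                "limit_kb": int(match.group(2)),
                "failcnt": int(match.group(3)),
            }
            continue
        match = TASK_RE.search(line)
        if match:
            tasks.append(
                {
                    "pid": int(match.group(1)),
                    "oom_score_adj": int(match.group(8)),
                    "name": match.group(9),
                }
            )
    # A task is a candidate for the OOM killer only above the minimum score.
    killable = [t for t in tasks if t["oom_score_adj"] > OOM_SCORE_ADJ_MIN]
    return {"usage": usage, "tasks": tasks, "killable": killable}


if __name__ == "__main__":
    # Three lines copied from the excerpt above.
    sample = [
        "[70832.865451] [2526562]     0 2526562    35869      701   172032        0         -1000 conmon",
        "[70832.876714] [2526563]     0 2526563   383592     5494   249856        0         -1000 runc",
        "[70847.068984] memory: usage 51660kB, limit 51200kB, failcnt 61410",
    ]
    report = summarize_oom(sample)
    print(report["usage"])
    print("killable tasks:", report["killable"])
```

In practice the input would be the saved output of dmesg or a journal export, read line by line. An empty `killable` list combined with `usage_kb` at or above `limit_kb` corresponds to the situation shown in the excerpt.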

Environment

  • Red Hat OpenShift Container Platform
    • 4.5
    • 4.6
    • 4.7
