Many page allocation failures occur with __kmalloc() requests from ib_setup_port_attrs(), a function in the [ib_core] (InfiniBand) driver

Solution Verified - Updated -

Issue

  • A large number of page allocation failures occur with __kmalloc() requests from ib_setup_port_attrs(), a function in the [ib_core] (InfiniBand) driver.
$ grep page\ allocation sos_commands/kernel/dmesg | awk '{c="";for(i=2;i<=NF;i++) c=c $i" "; print c}'| sort | uniq -c
      1 (ostnamed): page allocation failure: order:10, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1 
    173 pinns: page allocation failure: order:10, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1 
  • Almost all of these failures (173 of 174) have ib_setup_port_attrs(), a function in the [ib_core] (InfiniBand) driver, in the call trace of the failing __kmalloc() request.
$ cat sos_commands/kernel/dmesg | grep Call\ Trace -A12 --no-group-separator | awk '{c="";for(i=2;i<=NF;i++) c=c $i" "; print c}' | grep ib_setup_port_attrs -B1 --no-group-separator | paste - - | wc -l
173
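The two pipelines above can be condensed into a single pass over the log. The following is only a sketch, not the method used to produce the numbers above: the function name alloc_failures_via is hypothetical, and it assumes that each "page allocation failure" line is followed by its own call trace and then a "Mem-Info:" line, as in the sample shown in this article.

```shell
# Hypothetical helper: count page allocation failures whose call trace
# contains a given function.
# Usage: alloc_failures_via FUNCTION [DMESG_FILE]   (reads stdin if no file)
alloc_failures_via() {
    awk -v fn="$1" '
        # A new failure report starts here; watch its trace for the function.
        /page allocation failure/ { total++; pending = 1; next }
        # "Mem-Info:" follows the call trace, so stop watching at that point.
        /Mem-Info:/               { pending = 0 }
        # Trace entries look like "fn+0xd4/0x8b0", hence the trailing "+".
        pending && index($0, fn "+") { hits++; pending = 0 }
        END {
            printf "%d of %d page allocation failures have %s in the call trace\n",
                   hits, total, fn
        }
    ' "${2:--}"
}
```

For example, alloc_failures_via ib_setup_port_attrs sos_commands/kernel/dmesg prints one summary line whose counts should be consistent with the figures above, provided the log follows the assumed layout.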
[2153330.863053] pinns: page allocation failure: order:10, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1
[2153330.877775] CPU: 8 PID: 456256 Comm: pinns Not tainted 4.18.0-372.51.1.el8_6.x86_64 #1
[2153330.886828] Hardware name: Lenovo ThinkSystem SR630 V2/7Z71CTO1WW, BIOS AFE118P-1.33 08/24/2022
[2153330.896753] Call Trace:
[2153330.899689]  dump_stack+0x41/0x60
[2153330.903595]  warn_alloc.cold.120+0x7b/0x115
[2153330.908475]  ? _cond_resched+0x15/0x30
[2153330.912860]  ? __alloc_pages_direct_compact+0x15f/0x170
[2153330.918903]  __alloc_pages_slowpath+0xc59/0xca0
[2153330.924167]  ? kernfs_activate+0x63/0x80
[2153330.928752]  __alloc_pages_nodemask+0x2e2/0x320
[2153330.934013]  kmalloc_order+0x28/0x90
[2153330.938211]  kmalloc_order_trace+0x1d/0xb0
[2153330.942988]  __kmalloc+0x203/0x250
[2153330.946994]  ib_setup_port_attrs+0xd4/0x8b0 [ib_core] <<-------
[2153330.952862]  ? klist_add_tail+0x3b/0x70
[2153330.957361]  add_one_compat_dev.part.25+0x1ab/0x220 [ib_core]
[2153330.963997]  rdma_dev_init_net+0xf5/0x1a0 [ib_core]
[2153330.969664]  ops_init+0x3a/0x110
[2153330.973476]  setup_net+0xee/0x260
[2153330.977381]  copy_net_ns+0xc3/0x190
[2153330.981479]  create_new_namespaces+0x174/0x210
[2153330.986650]  unshare_nsproxy_namespaces+0x55/0xb0
[2153330.992109]  ksys_unshare+0x19b/0x360
[2153330.996401]  __x64_sys_unshare+0xe/0x20
[2153331.000887]  do_syscall_64+0x5b/0x1b0
[2153331.005190]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[2153331.011044] RIP: 0033:0x44662b
[2153331.014665] Code: 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
[2153331.035858] RSP: 002b:00007ffe594c5a58 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
[2153331.044524] RAX: ffffffffffffffda RBX: 000000004c000000 RCX: 000000000044662b
[2153331.052699] RDX: 0000000000487288 RSI: 00000000004871f8 RDI: 000000004c000000
[2153331.060885] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[2153331.069065] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000a
[2153331.077240] R13: 00007ffe594c6e82 R14: 00007ffe594c5c28 R15: 00007ffe594c6e8e
[2153331.085604] Mem-Info:
[2153331.088352] active_anon:5918 inactive_anon:5384979 isolated_anon:0
                  active_file:583650 inactive_file:182969 isolated_file:0
                  unevictable:74420 dirty:106 writeback:30
                  slab_reclaimable:125277 slab_unreclaimable:930359
                  mapped:444746 shmem:19994 pagetables:33625 bounce:0
                  free:355081 free_pcp:505 free_cma:0
[2153331.130177] Node 0 active_anon:11932kB inactive_anon:10202700kB active_file:1377576kB inactive_file:422012kB unevictable:139924kB isolated(anon):0kB isolated(file):0kB mapped:1044600kB dirty:332kB writeback:112kB shmem:22924kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 3954688kB writeback_tmp:0kB kernel_stack:46488kB pagetables:71372kB all_unreclaimable? no
[2153331.167465] Node 1 active_anon:11740kB inactive_anon:11339288kB active_file:969828kB inactive_file:304432kB unevictable:157756kB isolated(anon):0kB isolated(file):0kB mapped:734384kB dirty:92kB writeback:8kB shmem:57052kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 5771264kB writeback_tmp:0kB kernel_stack:37864kB pagetables:63128kB all_unreclaimable? no
[2153331.204267] Node 0 DMA free:11264kB min:40kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[2153331.231629] lowmem_reserve[]: 0 2643 15658 15658 15658
[2153331.238082] Node 0 DMA32 free:126636kB min:7264kB low:9880kB high:12496kB active_anon:120kB inactive_anon:2203092kB active_file:13360kB inactive_file:6592kB unevictable:40kB writepending:0kB present:2773584kB managed:2706816kB mlocked:40kB bounce:0kB free_pcp:8kB local_pcp:0kB free_cma:0kB
[2153331.268571] lowmem_reserve[]: 0 0 13014 13014 13014
[2153331.274768] Node 0 Normal free:436844kB min:36972kB low:50288kB high:63604kB active_anon:11812kB inactive_anon:7999932kB active_file:1372748kB inactive_file:412608kB unevictable:139884kB writepending:444kB present:13631488kB managed:13327036kB mlocked:136812kB bounce:0kB free_pcp:1636kB local_pcp:0kB free_cma:0kB
[2153331.307700] lowmem_reserve[]: 0 0 0 0 0
[2153331.312777] Node 1 Normal free:816420kB min:45828kB low:62336kB high:78844kB active_anon:11740kB inactive_anon:11339288kB active_file:990312kB inactive_file:297528kB unevictable:157756kB writepending:100kB present:16777216kB managed:16508516kB mlocked:157756kB bounce:0kB free_pcp:588kB local_pcp:148kB free_cma:0kB
[2153331.345884] lowmem_reserve[]: 0 0 0 0 0
[2153331.350993] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
[2153331.365550] Node 0 DMA32: 373*4kB (UME) 231*8kB (UME) 212*16kB (UME) 189*32kB (UME) 87*64kB (UME) 76*128kB (UME) 61*256kB (UME) 32*512kB (UME) 23*1024kB (UME) 21*2048kB (M) 0*4096kB = 126636kB
[2153331.386264] Node 0 Normal: 580*4kB (UEH) 154*8kB (UMEH) 75*16kB (UMEH) 61*32kB (UMEH) 1250*64kB (UMEH) 783*128kB (UME) 421*256kB (UME) 144*512kB (UME) 47*1024kB (ME) 5*2048kB (M) 0*4096kB = 426800kB
[2153331.407610] Node 1 Normal: 70*4kB (UE) 6842*8kB (UMEH) 5789*16kB (UMEH) 2699*32kB (UMEH) 1626*64kB (UMEH) 956*128kB (UMEH) 525*256kB (UMEH) 241*512kB (M) 49*1024kB (UMH) 23*2048kB (M) 0*4096kB = 815512kB

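    ... The DMA32 and Normal zones have no free contiguous blocks of order 10 (4096kB), shown as 0*4096kB above; the only two such blocks are in Node 0 DMA, which lowmem_reserve protects from this request ...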

[2153331.429568] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[2153331.440249] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[2153331.450622] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[2153331.461261] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[2153331.471597] 808412 total pagecache pages
[2153331.476881] 0 pages in swap cache
[2153331.481482] Swap cache stats: add 0, delete 0, find 0/0
[2153331.488223] Free swap  = 0kB
[2153331.492340] Total swap = 0kB
[2153331.496457] 8299570 pages RAM
[2153331.500663] 0 pages HighMem/MovableOnly
[2153331.505836] 160138 pages reserved
[2153331.510419] 0 pages hwpoisoned
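
An order-10 request needs 2^10 contiguous pages, which is 1024 x 4kB = 4096kB with the 4kB base page size these dumps use. To check many Mem-Info dumps for free blocks of that size without reading each one by eye, something like the following could be used. It is a minimal sketch: the function name order10_free_blocks is hypothetical, and it assumes the per-zone free list lines have the "Node N ZONE: count*size ..." layout shown above.

```shell
# Hypothetical helper: for each per-zone free list line in a dmesg
# Mem-Info dump, print how many free 4096kB (order-10) blocks it reports.
# Usage: order10_free_blocks [DMESG_FILE...]   (reads stdin if no file)
order10_free_blocks() {
    awk '/Node [0-9]+ [A-Za-z0-9]+: .*\*4kB/ {
        line = $0
        sub(/^\[[^]]*\] */, "", line)     # drop the "[timestamp] " prefix
        zone = line
        sub(/:.*/, "", zone)              # keep only "Node N ZONE"
        n = 0                             # default when no 4096kB column exists
        if (match(line, /[0-9]+\*4096kB/)) {
            n = substr(line, RSTART, RLENGTH)
            sub(/\*.*/, "", n)            # keep only the count before "*"
        }
        print zone ": " n
    }' "$@"
}
```

Run against the dump above, this reports 2 for Node 0 DMA and 0 for Node 0 DMA32, Node 0 Normal and Node 1 Normal. On a running node, the same per-order free counts are exposed in /proc/buddyinfo.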

Environment

  • Red Hat OpenShift Container Platform 4.11
  • Red Hat CoreOS 8.6.z - kernel-4.18.0-372.51.1.el8_6
