Too many page allocation failures occur with __kmalloc() requests from ib_setup_port_attrs(), a function in the [ib_core] (InfiniBand) driver
Issue
- Too many page allocation failures occur with __kmalloc() requests from ib_setup_port_attrs(), a function in the [ib_core] (InfiniBand) driver.
$ grep page\ allocation sos_commands/kernel/dmesg | awk '{c="";for(i=2;i<=NF;i++) c=c $i" "; print c}'| sort | uniq -c
1 (ostnamed): page allocation failure: order:10, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1
173 pinns: page allocation failure: order:10, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1
- Almost all of the page allocation failures (173 of 174) come from __kmalloc() requests made by ib_setup_port_attrs() in the [ib_core] (InfiniBand) driver.
$ grep Call\ Trace -A12 --no-group-separator sos_commands/kernel/dmesg | awk '{c="";for(i=2;i<=NF;i++) c=c $i" "; print c}' | grep ib_setup_port_attrs -B1 --no-group-separator | paste - - | wc -l
173
[2153330.863053] pinns: page allocation failure: order:10, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1
[2153330.877775] CPU: 8 PID: 456256 Comm: pinns Not tainted 4.18.0-372.51.1.el8_6.x86_64 #1
[2153330.886828] Hardware name: Lenovo ThinkSystem SR630 V2/7Z71CTO1WW, BIOS AFE118P-1.33 08/24/2022
[2153330.896753] Call Trace:
[2153330.899689] dump_stack+0x41/0x60
[2153330.903595] warn_alloc.cold.120+0x7b/0x115
[2153330.908475] ? _cond_resched+0x15/0x30
[2153330.912860] ? __alloc_pages_direct_compact+0x15f/0x170
[2153330.918903] __alloc_pages_slowpath+0xc59/0xca0
[2153330.924167] ? kernfs_activate+0x63/0x80
[2153330.928752] __alloc_pages_nodemask+0x2e2/0x320
[2153330.934013] kmalloc_order+0x28/0x90
[2153330.938211] kmalloc_order_trace+0x1d/0xb0
[2153330.942988] __kmalloc+0x203/0x250
[2153330.946994] ib_setup_port_attrs+0xd4/0x8b0 [ib_core] <<-------
[2153330.952862] ? klist_add_tail+0x3b/0x70
[2153330.957361] add_one_compat_dev.part.25+0x1ab/0x220 [ib_core]
[2153330.963997] rdma_dev_init_net+0xf5/0x1a0 [ib_core]
[2153330.969664] ops_init+0x3a/0x110
[2153330.973476] setup_net+0xee/0x260
[2153330.977381] copy_net_ns+0xc3/0x190
[2153330.981479] create_new_namespaces+0x174/0x210
[2153330.986650] unshare_nsproxy_namespaces+0x55/0xb0
[2153330.992109] ksys_unshare+0x19b/0x360
[2153330.996401] __x64_sys_unshare+0xe/0x20
[2153331.000887] do_syscall_64+0x5b/0x1b0
[2153331.005190] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[2153331.011044] RIP: 0033:0x44662b
[2153331.014665] Code: 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
[2153331.035858] RSP: 002b:00007ffe594c5a58 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
[2153331.044524] RAX: ffffffffffffffda RBX: 000000004c000000 RCX: 000000000044662b
[2153331.052699] RDX: 0000000000487288 RSI: 00000000004871f8 RDI: 000000004c000000
[2153331.060885] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[2153331.069065] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000a
[2153331.077240] R13: 00007ffe594c6e82 R14: 00007ffe594c5c28 R15: 00007ffe594c6e8e
[2153331.085604] Mem-Info:
[2153331.088352] active_anon:5918 inactive_anon:5384979 isolated_anon:0
active_file:583650 inactive_file:182969 isolated_file:0
unevictable:74420 dirty:106 writeback:30
slab_reclaimable:125277 slab_unreclaimable:930359
mapped:444746 shmem:19994 pagetables:33625 bounce:0
free:355081 free_pcp:505 free_cma:0
[2153331.130177] Node 0 active_anon:11932kB inactive_anon:10202700kB active_file:1377576kB inactive_file:422012kB unevictable:139924kB isolated(anon):0kB isolated(file):0kB mapped:1044600kB dirty:332kB writeback:112kB shmem:22924kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 3954688kB writeback_tmp:0kB kernel_stack:46488kB pagetables:71372kB all_unreclaimable? no
[2153331.167465] Node 1 active_anon:11740kB inactive_anon:11339288kB active_file:969828kB inactive_file:304432kB unevictable:157756kB isolated(anon):0kB isolated(file):0kB mapped:734384kB dirty:92kB writeback:8kB shmem:57052kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 5771264kB writeback_tmp:0kB kernel_stack:37864kB pagetables:63128kB all_unreclaimable? no
[2153331.204267] Node 0 DMA free:11264kB min:40kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[2153331.231629] lowmem_reserve[]: 0 2643 15658 15658 15658
[2153331.238082] Node 0 DMA32 free:126636kB min:7264kB low:9880kB high:12496kB active_anon:120kB inactive_anon:2203092kB active_file:13360kB inactive_file:6592kB unevictable:40kB writepending:0kB present:2773584kB managed:2706816kB mlocked:40kB bounce:0kB free_pcp:8kB local_pcp:0kB free_cma:0kB
[2153331.268571] lowmem_reserve[]: 0 0 13014 13014 13014
[2153331.274768] Node 0 Normal free:436844kB min:36972kB low:50288kB high:63604kB active_anon:11812kB inactive_anon:7999932kB active_file:1372748kB inactive_file:412608kB unevictable:139884kB writepending:444kB present:13631488kB managed:13327036kB mlocked:136812kB bounce:0kB free_pcp:1636kB local_pcp:0kB free_cma:0kB
[2153331.307700] lowmem_reserve[]: 0 0 0 0 0
[2153331.312777] Node 1 Normal free:816420kB min:45828kB low:62336kB high:78844kB active_anon:11740kB inactive_anon:11339288kB active_file:990312kB inactive_file:297528kB unevictable:157756kB writepending:100kB present:16777216kB managed:16508516kB mlocked:157756kB bounce:0kB free_pcp:588kB local_pcp:148kB free_cma:0kB
[2153331.345884] lowmem_reserve[]: 0 0 0 0 0
[2153331.350993] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
[2153331.365550] Node 0 DMA32: 373*4kB (UME) 231*8kB (UME) 212*16kB (UME) 189*32kB (UME) 87*64kB (UME) 76*128kB (UME) 61*256kB (UME) 32*512kB (UME) 23*1024kB (UME) 21*2048kB (M) 0*4096kB = 126636kB
[2153331.386264] Node 0 Normal: 580*4kB (UEH) 154*8kB (UMEH) 75*16kB (UMEH) 61*32kB (UMEH) 1250*64kB (UMEH) 783*128kB (UME) 421*256kB (UME) 144*512kB (UME) 47*1024kB (ME) 5*2048kB (M) 0*4096kB = 426800kB
[2153331.407610] Node 1 Normal: 70*4kB (UE) 6842*8kB (UMEH) 5789*16kB (UMEH) 2699*32kB (UMEH) 1626*64kB (UMEH) 956*128kB (UMEH) 525*256kB (UMEH) 241*512kB (M) 49*1024kB (UMH) 23*2048kB (M) 0*4096kB = 815512kB
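... The DMA32 and Normal zones have no free contiguous order-10 (4096kB) blocks, as the 0*4096kB entries above show. Node 0 DMA still holds two, but its lowmem_reserve protection keeps them out of reach of this GFP_KERNEL request ...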
[2153331.429568] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[2153331.440249] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[2153331.450622] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[2153331.461261] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[2153331.471597] 808412 total pagecache pages
[2153331.476881] 0 pages in swap cache
[2153331.481482] Swap cache stats: add 0, delete 0, find 0/0
[2153331.488223] Free swap = 0kB
[2153331.492340] Total swap = 0kB
[2153331.496457] 8299570 pages RAM
[2153331.500663] 0 pages HighMem/MovableOnly
[2153331.505836] 160138 pages reserved
[2153331.510419] 0 pages hwpoisoned
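The relationship between the failing order and the per-zone free lists can be checked with a short script. The following is a minimal sketch, not a supported tool: the function names order_to_kb and free_blocks_at_order are hypothetical, and a 4 kB base page size (as on x86_64) is assumed. It converts an allocation order to a block size and prints how many free blocks of that size each per-zone free-list line reports.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; assumes a 4 kB base page (x86_64).

# Size in kB of one contiguous block at the given order: 2^order pages * 4 kB.
order_to_kb() {
    echo $(( (1 << $1) * 4 ))
}

# Read dmesg text on stdin and, for every per-zone free-list line
# ("Node N ZONE: a*4kB b*8kB ... = totalkB"), print "Node N ZONE: <count>"
# where <count> is the number of free blocks of the requested order.
free_blocks_at_order() {
    size_kb=$(order_to_kb "$1")
    awk -v want="${size_kb}kB" '
    {
        zone = ""
        for (i = 1; i <= NF; i++) {
            # The zone label starts at the literal field "Node".
            if ($i == "Node" && zone == "") {
                zone = $i " " $(i + 1) " " $(i + 2)
                sub(/:$/, "", zone)
            }
            # Free-list entries look like "<count>*<size>kB".
            if (zone != "" && split($i, part, /[*]/) == 2 && part[2] == want)
                print zone ": " part[1]
        }
    }'
}
```

For example, `order_to_kb 10` prints 4096, and `free_blocks_at_order 10 < sos_commands/kernel/dmesg` prints one line per zone for each failure report in the file. Lines without `count*size` entries, such as the per-zone watermark lines, produce no output.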
Environment
- Red Hat OpenShift Container Platform 4.11
- Red Hat CoreOS 8.6.z - kernel-4.18.0-372.51.1.el8_6