Red Hat Enterprise Linux crashed while freeing slab objects during heavy memory fragmentation or during low memory
Red Hat Insights can detect this issue
Environment
- Red Hat Enterprise Linux 8
- InfiniBand
  - Issue is specific to the ib_core module, so any InfiniBand workload that uses the ib_core module can be affected
Issue
-
Red Hat Enterprise Linux system crashed while allocating or freeing slab objects with a backtrace similar to one of the following:
      [445807.422054] general protection fault: 0000 [#1] SMP NOPTI
      [445807.428420] CPU: 0 PID: 1 Comm: systemd Kdump: loaded Tainted: G O --------- - - 4.18.0-305.el8.x86_64 #1
      [445807.440228] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 01/23/2021
      [445807.449705] RIP: 0010:kmem_cache_alloc_trace+0xdb/0x270
      [...]
      [445807.565631] Call Trace:
      [445807.568872] allocate_cgrp_cset_links+0x72/0xb0
      [445807.574197] find_css_set+0x296/0x6b0
      [445807.578604] cgroup_migrate_prepare_dst+0x48/0x240
      [445807.584140] ? wp_page_copy+0x2b7/0x4c0
      [445807.588724] cgroup_attach_task+0x111/0x220
      [445807.593665] ? _cond_resched+0x15/0x30
      [445807.598147] ? rcu_sync_enter+0x53/0xd0
      [445807.602674] __cgroup1_procs_write.constprop.16+0x100/0x140
      [445807.609001] cgroup_file_write+0x8a/0x150
      [445807.613738] ? __check_object_size+0xa8/0x16b
      [445807.618792] kernfs_fop_write+0x116/0x190
      [445807.623485] vfs_write+0xa5/0x1a0
      [445807.627465] ksys_write+0x4f/0xb0
      [445807.631433] do_syscall_64+0x5b/0x1a0
      [445807.635739] entry_SYSCALL_64_after_hwframe+0x65/0xca

      [435280.287765] BUG: unable to handle kernel paging request at fffff305c1496248
      [435280.299353] PGD 0 P4D 0
      [435280.305424] Oops: 0000 [#1] SMP NOPTI
      [435280.314121] CPU: 90 PID: 2402112 Comm: kworker/90:0 Kdump: loaded Tainted: G O --------- - - 4.18.0-305.el8.x86_64 #1
      [435280.335559] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 01/23/2021
      [435280.347447] Workqueue: events free_work
      [435280.354740] RIP: 0010:kfree+0x69/0x450
      [...]
      [435280.515956] Call Trace:
      [435280.526281] free_work+0x21/0x30
      [435280.536465] process_one_work+0x1a7/0x360
      [435280.546578] ? create_worker+0x1a0/0x1a0
      [435280.554909] worker_thread+0x30/0x390
      [435280.562422] ? create_worker+0x1a0/0x1a0
      [435280.569282] kthread+0x116/0x130
      [435280.577810] ? kthread_flush_work_fn+0x10/0x10
      [435280.585225] ret_from_fork+0x1f/0x40
Resolution
The fix is available in:
Red Hat release | Kernel Version | Errata |
---|---|---|
8.6 | kernel-4.18.0-372.9.1.el8 | RHSA-2022:1988 |
8.5 | kernel-4.18.0-348.20.1.el8_5 | RHSA-2022:0825 |
8.4 | kernel-4.18.0-305.40.1.el8_4 | RHSA-2022:0777 |
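
To check whether a given kernel release string is at or above the fixed builds in the table, a comparison along the following lines may help. This is a minimal sketch, not a Red Hat tool: the names `FIXED_BUILDS`, `parse_build`, and `has_fix` are hypothetical, and it assumes that streams newer than the ones listed already carry the fix, while older streams are reported as unknown (`None`).

```python
import re

# Minimum fixed builds, copied from the table above.
FIXED_BUILDS = {
    305: (305, 40, 1),  # RHEL 8.4, RHSA-2022:0777
    348: (348, 20, 1),  # RHEL 8.5, RHSA-2022:0825
    372: (372, 9, 1),   # RHEL 8.6, RHSA-2022:1988
}


def parse_build(kernel_release):
    """Return the numeric build fields after '4.18.0-', e.g. (305, 40, 1)."""
    match = re.match(r"(?:kernel-)?4\.18\.0-([0-9.]+?)\.el8", kernel_release)
    if match is None:
        raise ValueError(f"not a RHEL 8 kernel release: {kernel_release!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_fix(kernel_release):
    """True/False for streams in the table, None when the stream is unknown."""
    build = parse_build(kernel_release)
    minimum = FIXED_BUILDS.get(build[0])
    if minimum is None:
        # Assumption: streams newer than the table are fixed; older ones are unknown.
        return True if build[0] > max(FIXED_BUILDS) else None
    return build >= minimum


if __name__ == "__main__":
    # The release string seen in the vmcore analysed in this article.
    print(has_fix("4.18.0-305.el8.x86_64"))
```

The input would typically be the output of `uname -r` on the affected system.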
Root Cause
- Heavy memory fragmentation or memory exhaustion while adding InfiniBand ports can trigger a double free on a slab object within InfiniBand.
- When an InfiniBand port is added to an InfiniBand setup, or during a network namespace change within InfiniBand, an ib_port structure is allocated and initialized within the kernel. From here, the kernel attempts to allocate memory for the port's partition key and the partition key's attributes.
- The allocation of this larger structure is done via slab, wherein the InfiniBand subsystem asks the kernel for a slab object from the kmalloc-64 slab to hold the partition key and its attributes.
- If the allocation fails because of a lack of a large enough contiguous chunk of free memory, the InfiniBand subsystem catches the failure and works through its error handling code path.
- A bug was found within this error handling code path that allows a double free of the port's partition key, leading to memory corruption and a kernel panic.
Diagnostic Steps
Note: The following analysis is taken from a specific instance. The context and data points of a crash may vary. For example, the issue could be triggered by low memory rather than memory fragmentation.
1. Set up kdump to capture vmcores for the system if this has not been done already.
2. Set up crash to be able to view the contents of the vmcore (similar to gdb with an application core).
3. Add slub_debug=FZUP to the kernel command line in order to catch slab corruption when it occurs rather than later on.
4. Wait until the issue is reproduced. Once reproduced, load the vmcore into crash.
5. Review the cause of the crash and the associated stack
  - First, review the general crash details and the backtrace of the process where the crash originated

        KERNEL: /path/to/vmlinux
        DUMPFILE: /path/to/vmcore [PARTIAL DUMP]
        CPUS: 96
        DATE: Mon Sep 27 02:59:56 EDT 2021
        UPTIME: 6 days, 09:26:22
        LOAD AVERAGE: 37.94, 41.56, 53.06
        TASKS: 9684
        NODENAME: HOSTNAME
        RELEASE: 4.18.0-305.el8.x86_64
        VERSION: #1 SMP Thu Apr 29 08:54:30 EDT 2021
        MACHINE: x86_64 (2200 Mhz)
        MEMORY: 191.7 GB
        PANIC: "general protection fault: 0000 [#1] SMP NOPTI"
        PID: 639918
        COMMAND: "(ostnamed)"
        TASK: ffff93a12ed32080 [THREAD_INFO: ffff93a12ed32080]
        CPU: 36
        STATE: TASK_RUNNING (PANIC)

        crash> bt
        PID: 639918 TASK: ffff93a12ed32080 CPU: 36 COMMAND: "(ostnamed)"
         #0 [ffffb89f527e7a58] machine_kexec at ffffffffba86156e
         #1 [ffffb89f527e7ab0] __crash_kexec at ffffffffba98f99d
         #2 [ffffb89f527e7b78] crash_kexec at ffffffffba99088d
         #3 [ffffb89f527e7b90] oops_end at ffffffffba82434d
         #4 [ffffb89f527e7bb0] general_protection at ffffffffbb2010ce
            [exception RIP: ib_port_release+0x58] <--- (a) used in 6. below
            RIP: ffffffffc0b6f028 RSP: ffffb89f527e7c68 RFLAGS: 00010202
            RAX: 6b6b6b6b6b6b6b6b RBX: ffff93b47e0a0040 RCX: 000000000023000d
            RDX: 000000000023000e RSI: 000000000023000d RDI: ffff93b62bc8c040
            RBP: ffff93b47e0a0008 R8: 0000000000000000 R9: ffff93b1e0a8db00
            R10: ffff93b1e0a8db38 R11: 0000000000000001 R12: ffff93b47e0a0008
            R13: ffff93b5837f86c0 R14: ffff93b47e0a0008 R15: 00000000fffffff4
            ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
         #5 [ffffb89f527e7c78] kobject_release at ffffffffbb1285d8
         #6 [ffffb89f527e7ca0] ib_setup_port_attrs at ffffffffc0b70548 [ib_core]
         #7 [ffffb89f527e7d58] add_one_compat_dev at ffffffffc0b739f7 [ib_core]
         #8 [ffffb89f527e7d90] rdma_dev_init_net at ffffffffc0b73ff5 [ib_core]
         #9 [ffffb89f527e7dd0] ops_init at ffffffffbaf8b89a
        #10 [ffffb89f527e7e08] setup_net at ffffffffbaf8ba4e
        #11 [ffffb89f527e7e58] copy_net_ns at ffffffffbaf8c723
        #12 [ffffb89f527e7e88] create_new_namespaces at ffffffffba905c70
        #13 [ffffb89f527e7eb8] unshare_nsproxy_namespaces at ffffffffba905f15
        #14 [ffffb89f527e7ee0] ksys_unshare at ffffffffba8e034f
        #15 [ffffb89f527e7f30] __x64_sys_unshare at ffffffffba8e051e
        #16 [ffffb89f527e7f38] do_syscall_64 at ffffffffba80420b
        #17 [ffffb89f527e7f50] entry_SYSCALL_64_after_hwframe at ffffffffbb2000ad
            RIP: 00007fb712af9aab RSP: 00007ffd4cc8d948 RFLAGS: 00000246
            RAX: ffffffffffffffda RBX: 000055be964de458 RCX: 00007fb712af9aab
            RDX: 0000000000000000 RSI: 00007ffd4cc8d8b0 RDI: 0000000040000000
            RBP: 00007ffd4cc8d970 R8: 0000000000000000 R9: 000055be8dac1808
            R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
            R13: 0000000000000000 R14: 00000000fffffff5 R15: 000055be8dd23800
            ORIG_RAX: 0000000000000110 CS: 0033 SS: 002b
  - Summarizing 5., the above stack shows a kernel panic while the process was attempting to change network namespaces. This caused the InfiniBand ports to be recreated within the new namespace(s) until an error occurred, causing the freshly created port to be released.
6. The cause of the crash is the kernel attempting to interact with a poison value set by a slub_debug option.

        crash> dis ib_port_release+0x58 # (a) from above
        0xffffffffc0b6f028 <ib_port_release+0x58>: mov (%rax),%rdi
        crash> bt | grep RAX | head -n 1
        RAX: 6b6b6b6b6b6b6b6b RBX: ffff93b47e0a0040 RCX: 000000000023000d
             ^^^^^^^^^^^^^^^^
  - The panic occurred at ib_port_release+0x58, where the kernel attempted to dereference the value in %rax. That value, 0x6b6b6b6b6b6b6b6b, is a known pattern written by the poison option in slub_debug, meaning the kernel is attempting to free an already freed slab object.
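
The slub_debug letters and the byte patterns that show up in registers and memory dumps can be decoded with a small helper. The following is a minimal sketch for illustration only: the function names are hypothetical, the flag meanings follow the kernel's SLUB documentation, and the byte values are the ones defined in the kernel's include/linux/poison.h.

```python
# Flag letters accepted by slub_debug= (per the kernel SLUB documentation).
SLUB_DEBUG_FLAGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning",
    "U": "user tracking (alloc/free owner and backtrace)",
}

# Byte patterns from include/linux/poison.h.
POISON_BYTES = {
    0x5A: "POISON_INUSE",
    0x6B: "POISON_FREE",
    0xA5: "POISON_END",
    0xBB: "SLUB_RED_INACTIVE",
    0xCC: "SLUB_RED_ACTIVE",
}


def describe_flags(option):
    """Explain a slub_debug value such as 'FZUP' letter by letter."""
    return [SLUB_DEBUG_FLAGS.get(letter, "unknown") for letter in option]


def classify_word(word):
    """Name each byte of a 64-bit word, in little-endian memory order."""
    return [POISON_BYTES.get(b, "data") for b in word.to_bytes(8, "little")]


def looks_freed(word):
    """True when the word is all POISON_FREE, optionally ending in POISON_END."""
    raw = word.to_bytes(8, "little")
    return all(b == 0x6B for b in raw[:-1]) and raw[-1] in (0x6B, 0xA5)


if __name__ == "__main__":
    print(describe_flags("FZUP"))
    print(looks_freed(0x6B6B6B6B6B6B6B6B))  # the RAX value from the backtrace
```

A pointer-sized register holding only 0x6b bytes is a strong hint that it was loaded from a freed object, which is what step 6 relies on.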
7. In order to find the freed object, the assembly, mapped to the C source code, must be walked.
  - The dis command can provide a disassembly of the instructions along with the source code lines associated with them

        crash> dis -rl ib_port_release+0x58 | tail
        /usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 684
        0xffffffffc0b6f013 <ib_port_release+0x43>: mov 0x98(%rbp),%rdi
        0xffffffffc0b6f01a <ib_port_release+0x4a>: test %rdi,%rdi
        0xffffffffc0b6f01d <ib_port_release+0x4d>: je 0xffffffffc0b6f070 <ib_port_release+0xa0>
        /usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 685
        0xffffffffc0b6f01f <ib_port_release+0x4f>: mov 0x18(%rdi),%rax <--- dereference 0x18 off %rdi passed from above
        0xffffffffc0b6f023 <ib_port_release+0x53>: test %rax,%rax
        0xffffffffc0b6f026 <ib_port_release+0x56>: je 0xffffffffc0b6f060 <ib_port_release+0x90>
        /usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 686
        0xffffffffc0b6f028 <ib_port_release+0x58>: mov (%rax),%rdi <--- panicked here. %rdi isn't overwritten and can be used
  - The above maps to:

        drivers/infiniband/core/sysfs.c:
        671 static void ib_port_release(struct kobject *kobj)
        672 {
        [...]
        684     if (p->pkey_group) {
        685         if (p->pkey_group->attrs) { <--- attrs is POISON_VALUE
        686             for (i = 0; (a = p->pkey_group->attrs[i]); ++i)
  - The offsets of these structures and their attributes can help confirm that the assembly above maps to the code above:

        crash> struct -o ib_port.pkey_group
        struct ib_port {
          [0x98] struct attribute_group *pkey_group; <--- maps to "mov 0x98(%rbp),%rdi" and "if (p->pkey_group) {"
        }
        crash> struct -o attribute_group.attrs
        struct attribute_group {
          [0x18] struct attribute **attrs; <--- maps to "mov 0x18(%rdi),%rax" and "if (p->pkey_group->attrs)"
        }
  - The above shows that the ib_port *p was valid and that p->pkey_group was non-NULL. p->pkey_group->attrs was also non-NULL, so the check on line 685 passed, but the value read from it was the POISON_VALUE, meaning the pkey_group object holding it had already been freed. The attrs member in this structure is a double pointer and thus likely a pointer to a list of pointers; the fault occurred when the kernel dereferenced the POISON_VALUE to read the first entry of that list.
  - With this, the assembly can be walked to derive the slab object in question. As noted above, the slab object is dereferenced from %rdi, which is not overwritten. As such, the %rdi value can be found from the backtrace.

        crash> bt | grep RDI | head -n 1
        RDX: 000000000023000e RSI: 000000000023000d RDI: ffff93b62bc8c040
                                                         ^^^^^^^^^^^^^^^^
        crash> dis -rl ib_port_release+0x58 | tail
        /usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 684
        0xffffffffc0b6f013 <ib_port_release+0x43>: mov 0x98(%rbp),%rdi <--- %rdi = p->pkey_group = 0xffff93b62bc8c040
        0xffffffffc0b6f01a <ib_port_release+0x4a>: test %rdi,%rdi
        0xffffffffc0b6f01d <ib_port_release+0x4d>: je 0xffffffffc0b6f070 <ib_port_release+0xa0>
        /usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 685
        0xffffffffc0b6f01f <ib_port_release+0x4f>: mov 0x18(%rdi),%rax <--- %rdi = p->pkey_group = 0xffff93b62bc8c040
        0xffffffffc0b6f023 <ib_port_release+0x53>: test %rax,%rax <--- %rax is POISON_VALUE (non-zero), so the je below is not taken
        0xffffffffc0b6f026 <ib_port_release+0x56>: je 0xffffffffc0b6f060 <ib_port_release+0x90>
        /usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 686
        0xffffffffc0b6f028 <ib_port_release+0x58>: mov (%rax),%rdi <--- crashed because %rax is POISON_VALUE
  - Summarizing 7., the pointer to the corrupted slab object is 0xffff93b62bc8c040, as it was extracted from 0x98(%rbp), stored in %rdi, and not overwritten.
8. With the slab object pointer in hand, it needs to be verified.
  - Below, the structure is identified as coming from the correct slab (kmalloc-64) and is thus valid, but it is free:

        crash> kmem 0xffff93b62bc8c040
        CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
        ffff939487c0f3c0 64 970365 983264 30727 16k kmalloc-64
        SLAB MEMORY NODE TOTAL ALLOCATED FREE
        ffffe4040aaf2300 ffff93b62bc8c000 1 32 18 14
        FREE / [ALLOCATED]
        ffff93b62bc8c000 <--- lacking '[]' so not currently allocated.
  - One of the slub_debug flags enables storing, within the object, the backtrace at the time of allocating and freeing a slab object. Checking this, the stacks look like the following:

        crash> rd ffff93b62bc8c000 64 -s
        ffff93b62bc8c000: bbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbb
        ffff93b62bc8c010: bbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbb
        ffff93b62bc8c020: bbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbb
        ffff93b62bc8c030: bbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbb
        ffff93b62bc8c040: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b
        ffff93b62bc8c050: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b
        ffff93b62bc8c060: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b
        ffff93b62bc8c070: 6b6b6b6b6b6b6b6b a56b6b6b6b6b6b6b
        ffff93b62bc8c080: bbbbbbbbbbbbbbbb ffff93b62bc8e040
        ffff93b62bc8c090: ib_setup_port_attrs+0x534 __slab_alloc+0x1c <--- start of the tracking structure (b)
        ffff93b62bc8c0a0: kmem_cache_alloc_trace+0x22e ib_setup_port_attrs+0x534 <--- function where allocation occurred
        ffff93b62bc8c0b0: add_one_compat_dev+0x1a7 rdma_dev_init_net+0xf5
        ffff93b62bc8c0c0: ops_init+0x3a setup_net+0xee
        ffff93b62bc8c0d0: copy_net_ns+0xc3 create_new_namespaces+0x170
        ffff93b62bc8c0e0: unshare_nsproxy_namespaces+0x55 ksys_unshare+0x18f
        ffff93b62bc8c0f0: __x64_sys_unshare+0xe do_syscall_64+0x5b
        ffff93b62bc8c100: entry_SYSCALL_64_after_hwframe+0x65 0000000000000000
        ffff93b62bc8c110: entry_SYSCALL_64_after_hwframe+0x65 0009c3ae00000022
        ffff93b62bc8c120: 0000000120e81b5f ib_setup_port_attrs+0x601
        ffff93b62bc8c130: kfree+0x40b ib_setup_port_attrs+0x601 <--- function where free occurred
        ffff93b62bc8c140: add_one_compat_dev+0x1a7 rdma_dev_init_net+0xf5
        ffff93b62bc8c150: ops_init+0x3a setup_net+0xee
        ffff93b62bc8c160: copy_net_ns+0xc3 create_new_namespaces+0x170
        ffff93b62bc8c170: unshare_nsproxy_namespaces+0x55 ksys_unshare+0x18f
        ffff93b62bc8c180: __x64_sys_unshare+0xe do_syscall_64+0x5b
        ffff93b62bc8c190: entry_SYSCALL_64_after_hwframe+0x65 0000000000000000
        ffff93b62bc8c1a0: 0000000000000000 0000000000000000
        ffff93b62bc8c1b0: 0009c3ae00000024 0000000120e81b7f
        ffff93b62bc8c1c0: 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a
        ffff93b62bc8c1d0: 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a
        ffff93b62bc8c1e0: 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a
        ffff93b62bc8c1f0: 5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a
  - The same structure storing the backtraces in slab objects also stores process and CPU info. Checking this:

        crash> track.cpu,pid,when -d ffff93b62bc8c090 2 # (b) the start of the tracking structure above
        cpu = 34
        pid = 639918 <---
        when = 4847049567
        cpu = 36
        pid = 639918 <---
        when = 4847049599
  - From all data points in 8., the vmcore shows that the same process, 639918, both allocated and freed the object from within ib_setup_port_attrs().
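
The cpu, pid, and when values printed by crash can be cross-checked against the raw words in the rd output of step 8. The following is a minimal sketch under stated assumptions: the helper name `decode_cpu_pid` is hypothetical, the split into two adjacent 32-bit fields is inferred from this dump rather than from a documented layout, and `HZ = 1000` is an assumed tick rate that the article does not show.

```python
HZ = 1000  # assumption: tick rate of this kernel, not shown in the article


def decode_cpu_pid(word):
    """Split a 64-bit word into (cpu, pid): low 32 bits, then high 32 bits."""
    cpu = word & 0xFFFFFFFF
    pid = word >> 32
    return cpu, pid


# Raw words copied from the rd output: allocation record, then free record.
alloc_cpu, alloc_pid = decode_cpu_pid(0x0009C3AE00000022)
free_cpu, free_pid = decode_cpu_pid(0x0009C3AE00000024)
alloc_when = 0x0000000120E81B5F
free_when = 0x0000000120E81B7F

# Time between allocation and free, in jiffies and (under the HZ assumption) ms.
delta_jiffies = free_when - alloc_when
delta_ms = delta_jiffies * 1000 // HZ

if __name__ == "__main__":
    print(alloc_cpu, alloc_pid, alloc_when)
    print(free_cpu, free_pid, free_when)
    print(delta_jiffies, delta_ms)
```

The short interval between the two timestamps fits an allocation that is undone almost immediately by an error path in the same call chain.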
9. Given that the vmcore shows the same process both allocating and freeing the slab object, the probability of some external entity corrupting the slab object goes down while the probability of a kernel bug goes up. As such, the relevant code path should be inspected to determine whether a kernel bug could cause a double free.
  - Start in the function the object was allocated in, ib_setup_port_attrs(), as determined from 8. above

        drivers/infiniband/core/sysfs.c:
        1354 int ib_setup_port_attrs(struct ib_core_device *coredev)
        1355 {
        1356     struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
        1357     unsigned int port;
        1358     int ret;
        [...]
        1365     rdma_for_each_port (device, port) {
        1366         ret = add_port(coredev, port); <---
  - Jump into add_port(), where p->pkey_group is allocated but p->pkey_group->attrs fails to be allocated, so the kernel walks through the error handling code path

        drivers/infiniband/core/sysfs.c:
        1042 static int add_port(struct ib_core_device *coredev, int port_num)
        1043 {
        1044     struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
        1045     bool is_full_dev = &device->coredev == coredev;
        1046     struct ib_port *p;
        1047     struct ib_port_attr attr;
        1048     int i;
        1049     int ret;
        1050
        1051     ret = ib_query_port(device, port_num, &attr); <--- grab the device's attributes
        [...]
        1055     p = kzalloc(sizeof *p, GFP_KERNEL); <--- allocate port here
        1056     if (!p)
        1057         return -ENOMEM; <--- port is allocated so we did not take this return
        1058
        [...]
        1062     ret = kobject_init_and_add(&p->kobj, &port_type, <--- initialize part of the port with the kernel object, "port_type"
        1063                    coredev->ports_kobj,
        1064                    "%d", port_num);
        [...]
        1126     if (attr.pkey_tbl_len) {
        1127         p->pkey_group = kzalloc(sizeof(*p->pkey_group), GFP_KERNEL); <--- allocate pkey_group
        1128         if (!p->pkey_group) { <--- pkey_group was allocated in vmcore, so did not take this
        1129             ret = -ENOMEM;
        1130             goto err_remove_gid_type;
        1131         }
        [...]
        1134         p->pkey_group->attrs = alloc_group_attrs(show_port_pkey, <--- allocate pkey_group->attrs
        1135                          attr.pkey_tbl_len);
        1136         if (!p->pkey_group->attrs) { <--- attrs was not allocated, so take this
        1137             ret = -ENOMEM;
        1138             goto err_free_pkey_group; <--- follow this goto statement
        1139         }
        [...]
        1179 err_free_pkey_group:
        1180     kfree(p->pkey_group); <--- frees the pkey_group object; the pointer is left unchanged
        1181
        [...] continue falling through the error handling code path until the end:
        1221 err_put:
        1222     kobject_put(&p->kobj); <--- calls the "release" function within "port_type" assigned in line 1062 above
        1223     return ret;
        1224 }
  - Within the error handling code path, the p->pkey_group structure is freed in line 1180. Note that freeing the structure does not overwrite the pointer. The release function for port_type is shown below:

        drivers/infiniband/core/sysfs.c:
        723 static struct kobj_type port_type = {
        724     .release = ib_port_release, <--- release function maps to ib_port_release where the kernel panicked as seen in 7. above
        725     .sysfs_ops = &port_sysfs_ops,
        726     .default_attrs = port_default_attrs
        727 };

        drivers/infiniband/core/sysfs.c:
        671 static void ib_port_release(struct kobject *kobj)
        672 {
        673     struct ib_port *p = container_of(kobj, struct ib_port, kobj);
        674     struct attribute *a;
        675     int i;
        [...]
        684     if (p->pkey_group) { <--- p->pkey_group is already free but the pointer is not NULL
        685         if (p->pkey_group->attrs) {
        686             for (i = 0; (a = p->pkey_group->attrs[i]); ++i)
        687                 kfree(a);
  - In the above, while attempting to handle the allocation failure, p->pkey_group is freed, and the kernel later attempts to free structures within it. This occurs because the address of the recently freed p->pkey_group is not cleared out in lines 1179-1181 above.
  - In fact, in ib_port_release(), the kernel does clear p->pkey_group after freeing it:

        drivers/infiniband/core/sysfs.c:
        671 static void ib_port_release(struct kobject *kobj)
        672 {
        673     struct ib_port *p = container_of(kobj, struct ib_port, kobj);
        674     struct attribute *a;
        675     int i;
        [...]
        684     if (p->pkey_group) {
        685         if (p->pkey_group->attrs) {
        686             for (i = 0; (a = p->pkey_group->attrs[i]); ++i)
        687                 kfree(a);
        688
        689             kfree(p->pkey_group->attrs);
        690         }
        691
        692         kfree(p->pkey_group); <--- frees the pkey_group
        693         p->pkey_group = NULL; <--- clears the pointer to the recently freed object.
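
The sequence walked through in step 9 can be reproduced with a toy model. This is a minimal sketch in Python rather than kernel code: `ToyHeap`, `UseAfterFree`, `port_release`, and `add_port_error_path` are hypothetical names, and the `clear_pointer` option only illustrates why resetting the pointer after the free avoids the problem. It is not claimed to be the change shipped in the errata.

```python
class UseAfterFree(Exception):
    """Raised when the toy heap sees an access to an object it already freed."""


class ToyHeap:
    """Tiny stand-in for the slab allocator that remembers live objects."""

    def __init__(self):
        self._live = set()

    def kzalloc(self, label):
        obj = {"label": label, "attrs": None}
        self._live.add(id(obj))
        return obj

    def kfree(self, obj):
        if id(obj) not in self._live:
            raise UseAfterFree(f"double free of {obj['label']}")
        self._live.remove(id(obj))

    def read(self, obj, field):
        if id(obj) not in self._live:
            raise UseAfterFree(f"read of {field} from freed {obj['label']}")
        return obj[field]


def port_release(heap, port):
    """Mirrors the pkey_group handling of ib_port_release() quoted above."""
    if port["pkey_group"] is not None:
        attrs = heap.read(port["pkey_group"], "attrs")  # line 685
        if attrs is not None:
            heap.kfree(attrs)
        heap.kfree(port["pkey_group"])                  # line 692
        port["pkey_group"] = None                       # line 693


def add_port_error_path(heap, clear_pointer):
    """Model add_port() when pkey_group is allocated but attrs is not."""
    port = {"pkey_group": heap.kzalloc("pkey_group")}   # line 1127
    # alloc_group_attrs() failed: goto err_free_pkey_group
    heap.kfree(port["pkey_group"])                      # line 1180
    if clear_pointer:
        port["pkey_group"] = None  # hypothetical mitigation for illustration
    port_release(heap, port)       # err_put: kobject_put() -> release
    return port


if __name__ == "__main__":
    try:
        add_port_error_path(ToyHeap(), clear_pointer=False)
    except UseAfterFree as exc:
        print("buggy path:", exc)
    print("guarded path:", add_port_error_path(ToyHeap(), clear_pointer=True))
```

With `clear_pointer=False` the release function reads from the freed object, which corresponds to the poisoned `attrs` value seen in `%rax`; with `clear_pointer=True` the release function skips the block entirely.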
10. The allocation failure that causes the kernel to enter the problem code path can also be observed in the vmcore's kernel ring buffer before the crash.
  - The below segment of the kernel ring buffer shows an allocation attempt in the alloc_group_attrs() function failing and warning with a page allocation failure message:

        crash> log
        [...]
        [552379.499187] (ostnamed): page allocation failure: order:7, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1
        [552379.499192] CPU: 36 PID: 639918 Comm: (ostnamed) Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.el8.x86_64 #1
        [552379.499192] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 01/23/2021
        [552379.499193] Call Trace:
        [552379.499200] dump_stack+0x5c/0x80
        [552379.499204] warn_alloc.cold.115+0x7b/0x10d
        [552379.499208] ? _cond_resched+0x15/0x30
        [552379.499210] ? __alloc_pages_direct_compact+0x157/0x160
        [552379.499211] __alloc_pages_slowpath+0xcd8/0xd20
        [552379.499215] ? arch_stack_walk+0xa5/0xf0
        [552379.499218] ? stack_trace_save+0x4b/0x70
        [552379.499219] __alloc_pages_nodemask+0x283/0x2c0
        [552379.499222] kmalloc_order+0x24/0xf0
        [552379.499223] kmalloc_order_trace+0x1d/0xa0
        [552379.499227] __kmalloc+0x1ee/0x240
        [552379.499241] ? ib_port_register_module_stat+0xb0/0xb0 [ib_core]
        [552379.499247] alloc_group_attrs+0x40/0x120 [ib_core] <--- allocation attempt that failed in line 1134 in 9. above
        [552379.499253] ib_setup_port_attrs+0x561/0x690 [ib_core]
        [552379.499260] add_one_compat_dev.part.22+0x1a7/0x220 [ib_core]
        [552379.499266] rdma_dev_init_net+0xf5/0x1a0 [ib_core]
        [552379.499269] ops_init+0x3a/0x100
        [552379.499271] setup_net+0xee/0x250
        [552379.499272] copy_net_ns+0xc3/0x180
        [552379.499275] create_new_namespaces+0x170/0x210
        [552379.499276] unshare_nsproxy_namespaces+0x55/0xa0
        [552379.499279] ksys_unshare+0x18f/0x350
        [552379.499281] __x64_sys_unshare+0xe/0x20
        [552379.499283] do_syscall_64+0x5b/0x1a0
        [552379.499285] entry_SYSCALL_64_after_hwframe+0x65/0xca
        [552379.499287] RIP: 0033:0x7fb712af9aab
        [...]
  - In this specific scenario, the allocation attempt failed almost certainly due to memory fragmentation:

        crash> kmem -z | grep Normal
        NODE: 0 ZONE: 2 ADDR: ffff93abbffd6b80 NAME: "Normal"
        NODE: 1 ZONE: 2 ADDR: ffff93c3bffd4b80 NAME: "Normal"
        crash> p ((struct zone *)0xffff93abbffd6b80)->free_area | grep nr_free | pr -Tn -N 0
         0 nr_free = 0xd5fb7
         1 nr_free = 0x4dbd3
         2 nr_free = 0x316af
         3 nr_free = 0x1e8b
         4 nr_free = 0x0
         5 nr_free = 0x0
         6 nr_free = 0x0
         7 nr_free = 0x0
         8 nr_free = 0x0
         9 nr_free = 0x0
        10 nr_free = 0x0
        crash> p ((struct zone *)0xffff93c3bffd4b80)->free_area | grep nr_free | pr -Tn -N 0
         0 nr_free = 0x3318
         1 nr_free = 0x2f50
         2 nr_free = 0xa76
         3 nr_free = 0x4c4
         4 nr_free = 0x73
         5 nr_free = 0x19
         6 nr_free = 0x3
         7 nr_free = 0x0
         8 nr_free = 0x0
         9 nr_free = 0x0
        10 nr_free = 0x0
  - The above output gets the addresses of the memory zones and prints the same data as /proc/buddyinfo. The page allocation failure was for order 7 and, according to the above output, the system had no free contiguous blocks of order 7 or higher in either Normal zone at the time.
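
The fragmentation argument can be checked numerically from the nr_free values. The following is a minimal sketch: the helper names are hypothetical, the page size of 4096 bytes is the x86_64 base page size, and the check ignores watermarks and zones other than the two "Normal" zones shown in the output.

```python
PAGE_SIZE = 4096  # x86_64 base page size in bytes

# nr_free per order (0..10) for the two "Normal" zones, copied from the output above.
NODE0 = [0xD5FB7, 0x4DBD3, 0x316AF, 0x1E8B, 0, 0, 0, 0, 0, 0, 0]
NODE1 = [0x3318, 0x2F50, 0xA76, 0x4C4, 0x73, 0x19, 0x3, 0, 0, 0, 0]


def block_bytes(order):
    """Size in bytes of one contiguous block of the given order."""
    return PAGE_SIZE << order


def can_satisfy(nr_free, order):
    """A request needs at least one free block of that order or larger."""
    return any(count > 0 for count in nr_free[order:])


def free_bytes(nr_free):
    """Total free memory represented by the per-order free block counts."""
    return sum(count * block_bytes(order) for order, count in enumerate(nr_free))


if __name__ == "__main__":
    print("order 7 block size:", block_bytes(7), "bytes")
    for name, zone in (("node 0", NODE0), ("node 1", NODE1)):
        print(name, "order 7 available:", can_satisfy(zone, 7),
              "total free bytes:", free_bytes(zone))
```

Both zones still hold far more free memory in total than a single order 7 block, yet neither has a free block that large, which is why the failure points to fragmentation rather than outright exhaustion in this instance.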
-
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.