Red Hat Enterprise Linux crashed while freeing slab objects during heavy memory fragmentation or during low memory

Solution Verified - Updated 2024-06-13T22:24:30+00:00

Red Hat Insights can detect this issue


Environment

Red Hat Enterprise Linux 8
InfiniBand
- The issue is specific to the ib_core module, so any InfiniBand workload that uses ib_core can be affected

Issue

Red Hat Enterprise Linux system crashed while allocating or freeing slab objects with a backtrace similar to one of the following:

[445807.422054] general protection fault: 0000 [#1] SMP NOPTI
[445807.428420] CPU: 0 PID: 1 Comm: systemd Kdump: loaded Tainted: G           O     --------- -  - 4.18.0-305.el8.x86_64 #1
[445807.440228] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 01/23/2021
[445807.449705] RIP: 0010:kmem_cache_alloc_trace+0xdb/0x270
[...]
[445807.565631] Call Trace:
[445807.568872]  allocate_cgrp_cset_links+0x72/0xb0
[445807.574197]  find_css_set+0x296/0x6b0
[445807.578604]  cgroup_migrate_prepare_dst+0x48/0x240
[445807.584140]  ? wp_page_copy+0x2b7/0x4c0
[445807.588724]  cgroup_attach_task+0x111/0x220
[445807.593665]  ? _cond_resched+0x15/0x30
[445807.598147]  ? rcu_sync_enter+0x53/0xd0
[445807.602674]  __cgroup1_procs_write.constprop.16+0x100/0x140
[445807.609001]  cgroup_file_write+0x8a/0x150
[445807.613738]  ? __check_object_size+0xa8/0x16b
[445807.618792]  kernfs_fop_write+0x116/0x190
[445807.623485]  vfs_write+0xa5/0x1a0
[445807.627465]  ksys_write+0x4f/0xb0
[445807.631433]  do_syscall_64+0x5b/0x1a0
[445807.635739]  entry_SYSCALL_64_after_hwframe+0x65/0xca

[435280.287765] BUG: unable to handle kernel paging request at fffff305c1496248
[435280.299353] PGD 0 P4D 0
[435280.305424] Oops: 0000 [#1] SMP NOPTI
[435280.314121] CPU: 90 PID: 2402112 Comm: kworker/90:0 Kdump: loaded Tainted: G           O     --------- -  - 4.18.0-305.el8.x86_64 #1
[435280.335559] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 01/23/2021
[435280.347447] Workqueue: events free_work
[435280.354740] RIP: 0010:kfree+0x69/0x450
[...]
[435280.515956] Call Trace:
[435280.526281]  free_work+0x21/0x30
[435280.536465]  process_one_work+0x1a7/0x360
[435280.546578]  ? create_worker+0x1a0/0x1a0
[435280.554909]  worker_thread+0x30/0x390
[435280.562422]  ? create_worker+0x1a0/0x1a0
[435280.569282]  kthread+0x116/0x130
[435280.577810]  ? kthread_flush_work_fn+0x10/0x10
[435280.585225]  ret_from_fork+0x1f/0x40

Resolution

The fix is available in:

Red Hat release	Kernel Version	Errata
8.6	kernel-4.18.0-372.9.1.el8	RHSA-2022:1988
8.5	kernel-4.18.0-348.20.1.el8_5	RHSA-2022:0825
8.4	kernel-4.18.0-305.40.1.el8_4	RHSA-2022:0777
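To check whether a running kernel already includes the fix, compare its release numbers with the row for the same minor release in the table above. The sketch below assumes version strings in the uname -r format used in this article (for example 4.18.0-305.el8.x86_64) and compares only the numeric fields between the '-' and the dist tag. The function names are hypothetical, and the comparison is a simplification of rpm's own version comparison (rpmdev-vercmp from rpmdevtools implements the full rules), so it is only meaningful within a single minor-release stream.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELDS 8

/* Extract the leading numeric fields of the release part of a kernel
 * version string: "4.18.0-305.40.1.el8_4.x86_64" yields {305, 40, 1}.
 * Parsing stops at the first non-numeric field (the "el8..." dist tag). */
int parse_release(const char *kver, long fields[MAX_FIELDS])
{
    const char *p = strchr(kver, '-');
    int n = 0;

    if (!p)
        return 0;                      /* no release part found */
    p++;
    while (n < MAX_FIELDS && isdigit((unsigned char)*p)) {
        char *end;
        fields[n++] = strtol(p, &end, 10);
        if (*end != '.')
            break;
        p = end + 1;                   /* move past the '.' separator */
    }
    return n;
}

/* Compare the release numbers of two kernel version strings.
 * Returns <0, 0 or >0; missing trailing fields are treated as 0. */
int compare_release(const char *a, const char *b)
{
    long fa[MAX_FIELDS] = {0}, fb[MAX_FIELDS] = {0};
    int na = parse_release(a, fa);
    int nb = parse_release(b, fb);
    int n = na > nb ? na : nb;

    for (int i = 0; i < n; i++) {
        if (fa[i] != fb[i])
            return fa[i] < fb[i] ? -1 : 1;
    }
    return 0;
}
```

For example, compare_release("4.18.0-305.el8.x86_64", "4.18.0-305.40.1.el8_4") returns a negative value, meaning the kernel from the vmcore analyzed below predates the 8.4 fix.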

Root Cause

Heavy memory fragmentation or memory exhaustion while InfiniBand ports are being added can trigger a double free of a slab object in the InfiniBand core (ib_core).
When an InfiniBand port is added, either during device setup or when a network namespace change causes the ports to be recreated, the kernel allocates and initializes an ib_port structure. From there, it allocates memory for the port's partition key group and the group's attributes.
The partition key group itself is a small structure allocated from the kmalloc-64 slab. The array that holds the partition key attributes is allocated separately, and its size depends on the length of the port's partition key table, so it can require a large contiguous chunk of memory.
If that allocation fails because no large enough contiguous chunk of free memory is available, the InfiniBand subsystem catches the failure and takes its error handling code path.
A bug in this error handling code path frees the port's partition key group but leaves the stale pointer in place, so the group is accessed and freed again when the port is released. This double free corrupts memory and leads to a kernel panic.

Diagnostic Steps

Note: The following analysis is taken from a specific instance, so the context and data points of the crash may vary. For example, the issue could be triggered by low memory rather than by memory fragmentation.

1. Set up kdump to capture vmcores for the system if this has not been done already.
2. Set up crash to be able to view the contents of the vmcore (similar to using gdb with an application core).
3. Add slub_debug=FZUP to the kernel command line so that slab corruption is caught when it occurs rather than later on.
4. Wait until the issue is reproduced. Once it is reproduced, load the vmcore into crash.
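Each slub_debug letter produces an artifact that the analysis below relies on: the 0x6b poison pattern, the 0xbb red zone bytes, and the stored allocation and free stacks. The following minimal sketch decodes the letters of a slub_debug= value, assuming the meanings given in the kernel's SLUB documentation. slub_flag_meaning() and describe_slub_debug() are hypothetical helpers, not part of any Red Hat tool, and only the four letters used in this article are described.

```c
#include <stdio.h>

/* Meaning of the slub_debug letters used in this article, per the kernel's
 * SLUB documentation. Only uppercase letters are handled in this sketch. */
const char *slub_flag_meaning(char flag)
{
    switch (flag) {
    case 'F': return "sanity checks (consistency checks on alloc and free)";
    case 'Z': return "red zoning (guard bytes around each object)";
    case 'P': return "poisoning (freed objects filled with 0x6b, last byte 0xa5)";
    case 'U': return "user tracking (stacks of the last alloc and free)";
    default:  return NULL;   /* other letters (T, A, O, ...) not covered */
    }
}

/* Print what each letter of a slub_debug= value enables and return how many
 * letters were recognized. A ",cache" suffix (per-cache form) ends parsing. */
int describe_slub_debug(const char *opts)
{
    int known = 0;

    for (const char *c = opts; *c && *c != ','; c++) {
        const char *m = slub_flag_meaning(*c);
        printf("%c: %s\n", *c, m ? m : "(not described in this sketch)");
        if (m)
            known++;
    }
    return known;
}
```

Calling describe_slub_debug("FZUP") prints one line per letter and returns 4.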

5. Review the cause of the crash and the associated stack.

First, review the general crash details and the backtrace of the process where the crash originated:

      KERNEL: /path/to/vmlinux
    DUMPFILE: /path/to/vmcore  [PARTIAL DUMP]
        CPUS: 96
        DATE: Mon Sep 27 02:59:56 EDT 2021
      UPTIME: 6 days, 09:26:22
LOAD AVERAGE: 37.94, 41.56, 53.06
       TASKS: 9684
    NODENAME: HOSTNAME
     RELEASE: 4.18.0-305.el8.x86_64
     VERSION: #1 SMP Thu Apr 29 08:54:30 EDT 2021
     MACHINE: x86_64  (2200 Mhz)
      MEMORY: 191.7 GB
       PANIC: "general protection fault: 0000 [#1] SMP NOPTI"
         PID: 639918
     COMMAND: "(ostnamed)"
        TASK: ffff93a12ed32080  [THREAD_INFO: ffff93a12ed32080]
         CPU: 36
       STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 639918  TASK: ffff93a12ed32080  CPU: 36  COMMAND: "(ostnamed)"
 #0 [ffffb89f527e7a58] machine_kexec at ffffffffba86156e
 #1 [ffffb89f527e7ab0] __crash_kexec at ffffffffba98f99d
 #2 [ffffb89f527e7b78] crash_kexec at ffffffffba99088d
 #3 [ffffb89f527e7b90] oops_end at ffffffffba82434d
 #4 [ffffb89f527e7bb0] general_protection at ffffffffbb2010ce
    [exception RIP: ib_port_release+0x58]             <--- (a) used in 6. below
    RIP: ffffffffc0b6f028  RSP: ffffb89f527e7c68  RFLAGS: 00010202
    RAX: 6b6b6b6b6b6b6b6b  RBX: ffff93b47e0a0040  RCX: 000000000023000d
    RDX: 000000000023000e  RSI: 000000000023000d  RDI: ffff93b62bc8c040
    RBP: ffff93b47e0a0008   R8: 0000000000000000   R9: ffff93b1e0a8db00
    R10: ffff93b1e0a8db38  R11: 0000000000000001  R12: ffff93b47e0a0008
    R13: ffff93b5837f86c0  R14: ffff93b47e0a0008  R15: 00000000fffffff4
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffb89f527e7c78] kobject_release at ffffffffbb1285d8
 #6 [ffffb89f527e7ca0] ib_setup_port_attrs at ffffffffc0b70548 [ib_core]
 #7 [ffffb89f527e7d58] add_one_compat_dev at ffffffffc0b739f7 [ib_core]
 #8 [ffffb89f527e7d90] rdma_dev_init_net at ffffffffc0b73ff5 [ib_core]
 #9 [ffffb89f527e7dd0] ops_init at ffffffffbaf8b89a
#10 [ffffb89f527e7e08] setup_net at ffffffffbaf8ba4e
#11 [ffffb89f527e7e58] copy_net_ns at ffffffffbaf8c723
#12 [ffffb89f527e7e88] create_new_namespaces at ffffffffba905c70
#13 [ffffb89f527e7eb8] unshare_nsproxy_namespaces at ffffffffba905f15
#14 [ffffb89f527e7ee0] ksys_unshare at ffffffffba8e034f
#15 [ffffb89f527e7f30] __x64_sys_unshare at ffffffffba8e051e
#16 [ffffb89f527e7f38] do_syscall_64 at ffffffffba80420b
#17 [ffffb89f527e7f50] entry_SYSCALL_64_after_hwframe at ffffffffbb2000ad
    RIP: 00007fb712af9aab  RSP: 00007ffd4cc8d948  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 000055be964de458  RCX: 00007fb712af9aab
    RDX: 0000000000000000  RSI: 00007ffd4cc8d8b0  RDI: 0000000040000000
    RBP: 00007ffd4cc8d970   R8: 0000000000000000   R9: 000055be8dac1808
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000001
    R13: 0000000000000000  R14: 00000000fffffff5  R15: 000055be8dd23800
    ORIG_RAX: 0000000000000110  CS: 0033  SS: 002b

Summarizing step 5: the above stack shows a kernel panic while the process was changing network namespaces (unshare). The namespace change caused the InfiniBand ports to be recreated within the new namespace until an error occurred, which caused the freshly created port to be released.
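The user-space registers saved in frame #17 show which system call started this path. A minimal sketch, assuming an x86_64 kernel: ORIG_RAX holds the system call number (0x110, or 272, is unshare on x86_64, matching __x64_sys_unshare in frame #15) and RDI holds its first argument, the flags (0x40000000 is CLONE_NEWNET). is_unshare_newnet() is a hypothetical helper for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>     /* CLONE_NEWNET and the other CLONE_* namespace flags */

/* On x86_64, system call number 272 (0x110) is unshare(2). In a crash
 * backtrace, ORIG_RAX holds the number of the interrupted system call. */
#define X86_64_NR_UNSHARE 272

/* Returns 1 if the saved registers describe unshare() asking for a new
 * network namespace; RDI carries the first argument (the flags). */
int is_unshare_newnet(unsigned long orig_rax, unsigned long rdi)
{
    return orig_rax == X86_64_NR_UNSHARE && (rdi & CLONE_NEWNET) != 0;
}
```

is_unshare_newnet(0x110, 0x40000000) returns 1 for the registers in frame #17, confirming that the task requested a new network namespace, which is what led to rdma_dev_init_net() in frame #8.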

6. The crash was caused by the kernel dereferencing a poison value written by the slub_debug poison option:
```
    crash> dis ib_port_release+0x58       # (a) from above
    0xffffffffc0b6f028 <ib_port_release+0x58>:  mov    (%rax),%rdi
    crash> bt | grep RAX | head -n 1
        RAX: 6b6b6b6b6b6b6b6b  RBX: ffff93b47e0a0040  RCX: 000000000023000d
             ^^^^^^^^^^^^^^^^
```
- The panic occurred at ib_port_release+0x58, where the kernel attempted to dereference the value in %rax. 0x6b6b6b6b6b6b6b6b is the pattern that the slub_debug poison option writes into freed slab objects (POISON_FREE, 0x6b), so the kernel was reading from a slab object that had already been freed.
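Two properties of the faulting value can be checked mechanically. The sketch below tests whether a 64-bit register value consists entirely of the POISON_FREE byte, and whether it is a canonical x86_64 address. It assumes 4-level paging (48-bit virtual addresses), where bits 63 to 47 must all be equal; loading through a non-canonical address raises a general protection fault rather than a page fault, which is consistent with the PANIC string in the crash output. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define POISON_FREE 0x6b   /* byte SLUB writes into freed objects (include/linux/poison.h) */

/* True if every byte of the 64-bit value equals the SLUB free poison. */
bool is_free_poison(uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        if (((v >> (8 * i)) & 0xff) != POISON_FREE)
            return false;
    }
    return true;
}

/* With 48-bit virtual addresses, bits 63..47 must all be 0 or all be 1
 * (sign extension of bit 47); anything else is non-canonical. */
bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;              /* the 17 bits 63..47 */
    return top == 0 || top == 0x1ffff;
}
```

0x6b6b6b6b6b6b6b6b is all poison and non-canonical, while 0xffff93b62bc8c040 (the %rdi value) is canonical. The same byte-wise view explains the word a56b6b6b6b6b6b6b that appears in the object dump in step 8: its top byte is 0xa5 (POISON_END), the marker SLUB writes into the last byte of a poisoned object.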

7. To find the freed object, walk the assembly alongside the C source code it was generated from.

The dis command with the -r and -l options disassembles the function from its start up to the given address and annotates the instructions with the source file and line numbers they come from:

crash> dis -rl ib_port_release+0x58 | tail
/usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 684
0xffffffffc0b6f013 <ib_port_release+0x43>:  mov    0x98(%rbp),%rdi
0xffffffffc0b6f01a <ib_port_release+0x4a>:  test   %rdi,%rdi
0xffffffffc0b6f01d <ib_port_release+0x4d>:  je     0xffffffffc0b6f070 <ib_port_release+0xa0>
/usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 685
0xffffffffc0b6f01f <ib_port_release+0x4f>:  mov    0x18(%rdi),%rax   <--- dereference 0x18 off %rdi passed from above
0xffffffffc0b6f023 <ib_port_release+0x53>:  test   %rax,%rax
0xffffffffc0b6f026 <ib_port_release+0x56>:  je     0xffffffffc0b6f060 <ib_port_release+0x90>
/usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 686
0xffffffffc0b6f028 <ib_port_release+0x58>:  mov    (%rax),%rdi       <--- panicked here. %rdi isn't overwritten and can be used

The above maps to:

drivers/infiniband/core/sysfs.c:
671 static void ib_port_release(struct kobject *kobj)
672 {
[...]
684         if (p->pkey_group) {
685                 if (p->pkey_group->attrs) {      <--- attrs is POISON_VALUE
686                         for (i = 0; (a = p->pkey_group->attrs[i]); ++i)

The member offsets of these structures confirm that the assembly above maps to the C code above:

crash> struct -o ib_port.pkey_group
struct ib_port {
  [0x98] struct attribute_group *pkey_group;    <--- maps to "mov    0x98(%rbp),%rdi" and "if (p->pkey_group) {"
}

crash> struct -o attribute_group.attrs
struct attribute_group {
  [0x18] struct attribute **attrs;         <--- maps to "mov    0x18(%rdi),%rax" and "if (p->pkey_group->attrs)"
}

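The above shows that p (the ib_port pointer) was valid and that p->pkey_group was non-NULL, so both NULL checks passed. However, the value loaded for p->pkey_group->attrs, a double pointer that normally points to a NULL-terminated array of attribute pointers, was the POISON_VALUE. The fault occurred when the kernel tried to read attrs[0] through that poison value, which means the attribute_group that p->pkey_group points to had already been freed.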
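The 0x18 offset of attrs can also be reproduced outside of crash. The sketch below is a mock of struct attribute_group that assumes the upstream 4.18 member order (name, is_visible, is_bin_visible, attrs, bin_attrs) and an LP64 target such as x86_64; attribute_group_mock and umode_t_mock are hypothetical names. The 0x98 offset of pkey_group inside struct ib_port depends on the exact RHEL definition of that structure, so it is not reproduced here.

```c
#include <stddef.h>

typedef unsigned short umode_t_mock;   /* the kernel's umode_t is an unsigned short */

/* Opaque types: only pointers to them are needed for the layout. */
struct attribute;
struct bin_attribute;
struct kobject;

/* Assumed to mirror the upstream 4.18 member order of struct attribute_group. */
struct attribute_group_mock {
    const char *name;                                                           /* 0x00 */
    umode_t_mock (*is_visible)(struct kobject *, struct attribute *, int);      /* 0x08 */
    umode_t_mock (*is_bin_visible)(struct kobject *, struct bin_attribute *, int); /* 0x10 */
    struct attribute **attrs;                                                   /* 0x18 */
    struct bin_attribute **bin_attrs;                                           /* 0x20 */
};

size_t attrs_offset(void)
{
    return offsetof(struct attribute_group_mock, attrs);
}

size_t group_size(void)
{
    return sizeof(struct attribute_group_mock);
}
```

On x86_64, attrs_offset() returns 0x18, matching "mov 0x18(%rdi),%rax", and group_size() returns 40. A 40-byte object would be served from the kmalloc-64 cache, which is consistent with the kmem output in step 8, although RHEL's actual structure size is an assumption here.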

With this, the assembly can be walked to derive the address of the slab object in question. As noted above, the slab object is dereferenced through %rdi, which is not overwritten before the crash, so the %rdi value can be read from the backtrace:

crash> bt | grep RDI | head -n 1
    RDX: 000000000023000e  RSI: 000000000023000d  RDI: ffff93b62bc8c040
                                                       ^^^^^^^^^^^^^^^^
crash> dis -rl ib_port_release+0x58 | tail
/usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 684
0xffffffffc0b6f013 <ib_port_release+0x43>:  mov    0x98(%rbp),%rdi       <--- %rdi = p->pkey_group = 0xffff93b62bc8c040
0xffffffffc0b6f01a <ib_port_release+0x4a>:  test   %rdi,%rdi
0xffffffffc0b6f01d <ib_port_release+0x4d>:  je     0xffffffffc0b6f070 <ib_port_release+0xa0>

/usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 685
0xffffffffc0b6f01f <ib_port_release+0x4f>:  mov    0x18(%rdi),%rax       <--- %rdi = p->pkey_group = 0xffff93b62bc8c040
0xffffffffc0b6f023 <ib_port_release+0x53>:  test   %rax,%rax             <--- %rax is POISON_VALUE (non-zero), so the je is not taken
0xffffffffc0b6f026 <ib_port_release+0x56>:  je     0xffffffffc0b6f060 <ib_port_release+0x90>

/usr/src/debug/kernel-4.18.0-305.el8/linux-4.18.0-305.el8.x86_64/drivers/infiniband/core/sysfs.c: 686
0xffffffffc0b6f028 <ib_port_release+0x58>:  mov    (%rax),%rdi           <--- crashed because %rax is POISON_VALUE

Summarizing step 7: the pointer to the corrupted slab object is 0xffff93b62bc8c040. It was loaded from 0x98(%rbp) into %rdi and was not overwritten before the crash.

8. With the slab object pointer in hand, verify it.

Below, kmem identifies the address as an object in the expected slab cache (kmalloc-64), so it is a valid slab object, but the object is currently free:

crash> kmem 0xffff93b62bc8c040
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
ffff939487c0f3c0       64     970365    983264  30727    16k  kmalloc-64
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffe4040aaf2300  ffff93b62bc8c000     1     32         18    14
  FREE / [ALLOCATED]
   ffff93b62bc8c000     <--- lacking '[]' so not currently allocated.

The U (user tracking) flag of slub_debug stores the backtraces of the most recent allocation and free of each slab object in metadata that follows the object. Reading the object and its metadata shows the following stacks:

crash> rd ffff93b62bc8c000 64 -s
ffff93b62bc8c000:  bbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbb 
ffff93b62bc8c010:  bbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbb 
ffff93b62bc8c020:  bbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbb 
ffff93b62bc8c030:  bbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbb 
ffff93b62bc8c040:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 
ffff93b62bc8c050:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 
ffff93b62bc8c060:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b 
ffff93b62bc8c070:  6b6b6b6b6b6b6b6b a56b6b6b6b6b6b6b 
ffff93b62bc8c080:  bbbbbbbbbbbbbbbb ffff93b62bc8e040 

ffff93b62bc8c090:  ib_setup_port_attrs+0x534             <--- start of the tracking structure (b)
                                              __slab_alloc+0x1c 
ffff93b62bc8c0a0:  kmem_cache_alloc_trace+0x22e ib_setup_port_attrs+0x534   <--- function where allocation occurred
ffff93b62bc8c0b0:  add_one_compat_dev+0x1a7 rdma_dev_init_net+0xf5 
ffff93b62bc8c0c0:  ops_init+0x3a    setup_net+0xee   
ffff93b62bc8c0d0:  copy_net_ns+0xc3 create_new_namespaces+0x170 
ffff93b62bc8c0e0:  unshare_nsproxy_namespaces+0x55 ksys_unshare+0x18f 
ffff93b62bc8c0f0:  __x64_sys_unshare+0xe do_syscall_64+0x5b 
ffff93b62bc8c100:  entry_SYSCALL_64_after_hwframe+0x65 0000000000000000 
ffff93b62bc8c110:  entry_SYSCALL_64_after_hwframe+0x65 0009c3ae00000022 
ffff93b62bc8c120:  0000000120e81b5f 
                                    ib_setup_port_attrs+0x601 
ffff93b62bc8c130:  kfree+0x40b      ib_setup_port_attrs+0x601          <--- function where free occurred
ffff93b62bc8c140:  add_one_compat_dev+0x1a7 rdma_dev_init_net+0xf5 
ffff93b62bc8c150:  ops_init+0x3a    setup_net+0xee   
ffff93b62bc8c160:  copy_net_ns+0xc3 create_new_namespaces+0x170 
ffff93b62bc8c170:  unshare_nsproxy_namespaces+0x55 ksys_unshare+0x18f 
ffff93b62bc8c180:  __x64_sys_unshare+0xe do_syscall_64+0x5b 
ffff93b62bc8c190:  entry_SYSCALL_64_after_hwframe+0x65 0000000000000000 
ffff93b62bc8c1a0:  0000000000000000 0000000000000000 
ffff93b62bc8c1b0:  0009c3ae00000024 0000000120e81b7f 
ffff93b62bc8c1c0:  5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 
ffff93b62bc8c1d0:  5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 
ffff93b62bc8c1e0:  5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a 
ffff93b62bc8c1f0:  5a5a5a5a5a5a5a5a 5a5a5a5a5a5a5a5a
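The layout of the dump above can be summarized as arithmetic. The sketch below assumes the slot layout read off this particular dump: a 0x40-byte left red zone (0xbb), the 64-byte payload poisoned with 0x6b, an 8-byte right red zone, an 8-byte free pointer, two 0x98-byte tracking records, and 0x5a padding up to 0x200 bytes. These constants are inferred from the rd output, not read from struct kmem_cache, and the names are hypothetical. A 0x200-byte slot is consistent with kmem reporting 32 objects in a 16k slab.

```c
#include <stdint.h>

#define SLOT_SIZE        0x200  /* bytes per object slot with FZUP debugging (inferred) */
#define LEFT_REDZONE     0x40   /* 0xbb bytes before the payload */
#define OBJ_SIZE         0x40   /* kmalloc-64 payload, 0x6b-poisoned when free */
#define RIGHT_REDZONE    0x08   /* 0xbb bytes after the payload */
#define FREEPTR_SIZE     0x08   /* free-list pointer */
#define TRACK_SIZE       0x98   /* one struct track: addr, 16 addrs, cpu, pid, when */

/* Offsets of the metadata within a slot, derived from the sizes above. */
#define ALLOC_TRACK_OFF  (LEFT_REDZONE + OBJ_SIZE + RIGHT_REDZONE + FREEPTR_SIZE)
#define FREE_TRACK_OFF   (ALLOC_TRACK_OFF + TRACK_SIZE)
#define PADDING_OFF      (FREE_TRACK_OFF + TRACK_SIZE)   /* 0x5a padding to slot end */

struct obj_location {
    uint64_t index;    /* which slot in the slab */
    uint64_t offset;   /* byte offset within that slot */
};

/* Map an address inside a slab to its slot number and offset in the slot. */
struct obj_location locate(uint64_t slab_base, uint64_t addr)
{
    uint64_t delta = addr - slab_base;
    struct obj_location loc = { delta / SLOT_SIZE, delta % SLOT_SIZE };
    return loc;
}
```

locate(0xffff93b62bc8c000, 0xffff93b62bc8c040) gives slot 0 at offset 0x40. This is why kmem lists the object at ffff93b62bc8c000 while p->pkey_group is 0x40 bytes further in: the pointer skips the left red zone and points at the payload itself. ALLOC_TRACK_OFF (0x90) likewise lands on ffff93b62bc8c090, the start of the tracking structure marked (b).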

The same tracking structure that stores the backtraces also records the CPU, the PID, and the time (in jiffies) of each operation. Checking this:

crash> track.cpu,pid,when -d  ffff93b62bc8c090 2    # (b) the start of the tracking structure above
  cpu = 34
  pid = 639918      <---
  when = 4847049567

  cpu = 36
  pid = 639918      <--- 
  when = 4847049599

Taken together, the data points in step 8 show that the same process, PID 639918, both allocated and freed the object in ib_setup_port_attrs().
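The cpu and pid values can also be decoded by hand from the raw rd output. The sketch below assumes the SLUB struct track layout (a call site address, 16 saved stack addresses, int cpu, int pid, and when in jiffies) on little-endian x86_64, so the 64-bit word printed right after the saved addresses carries cpu in its low 32 bits and pid in its high 32 bits. decode_cpu_pid() is a hypothetical helper.

```c
#include <stdint.h>

struct cpu_pid {
    uint32_t cpu;
    uint32_t pid;
};

/* struct track stores "int cpu; int pid;" back to back. rd -s prints that
 * pair as one little-endian 64-bit word, so cpu is the low 32 bits. */
struct cpu_pid decode_cpu_pid(uint64_t word)
{
    struct cpu_pid r;

    r.cpu = (uint32_t)(word & 0xffffffffu);
    r.pid = (uint32_t)(word >> 32);
    return r;
}
```

Applied to 0009c3ae00000022 (allocation record) and 0009c3ae00000024 (free record), it yields CPU 34 and CPU 36 with PID 639918 (0x9c3ae) in both cases. The words that follow, 0000000120e81b5f and 0000000120e81b7f, are 4847049567 and 4847049599, the when values above, so the free happened 32 jiffies after the allocation.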

9. Because the vmcore shows the same process both allocating and freeing the slab object, corruption by some external entity becomes less likely and a kernel bug becomes more likely. The relevant code path should therefore be inspected to determine whether a kernel bug could cause a double free.

Start in the function the object was allocated in, ib_setup_port_attrs(), as determined in step 8 above:

drivers/infiniband/core/sysfs.c:
1354 int ib_setup_port_attrs(struct ib_core_device *coredev)
1355 {
1356         struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
1357         unsigned int port;
1358         int ret;
[...]
1365         rdma_for_each_port (device, port) {
1366                 ret = add_port(coredev, port);    <---

Jump into add_port(), where p->pkey_group is allocated successfully but the allocation of p->pkey_group->attrs fails, so the kernel takes the error handling code path:

drivers/infiniband/core/sysfs.c:
1042 static int add_port(struct ib_core_device *coredev, int port_num)
1043 {
1044         struct ib_device *device = rdma_device_to_ibdev(&coredev->dev);
1045         bool is_full_dev = &device->coredev == coredev;
1046         struct ib_port *p;
1047         struct ib_port_attr attr;
1048         int i;
1049         int ret;
1050 
1051         ret = ib_query_port(device, port_num, &attr);    <--- query the port's attributes
[...]
1055         p = kzalloc(sizeof *p, GFP_KERNEL);    <--- allocate port here
1056         if (!p)
1057                 return -ENOMEM;    <--- port is allocated so we did not take this return
1058 
[...]
1062         ret = kobject_init_and_add(&p->kobj, &port_type,   <--- initialize part of the port with the
1063                                    coredev->ports_kobj,         kernel object, "port_type"
1064                                    "%d", port_num);
[...]
1126         if (attr.pkey_tbl_len) {
1127                 p->pkey_group = kzalloc(sizeof(*p->pkey_group), GFP_KERNEL);   <--- allocate pkey_group
1128                 if (!p->pkey_group) {              <--- pkey_group was allocated in vmcore, so did not take this
1129                         ret = -ENOMEM;
1130                         goto err_remove_gid_type;
1131                 }
[...]
1134                 p->pkey_group->attrs = alloc_group_attrs(show_port_pkey,      <--- allocate pkey_group->attrs
1135                                                          attr.pkey_tbl_len);
1136                 if (!p->pkey_group->attrs) {     <--- attrs was not allocated, so take this 
1137                         ret = -ENOMEM;
1138                         goto err_free_pkey_group;    <--- follow this goto statement
1139                 }
[...]
1179 err_free_pkey_group:
1180         kfree(p->pkey_group);       <--- frees the pkey_group object; the pointer itself is left unchanged
1181
[...]    continue falling through the error handling code path until the end:
1221 err_put:
1222         kobject_put(&p->kobj);    <--- calls the "release" function within
1223         return ret;                    "port_type" assigned in line 1062 above
1224 }

Within the error handling code path, the p->pkey_group structure is freed at line 1180. Note that freeing the structure does not clear the pointer. The release function for port_type is shown below:

drivers/infiniband/core/sysfs.c:
 723 static struct kobj_type port_type = {
 724         .release       = ib_port_release,     <--- release function maps to ib_port_release where
 725         .sysfs_ops     = &port_sysfs_ops,          the kernel panicked as seen in 7. above
 726         .default_attrs = port_default_attrs
 727 };

drivers/infiniband/core/sysfs.c:
 671 static void ib_port_release(struct kobject *kobj)
 672 {
 673         struct ib_port *p = container_of(kobj, struct ib_port, kobj);
 674         struct attribute *a;
 675         int i;
[...]
 684         if (p->pkey_group) {    <--- p->pkey_group is already free but the pointer is not NULL
 685                 if (p->pkey_group->attrs) {
 686                         for (i = 0; (a = p->pkey_group->attrs[i]); ++i)
 687                                 kfree(a);

In other words, while handling the allocation failure, the kernel frees p->pkey_group and later, in ib_port_release(), reads from it and attempts to free it again. This happens because the pointer to the already freed p->pkey_group is not cleared at lines 1179-1181 above.

By contrast, ib_port_release() itself clears p->pkey_group after freeing it:

drivers/infiniband/core/sysfs.c:
 671 static void ib_port_release(struct kobject *kobj)
 672 {
 673         struct ib_port *p = container_of(kobj, struct ib_port, kobj);
 674         struct attribute *a;
 675         int i;
[...]
 684         if (p->pkey_group) {
 685                 if (p->pkey_group->attrs) {
 686                         for (i = 0; (a = p->pkey_group->attrs[i]); ++i)
 687                                 kfree(a);
 688 
 689                         kfree(p->pkey_group->attrs);
 690                 }
 691 
 692                 kfree(p->pkey_group);      <--- frees the pkey_group
 693                 p->pkey_group = NULL;      <--- clears the pointer to the recently freed object.
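The sequence can be reproduced in a minimal user-space sketch. All names below are hypothetical simplifications of struct ib_port, struct attribute_group, add_port() and ib_port_release(), and fake_kfree() only counts calls instead of freeing, so the double free is observable without corrupting the heap. Clearing the pointer in the error path, as ib_port_release() already does, is one way to make the later release harmless; this illustrates the pattern and is not necessarily the exact change shipped in the errata kernels listed in the Resolution.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins: only the members involved in the bug. */
struct group_mock { void **attrs; };                  /* ~ struct attribute_group */
struct port_mock  { struct group_mock *pkey_group; }; /* ~ struct ib_port */

/* Stand-in for kfree(): counts calls instead of releasing memory, so a
 * second free of the same group is recorded rather than corrupting the heap. */
int kfree_calls;

void fake_kfree(struct group_mock *g)
{
    (void)g;
    kfree_calls++;
}

/* Models the err_free_pkey_group label in add_port(). clear_ptr == 0 matches
 * the code analyzed above; clear_ptr == 1 also NULLs the stale pointer. */
void add_port_error_path(struct port_mock *p, int clear_ptr)
{
    fake_kfree(p->pkey_group);
    if (clear_ptr)
        p->pkey_group = NULL;
}

/* Models ib_port_release(), reached later through kobject_put(). */
void port_release(struct port_mock *p)
{
    if (p->pkey_group) {
        /* The real function first reads p->pkey_group->attrs here, which is
         * the use after free that faulted in the vmcore; skipped in this sketch. */
        fake_kfree(p->pkey_group);
        p->pkey_group = NULL;
    }
}

/* Runs the failure sequence once and returns how many times the group was
 * "freed", or -1 if the initial allocation fails. */
int simulate(int clear_ptr)
{
    struct group_mock *g = calloc(1, sizeof(*g));
    struct port_mock p;

    if (!g)
        return -1;
    p.pkey_group = g;
    kfree_calls = 0;
    add_port_error_path(&p, clear_ptr);
    port_release(&p);
    free(g);                      /* real cleanup, exactly once */
    return kfree_calls;
}
```

simulate(0) returns 2 (the group is freed twice, as in the vmcore), while simulate(1) returns 1.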

10. The allocation failure that sends the kernel into the problem code path can also be seen in the vmcore's kernel ring buffer before the crash.

The segment of the kernel ring buffer below shows an allocation attempt in alloc_group_attrs() failing with a page allocation failure warning:

crash> log 
[...]
[552379.499187] (ostnamed): page allocation failure: order:7, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1
[552379.499192] CPU: 36 PID: 639918 Comm: (ostnamed) Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-305.el8.x86_64 #1
[552379.499192] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 01/23/2021
[552379.499193] Call Trace:
[552379.499200]  dump_stack+0x5c/0x80
[552379.499204]  warn_alloc.cold.115+0x7b/0x10d
[552379.499208]  ? _cond_resched+0x15/0x30
[552379.499210]  ? __alloc_pages_direct_compact+0x157/0x160
[552379.499211]  __alloc_pages_slowpath+0xcd8/0xd20
[552379.499215]  ? arch_stack_walk+0xa5/0xf0
[552379.499218]  ? stack_trace_save+0x4b/0x70
[552379.499219]  __alloc_pages_nodemask+0x283/0x2c0
[552379.499222]  kmalloc_order+0x24/0xf0
[552379.499223]  kmalloc_order_trace+0x1d/0xa0
[552379.499227]  __kmalloc+0x1ee/0x240
[552379.499241]  ? ib_port_register_module_stat+0xb0/0xb0 [ib_core]
[552379.499247]  alloc_group_attrs+0x40/0x120 [ib_core]                <--- allocation attempt that failed in line
[552379.499253]  ib_setup_port_attrs+0x561/0x690 [ib_core]                  1134 in 9. above
[552379.499260]  add_one_compat_dev.part.22+0x1a7/0x220 [ib_core]
[552379.499266]  rdma_dev_init_net+0xf5/0x1a0 [ib_core]
[552379.499269]  ops_init+0x3a/0x100
[552379.499271]  setup_net+0xee/0x250
[552379.499272]  copy_net_ns+0xc3/0x180
[552379.499275]  create_new_namespaces+0x170/0x210
[552379.499276]  unshare_nsproxy_namespaces+0x55/0xa0
[552379.499279]  ksys_unshare+0x18f/0x350
[552379.499281]  __x64_sys_unshare+0xe/0x20
[552379.499283]  do_syscall_64+0x5b/0x1a0
[552379.499285]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[552379.499287] RIP: 0033:0x7fb712af9aab
[...]

11. In this specific scenario, the allocation attempt almost certainly failed because of memory fragmentation:

crash> kmem -z | grep Normal
NODE: 0  ZONE: 2  ADDR: ffff93abbffd6b80  NAME: "Normal"
NODE: 1  ZONE: 2  ADDR: ffff93c3bffd4b80  NAME: "Normal"

crash> p ((struct zone *)0xffff93abbffd6b80)->free_area | grep nr_free | pr -Tn -N 0
    0       nr_free = 0xd5fb7
    1       nr_free = 0x4dbd3
    2       nr_free = 0x316af
    3       nr_free = 0x1e8b
    4       nr_free = 0x0
    5       nr_free = 0x0
    6       nr_free = 0x0
    7       nr_free = 0x0
    8       nr_free = 0x0
    9       nr_free = 0x0
   10       nr_free = 0x0
crash> p ((struct zone *)0xffff93c3bffd4b80)->free_area | grep nr_free | pr -Tn -N 0
    0       nr_free = 0x3318
    1       nr_free = 0x2f50
    2       nr_free = 0xa76
    3       nr_free = 0x4c4
    4       nr_free = 0x73
    5       nr_free = 0x19
    6       nr_free = 0x3
    7       nr_free = 0x0
    8       nr_free = 0x0
    9       nr_free = 0x0
   10       nr_free = 0x0

The above commands locate the Normal zone of each NUMA node and print the number of free blocks of each order, which is the same data that /proc/buddyinfo reports. The page allocation failure was for order 7, and according to the above output neither Normal zone had a free block of order 7 or higher at the time.
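The same conclusion can be reached with a little arithmetic on the nr_free values. In the buddy allocator, a request of order k can be satisfied by splitting any free block of order k or higher, so it fails when nr_free is zero for every order from k up. The sketch below applies that rule to the values copied from the output above and also totals the free pages in each zone. It considers only the two Normal zones shown and ignores reclaim, compaction, and watermarks; the helper names are hypothetical.

```c
#define NR_ORDERS 11   /* orders 0..10, as printed by the free_area commands above */

/* nr_free values copied from the vmcore output above (Normal zones). */
const unsigned long node0_normal[NR_ORDERS] = {
    0xd5fb7, 0x4dbd3, 0x316af, 0x1e8b, 0, 0, 0, 0, 0, 0, 0
};
const unsigned long node1_normal[NR_ORDERS] = {
    0x3318, 0x2f50, 0xa76, 0x4c4, 0x73, 0x19, 0x3, 0, 0, 0, 0
};

/* Simplified model: an order-k request can be served by splitting any free
 * block of order >= k, so it fails when nr_free is zero for all orders >= k. */
int can_satisfy(const unsigned long nr_free[NR_ORDERS], int order)
{
    for (int o = order; o < NR_ORDERS; o++) {
        if (nr_free[o] > 0)
            return 1;
    }
    return 0;
}

/* Total free pages in the zone: each free block of order o is 2^o pages. */
unsigned long free_pages(const unsigned long nr_free[NR_ORDERS])
{
    unsigned long total = 0;

    for (int o = 0; o < NR_ORDERS; o++)
        total += nr_free[o] << o;
    return total;
}
```

Both zones fail can_satisfy() for order 7, which needs 128 contiguous pages (512 KiB with 4 KiB pages). Yet free_pages() reports 2,385,521 free pages (about 9.1 GiB) in the node 0 Normal zone, none of them in blocks of order 4 or higher, which is why fragmentation rather than a lack of free memory is the likely cause here.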

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
