System with dattobd module gets panic with "kernel BUG at mm/slub.c:1532!"

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 7
  • 3rd party module dattobd loaded

Issue

  • Kernel panicking with kernel BUG at mm/slub.c:1532!
  • Panic at RIP new_slab()

Resolution

  • Red Hat neither ships nor supports this module. Engage the respective vendor of the module dattobd for further investigation.
  • As a workaround, you may disable the module dattobd.

Root Cause

  • The kernel panicked after a BUG_ON debug condition was met while attempting to allocate a slab object.
  • The debug condition checks the page allocation flags being requested by the underlying dattobd driver which are invalid.

Diagnostic Steps

Pre-requisites

  1. Deploy kdump in Order to Collect a vmcore:

  2. Prepare crash Environment for vmcore Analysis:

Vmcore Analysis

  1. System Information:

    crash> sys |grep -eREL -ePAN -eLOAD
    LOAD AVERAGE: 0.81, 0.49, 0.1
    RELEASE: 3.10.0-1160.88.1.el7.x86_64
    PANIC: "kernel BUG at mm/slub.c:1532!"  
    
    crash> sys -i |head -5
    DMI_BIOS_VENDOR: VMware, Inc.
    DMI_BIOS_VERSION: VMW71.00V.13989454.B64.1906190538
    DMI_BIOS_DATE: 06/19/2019
    DMI_SYS_VENDOR: VMware, Inc.
    DMI_PRODUCT_NAME: VMware7,1
    
  2. Backtrace of the panic task shows that the panic occurred at RIP new_slab+0x4a9 :

    crash> bt
    PID: 4420     TASK: ffff96e1acb24200  CPU: 1    COMMAND: "libmicrohttpd"
     #0 [ffff96e1e4e0b820] machine_kexec at ffffffffa4869514
     #1 [ffff96e1e4e0b880] __crash_kexec at ffffffffa4929e82
     #2 [ffff96e1e4e0b950] crash_kexec at ffffffffa4929f78
     #3 [ffff96e1e4e0b968] oops_end at ffffffffa4fbc818
     #4 [ffff96e1e4e0b990] die at ffffffffa4831c0b
     #5 [ffff96e1e4e0b9c0] do_trap at ffffffffa4fbbf30
     #6 [ffff96e1e4e0ba10] do_invalid_op at ffffffffa482e374
     #7 [ffff96e1e4e0bac0] invalid_op at ffffffffa4fc88ae
        [exception RIP: new_slab+0x4a9]
        RIP: ffffffffa4a34669  RSP: ffff96e1e4e0bb78  RFLAGS: 00010086
        RAX: ffff96e1acb24200  RBX: 0000000000000000  RCX: 0000000000000000
        RDX: 00000000ffffffff  RSI: 00000000ffffffff  RDI: ffff96e1ffc75100
        RBP: ffff96e1e4e0bba8   R8: 0000000000000000   R9: 00000001000d000d
        R10: ffffffffa4b67700  R11: 0000000000000000  R12: 00000000ffffffff
        R13: ffff96e1fffed258  R14: ffffffffffffffff  R15: ffff96e1ffc76040
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
     #8 [ffff96e1e4e0bbb0] ___slab_alloc at ffffffffa4a34a6c
     #9 [ffff96e1e4e0bc88] __slab_alloc at ffffffffa4fae458
    #10 [ffff96e1e4e0bcc8] kmem_cache_alloc_node at ffffffffa4a34c4b
    #11 [ffff96e1e4e0bd18] blk_alloc_queue_node at ffffffffa4b67700
    #12 [ffff96e1e4e0bd48] blk_alloc_queue at ffffffffa4b67a45
    #13 [ffff96e1e4e0bd58] __tracer_setup_snap at ffffffffc0136b97 [dattobd]
    #14 [ffff96e1e4e0bd88] tracer_setup_active_snap at ffffffffc01377b3 [dattobd]
    #15 [ffff96e1e4e0bde0] __ioctl_setup at ffffffffc0130355 [dattobd]
    #16 [ffff96e1e4e0be38] ctrl_ioctl at ffffffffc0130bf1 [dattobd]
    #17 [ffff96e1e4e0be90] do_vfs_ioctl at ffffffffa4a71918
    #18 [ffff96e1e4e0bf10] sys_ioctl at ffffffffa4a71bb1
    #19 [ffff96e1e4e0bf50] system_call_fastpath at ffffffffa4fc539a
        RIP: 00007f508d0374a7  RSP: 00007f5075aa1808  RFLAGS: 00000216
        RAX: 0000000000000010  RBX: 0000000000000013  RCX: ffffffffffffffff
        RDX: 00007f5075aa1d00  RSI: 0000000040289101  RDI: 0000000000000013
        RBP: 00007f505c003af8   R8: 0000000000000000   R9: 0000000000000000
        R10: 63202c32312d6d64  R11: 0000000000000246  R12: 0000000000000010
        R13: 0000000000000000  R14: 0000000000000400  R15: 00007f505d4521f8
        ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
    
  3. This is the disassembly of the 3rd party dattobd driver code that calls into the RHEL kernel code:

    crash> dis -rl ffffffffc0136b97 | tail -3
    0xffffffffc0136b8d <__tracer_setup_snap+0x6d>:  mov    $0xffffffff,%edi   
    0xffffffffc0136b92 <__tracer_setup_snap+0x72>:  call   0xffffffffa4b67a30 <blk_alloc_queue>
    0xffffffffc0136b97 <__tracer_setup_snap+0x77>:  test   %rax,%rax
    
    blk_alloc_queue(gfp_t gfp_mask)  <= gfp_mask is set to 0xffffffff by the driver
    
  4. This is the disassembly of where we meet the BUG_ON debug condition:

    We execute a ud2 instruction at line 1532 of mm/slub.c after a conditional jump to the instruction. The jump condition tests 0xfe000006 against the contents of the %esi register (flags supplied by dattobd driver):

    crash> dis -rl ffffffffa4a34669 | head -15
    /usr/src/debug/kernel-3.10.0-1160.88.1.el7/linux-3.10.0-1160.88.1.el7.x86_64/mm/slub.c: 1525
    0xffffffffa4a341c0 <new_slab>:  nopl   0x0(%rax,%rax,1) [FTRACE NOP]
    0xffffffffa4a341c5 <new_slab+0x5>:  push   %rbp
    0xffffffffa4a341c6 <new_slab+0x6>:  mov    %rsp,%rbp
    0xffffffffa4a341c9 <new_slab+0x9>:  push   %r15
    0xffffffffa4a341cb <new_slab+0xb>:  push   %r14
    0xffffffffa4a341cd <new_slab+0xd>:  movslq %edx,%r14
    0xffffffffa4a341d0 <new_slab+0x10>: push   %r13
    0xffffffffa4a341d2 <new_slab+0x12>: push   %r12
    0xffffffffa4a341d4 <new_slab+0x14>: push   %rbx
    0xffffffffa4a341d5 <new_slab+0x15>: sub    $0x8,%rsp
    /usr/src/debug/kernel-3.10.0-1160.88.1.el7/linux-3.10.0-1160.88.1.el7.x86_64/mm/slub.c: 1532
    0xffffffffa4a341d9 <new_slab+0x19>: test   $0xfe000006,%esi                 <<<<----,--- test and conditional jump
    0xffffffffa4a341df <new_slab+0x1f>: jne    0xffffffffa4a34669 <new_slab+0x4a9>  <<--'
    - - - - - - - 8< - - - - - - - 
    /usr/src/debug/kernel-3.10.0-1160.88.1.el7/linux-3.10.0-1160.88.1.el7.x86_64/mm/slub.c: 1532
    0xffffffffa4a34669 <new_slab+0x4a9>:    ud2    
    

    The %esi register is the lower 32 bits of the %rsi register. %rsi held 00000000ffffffff at the time of the panic, so we are testing 0xffffffff against 0xfe000006. The test instruction performs a logical AND, sets CPU flags depending on the outcome, and discards the results. Here the zer0 flag ZF is turned off as a result of the test:

       f    e    0    0    0    0    0    6
    1111_1110_0000_0000_0000_0000_0000_0110
    
       f    f    f    f    f    f    f    f
    1111_1111_1111_1111_1111_1111_1111_1111
    
    
    AND result:
    
    1111_1110_0000_0000_0000_0000_0000_0110
    

    Since the AND is non-zero the ZF is turned off and we jump to the ud2
    instruction at offset 0x4a9.

  5. We are meeting a BUG_ON condition in which the allocation's gfp flags are tested against the __GFP_DMA32, __GFP_HIGHMEM, and ~__GFP_BITS_MASK flags which are not to be used with slab allocations:

    1524 static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
    1525 {
    1526         struct page *page;
    1527         void *start;
    1528         void *p;
    1529         int order;
    1530         int idx;
    1531 
    1532         BUG_ON(flags & GFP_SLAB_BUG_MASK);  <<<----
    
                          161 /* Do not use these with a slab allocator */
                          162 #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
    
    Following the flags in the function calls in the backtrace, we see the bad GFP flags are first passed to blk_alloc_queue() from code belonging to the third party module dattobd:
    
    725 struct request_queue *blk_alloc_queue(gfp_t gfp_mask)                <= gfp_mask is set to 0xffffffff by the driver
                                                 ^^^^^^^^
    727         return blk_alloc_queue_node(gfp_mask, NUMA_NO_NODE, NULL);              <= gfp_mask passed to blk_alloc_queue_node()
    728 }                                   ^^^^^^^^  
    
    824 struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id,
                                                         ^^^^^^^^
    825                                            spinlock_t *lock)
    826 {
    - - - - - - - - - 8< - - - - - - - -
    831         q = kmem_cache_alloc_node(blk_requestq_cachep,
    832                                 gfp_mask | __GFP_ZERO, node_id);         <= gfp_mask passed to kmem_cache_alloc_node()
                                        ^^^^^^^^
    
       2650 void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
       2651 {                                                       ^^^^^^^^
       2652         void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_);        <= gfp_mask renamed gfpflags and passed to slab_alloc_node()
       2653                                        ^^^^^^^^
    
           2539 static __always_inline void *slab_alloc_node(struct kmem_cache *s,
           2540                 gfp_t gfpflags, int node, unsigned long addr)
           2541 {                     ^^^^^^^^
           - - - - - - - - - 8< - - - - - - - -
           2578                 object = __slab_alloc(s, gfpflags, node, addr, c);   <= gfpflags passed to __slab_alloc()
                                                         ^^^^^^^^
    
       2508 static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
                                                                  ^^^^^^^^
       2509                           unsigned long addr, struct kmem_cache_cpu *c)
       2510 {
           - - - - - - - - - 8< - - - - - - - -
               2524         p = ___slab_alloc(s, gfpflags, node, addr, c);       <= gfpflags passed to ___slab_alloc()
                                             ^^^^^^^^
    
                   2406 static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
                                                                               ^^^^^^^^
                   2407                           unsigned long addr, struct kmem_cache_cpu *c)
                   - - - - - - - - - 8< - - - - - - - -
                   2414                 goto new_slab;
                   - - - - - - - - - 8< - - - - - - - -                       
                   2472 new_slab:
                   - - - - - - - - - 8< - - - - - - - -
                   2482         freelist = new_slab_objects(s, gfpflags, node, &c); <= gfpflags passed to new_slab_objects()
                                                               ^^^^^^^^
    
           2309 static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags, <= gfpflags renamed flags
           2310                         int node, struct kmem_cache_cpu **pc)    ^^^^^
           2311 {
                   - - - - - - - - - 8< - - - - - - - -
                   2323         page = new_slab(s, flags, node);                                <= flags passed to new_slab()
                                                   ^^^^^
    
                               1524 static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
                               1525 {                                                        ^^^^^
                               1526         struct page *page;
                               1527         void *start;
                               1528         void *p;
                               1529         int order;
                               1530         int idx;
                               1531
                               1532         BUG_ON(flags & GFP_SLAB_BUG_MASK);  <= flags is still 0xffffffff, the BUG_ON condition is met
                               - - - - - - - - - 8< - - - - - - - -
    

    Backtrace of above code:

    new_slab+0x4a9
    ___slab_alloc at ffffffffa4a34a6c
    __slab_alloc at ffffffffa4fae458
    kmem_cache_alloc_node at ffffffffa4a34c4b
    blk_alloc_queue_node at ffffffffa4b67700
    blk_alloc_queue at ffffffffa4b67a45
    __tracer_setup_snap at ffffffffc0136b97 [dattobd]   <<<--- gfp_mask passed to blk_alloc_queue
    
  6. The kernel ring buffer contains logs indicating datto remounting XFS filesystems in no-recovery mode just prior to the crash:

        crash> log | tail -50
        [  131.075624] XFS (datto7): Mounting V5 filesystem in no-recovery mode. Filesystem will be inconsistent.
        [  131.112995] XFS (datto8): Mounting V5 filesystem in no-recovery mode. Filesystem will be inconsistent.
        [  131.152576] XFS (datto9): Mounting V5 filesystem in no-recovery mode. Filesystem will be inconsistent.
        [  131.192605] XFS (datto10): Mounting V5 filesystem in no-recovery mode. Filesystem will be inconsistent.
        [  131.230231] XFS (datto11): Mounting V5 filesystem in no-recovery mode. Filesystem will be inconsistent.
        [  131.272209] XFS (datto12): Mounting V5 filesystem in no-recovery mode. Filesystem will be inconsistent.
        [  131.313915] XFS (datto13): Mounting V5 filesystem in no-recovery mode. Filesystem will be inconsistent.
        [  131.382407] XFS (datto14): Mounting V5 filesystem in no-recovery mode. Filesystem will be inconsistent.
        [  131.414619] XFS (datto15): Mounting V5 filesystem in no-recovery mode. Filesystem will be inconsistent.
        [  131.432468] ------------[ cut here ]------------
        [  131.432555] kernel BUG at mm/slub.c:1532!
        [  131.432626] invalid opcode: 0000 [#1] SMP 
        - - - - - - - - - 8< - - - - - - - -
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments