Kernel Panic: "BUG: unable to handle kernel NULL pointer dereference at (null)" with RIP in afs_linux_write_begin+0x36/0x160


Environment

  • Red Hat Enterprise Linux 7

Issue

  • The server rebooted after a kernel panic with the message BUG: unable to handle kernel NULL pointer dereference at (null).
  • The RIP points to the function afs_linux_write_begin+0x36/0x160.

Resolution

The exception occurred in the unsigned openafs kernel module. Contact the vendor of the openafs module for further investigation.

Root Cause

The panic was caused by a NULL pointer dereference at afs_linux_write_begin+0x36/0x160, inside the unsigned openafs module.

Diagnostic Steps

  • Kernel ring buffer from vmcore-dmesg.txt


    $ tail -n 40 vmcore-dmesg.txt
    [1476087.659656] BUG: unable to handle kernel NULL pointer dereference at (null)
    [1476087.659678] IP: [<ffffffffa06b54f6>] afs_linux_write_begin+0x36/0x160 [openafs]
    [1476087.659680] PGD 7c2bf2067 PUD 7c2bf3067 PMD 0
    [1476087.659681] Oops: 0000 [#1] SMP
    [1476087.659702] Modules linked in: ext4 mbcache jbd2 overlay(T) squashfs loop rpcsec_gss_krb5 nfsv4 dns_resolver vfat fat uas usb_storage mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase dell_rbu osc(OE) mgc(OE) lustre(OE) lmv(OE) fld(OE) mdc(OE) fid(OE) lov(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic nfsv3 nfs fscache crypto_null libcfs(OE) bonding ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ipmi_ssif sb_edac edac_core sg i2c_i801 lpc_ich mei_me mei ipmi_si ipmi_msghandler shpchp wmi acpi_power_meter openafs(POE) binfmt_misc nfsd nfs_acl auth_rpcgss lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
    [1476087.659712]  crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 drm_kms_helper mlx5_core syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci igb libahci drm dca libata i2c_algo_bit i40e i2c_core ptp megaraid_sas pps_core fjes dm_mirror dm_region_hash dm_log dm_mod
    [1476087.659715] CPU: 4 PID: 610331 Comm: R Tainted: POE ------------ T 3.10.0-514.16.1.el7.x86_64 #1
    [1476087.659716] Hardware name: Dell Inc. PowerEdge R930/0Y0V4F, BIOS 2.5.2 005/24/2018
    [1476087.659716] task: ffff8819507b2f10 ti: ffff8808941d8000 task.ti: ffff8808941d8000
    [1476087.659726] RIP: 0010:[<ffffffffa06b54f6>] [<ffffffffa06b54f6>] afs_linux_write_begin+0x36/0x160 [openafs]
    [1476087.659727] RSP: 0018:ffff8808941dbc00 EFLAGS: 00010282
    [1476087.659728] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000030
    [1476087.659728] RDX: ffffea007c8339a0 RSI: ffff881fff25a0c8 RDI: 0000000000000297
    [1476087.659729] RBP: ffff8808941dbc28 R08: ffffea007c8339a0 R09: ffff88207ffcfa80
    [1476087.659730] R10: 0000000000000018 R11: 0000000000000003 R12: 00000000000003cd
    [1476087.659730] R13: ffff88233eebba00 R14: ffff8808941dbc90 R15: 0000000000000003
    [1476087.659732] FS: 00002aed21060340(0000) GS:ffff881fff240000(0000) knlGS:0000000000000000
    [1476087.659732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [1476087.659733] CR2: 0000000000000000 CR3: 00000007c2bf1000 CR4: 00000000003407e0
    [1476087.659733] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [1476087.659734] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [1476087.659734] Stack:
    [1476087.659736] 00000000000003cd 0000000000000003 0000000000000c33 ffff882dcc320950
    [1476087.659737] ffff8808941dbe28 ffff8808941dbcf0 ffffffff81181b2e ffff8808941dbc98
    [1476087.659738] ffff8808941dbe70 00000000000003cd 0000000000000c33 ffff8819507b2f10
    [1476087.659738] Call Trace:
    [1476087.659746] [<ffffffff81181b2e>] generic_file_buffered_write+0x11e/0x2a0
    [1476087.659749] [<ffffffff81183112>] __generic_file_aio_write+0x1e2/0x400
    [1476087.659750] [<ffffffff811806ee>] ? __find_get_page+0x1e/0xa0
    [1476087.659752] [<ffffffff81183389>] generic_file_aio_write+0x59/0xa0
    [1476087.659762] [<ffffffffa06b3979>] afs_linux_aio_write+0x249/0x490 [openafs]
    [1476087.659763] [<ffffffff81180b9b>] ? unlock_page+0x2b/0x30
    [1476087.659765] [<ffffffff811fdf3d>] do_sync_write+0x8d/0xd0
    [1476087.659767] [<ffffffff811fe7ad>] vfs_write+0xbd/0x1e0
    [1476087.659767] [<ffffffff811ff2cf>] SyS_write+0x7f/0xe0
    [1476087.659770] [<ffffffff81697089>] system_call_fastpath+0x16/0x1b
    [1476087.659781] Code: 41 89 cf 41 56 4d 89 ce 41 55 49 89 fd 48 89 f7 48 89 d6 41 54 48 c1 fe 0c 49 89 d4 44 89 c2 53 e8 e0 c3 ac e0 48 89 c3 49 89 06 <48> 8b 00 a8 08 74 13 5b 41 5c 41 5d 41 5e 41 5f 31 c0 5d c3 66
    [1476087.659790] RIP [<ffffffffa06b54f6>] afs_linux_write_begin+0x36/0x160 [openafs]
    [1476087.659791] RSP <ffff8808941dbc00>
    [1476087.659792] CR2: 0000000000000000

  • The panicking task is 'R' with PID 610331.

    CPU: 4 PID: 610331 Comm: R Tainted: POE  ------------ T 3.10.0-514.16.1.el7.x86_64 #1
    
  • The exception occurred at afs_linux_write_begin+0x36.

    RIP  [<ffffffffa06b54f6>] afs_linux_write_begin+0x36/0x160 [openafs]
    
  • Disassembly of all bytes from the Code: line, with the trapping instruction marked

      0:    41 89 cf                mov    %ecx,%r15d
      3:    41 56                   push   %r14
      5:    4d 89 ce                mov    %r9,%r14
      8:    41 55                   push   %r13
      a:    49 89 fd                mov    %rdi,%r13
      d:    48 89 f7                mov    %rsi,%rdi
     10:    48 89 d6                mov    %rdx,%rsi
     13:    41 54                   push   %r12
     15:    48 c1 fe 0c             sar    $0xc,%rsi
     19:    49 89 d4                mov    %rdx,%r12
     1c:    44 89 c2                mov    %r8d,%edx
     1f:    53                      push   %rbx
     20:    e8 e0 c3 ac e0          callq  0xffffffffe0acc405
     25:    48 89 c3                mov    %rax,%rbx
     28:    49 89 06                mov    %rax,(%r14)
     2b:*   48 8b 00                mov    (%rax),%rax     <<<<< trapping instruction
     2e:    a8 08                   test   $0x8,%al
     30:    74 13                   je     0x45
     32:    5b                      pop    %rbx
     33:    41 5c                   pop    %r12
     35:    41 5d                   pop    %r13
     37:    41 5e                   pop    %r14
     39:    41 5f                   pop    %r15
     3b:    31 c0                   xor    %eax,%eax
     3d:    5d                      pop    %rbp
     3e:    c3                      retq   
     3f:    66                      data16
    
    
  • Code starting with the faulting instruction

      0:    48 8b 00                mov    (%rax),%rax
      3:    a8 08                   test   $0x8,%al
      5:    74 13                   je     0x1a
      7:    5b                      pop    %rbx
      8:    41 5c                   pop    %r12
      a:    41 5d                   pop    %r13
      c:    41 5e                   pop    %r14
      e:    41 5f                   pop    %r15
     10:    31 c0                   xor    %eax,%eax
     12:    5d                      pop    %rbp
     13:    c3                      retq   
     14:    66                      data16
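
  • As a cross-check of the two listings above, the following Python sketch locates the byte marked with <..> in the Code: line and relates it to RIP. It is illustrative only: the helper name split_code_line is hypothetical, the byte string and RIP value are copied from the oops above, and the offset 0x20 of the callq opcode is read off the disassembly rather than found by an instruction decoder.

```python
import re
import struct

# Bytes copied from the "Code:" line of the oops; <48> marks the byte at RIP.
CODE_LINE = (
    "41 89 cf 41 56 4d 89 ce 41 55 49 89 fd 48 89 f7 48 89 d6 41 54 "
    "48 c1 fe 0c 49 89 d4 44 89 c2 53 e8 e0 c3 ac e0 48 89 c3 49 89 06 "
    "<48> 8b 00 a8 08 74 13 5b 41 5c 41 5d 41 5e 41 5f 31 c0 5d c3 66"
)
RIP = 0xFFFFFFFFA06B54F6   # from the RIP line of the oops
RIP_OFFSET = 0x36          # from afs_linux_write_begin+0x36


def split_code_line(code_line):
    """Return (raw_bytes, index_of_marked_byte) for an x86 oops Code: line."""
    values = []
    trap_index = None
    for i, token in enumerate(code_line.split()):
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if marked:
            trap_index = i
            token = marked.group(1)
        values.append(int(token, 16))
    if trap_index is None:
        raise ValueError("no <xx> marker found in Code: line")
    return bytes(values), trap_index


code, trap = split_code_line(CODE_LINE)

# Address of the first dumped byte, and its offset inside the function.
dump_start = RIP - trap
dump_offset_in_function = RIP_OFFSET - trap

# The callq at dump offset 0x20 is "e8" followed by a signed 32-bit
# displacement relative to the address of the next instruction.
CALL_AT = 0x20  # assumption: taken from the disassembly listing
assert code[CALL_AT] == 0xE8
rel32 = struct.unpack_from("<i", code, CALL_AT + 1)[0]
call_target = (dump_start + CALL_AT + 5 + rel32) & 0xFFFFFFFFFFFFFFFF

print("trapping byte at dump offset", hex(trap))
print("trapping instruction bytes  ",
      " ".join(f"{b:02x}" for b in code[trap:trap + 3]))
print("dump starts at function +   ", hex(dump_offset_in_function))
print("callq target                ", hex(call_target))
```

    The marked byte lands at offset 0x2b, the same offset as the trapping instruction in the listing, and the callq at 0x20 is the last instruction to write %rax before the fault. Which symbol the computed call target belongs to is not stated in this article and would have to be resolved against the vmcore.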
    
    
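  • The %rax register holds NULL at afs_linux_write_begin+0x36. In the disassembly, %rax is the return value of the preceding callq; it is copied to %rbx, stored through %r14, and then dereferenced without a NULL check.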

    RAX: 0000000000000000
    
  • The function afs_linux_write_begin() is part of the unsigned (E) module openafs.

    RIP  [<ffffffffa06b54f6>] afs_linux_write_begin+0x36/0x160 [openafs]
                                      ^                           ^
                                      |                           |
                              [ Function Name ]             [ Module Name ]
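
  • The fields of such a line can also be split mechanically. The Python sketch below is a minimal illustration, not a general oops parser: the function name parse_location and the regular expression are assumptions written for the line format shown in this article, and a missing [module] suffix is taken to mean the symbol belongs to the core kernel.

```python
import re

# Lines copied from the oops in this article.
RIP_LINE = "RIP  [<ffffffffa06b54f6>] afs_linux_write_begin+0x36/0x160 [openafs]"
CORE_LINE = "[<ffffffff811fe7ad>] vfs_write+0xbd/0x1e0"

# [<address>] symbol+0xOFFSET/0xSIZE, optionally followed by [module]
LOCATION = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_location(line):
    """Split a 'symbol+offset/size [module]' location into its parts."""
    match = LOCATION.search(line)
    if match is None:
        raise ValueError("no symbol+offset/size location in line")
    addr = int(match.group("addr"), 16)
    offset = int(match.group("offset"), 16)
    return {
        "symbol": match.group("symbol"),
        "module": match.group("module"),   # None when no [module] suffix
        "offset": offset,                  # bytes into the function
        "size": int(match.group("size"), 16),
        "function_start": addr - offset,   # address of the first byte
    }


rip = parse_location(RIP_LINE)
core = parse_location(CORE_LINE)
print(rip["symbol"], rip["module"], hex(rip["function_start"]))
print(core["symbol"], core["module"])
```

    Here the offset 0x36 is well inside the 0x160-byte function, and the module field is openafs, which is what ties the fault to that module rather than to the core kernel.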
    
    
  • The evidence above suggests that the fault lies within the unsigned (E) module openafs.

  • Because Red Hat does not have the source code for this module and cannot investigate it in detail, contact the provider of the openafs module for further investigation.
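
The per-module letters in parentheses on the "Modules linked in:" line are what identify openafs as unsigned. The Python sketch below shows one way to pull them out. It is a rough illustration: the helper names are hypothetical, the sample string is a shortened excerpt of the line in this article, and only the P, O and E letters are decoded (using their upstream kernel meanings); any other letter, such as the T on overlay, is reported as-is.

```python
import re

# Shortened excerpt of the "Modules linked in:" line from the oops.
MODULES = "ext4 overlay(T) squashfs lustre(OE) libcfs(OE) openafs(POE) xfs sd_mod"

# Meanings as documented for upstream kernels; other letters are left undecoded.
FLAG_MEANING = {
    "P": "proprietary",
    "O": "out-of-tree",
    "E": "unsigned",
}


def flagged_modules(modules_line):
    """Map module name -> flag letters for modules that carry flags."""
    result = {}
    for token in modules_line.split():
        match = re.fullmatch(r"(.+)\(([A-Z]+)\)", token)
        if match:
            result[match.group(1)] = match.group(2)
    return result


def modules_with(flag, modules_line):
    """Sorted names of modules whose flag string contains the given letter."""
    flags = flagged_modules(modules_line)
    return sorted(name for name, letters in flags.items() if flag in letters)


def describe(letters):
    """Expand flag letters, keeping unknown letters unchanged."""
    return [FLAG_MEANING.get(letter, letter) for letter in letters]


print(modules_with("E", MODULES))
print(describe(flagged_modules(MODULES)["openafs"]))
```

On the full line from the vmcore, the same filter would also list the other (OE) modules; the RIP line is what narrows the fault to openafs specifically.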

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
