RHEL 7 panics in sk_wait_data() with the cifs module in the backtrace because sock.sk_wq is NULL and sock.sk_flags contains SOCK_DEAD

Solution Unverified

Issue

  • System panics in sk_wait_data() with a backtrace similar to the following:

    [1291903.919901] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    [1291903.919996] IP: [<ffffffff95c1eb15>] sk_wait_data+0xc5/0x120
    [1291903.920082] PGD 0 
    [1291903.920139] Oops: 0002 [#1] SMP 
    [1291903.920209] Modules linked in: lp tcp_diag udp_diag inet_diag bridge stp llc cmac arc4 md4 nls_utf8 cifs ccm dns_resolver vmw_vsock_vmci_transport vsock ppdev sb_edac iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel vmw_balloon lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr sg vmw_vmci i2c_piix4 parport_pc parport auth_rpcgss sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_pclmul crct10dif_common crc32c_intel serio_raw vmxnet3 vmw_pvscsi vmwgfx ata_generic pata_acpi floppy drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
    [1291903.920769] CPU: 0 PID: 5033 Comm: cifsd Kdump: loaded Not tainted 3.10.0-957.1.3.el7.x86_64 #1
    [1291903.920860] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
    [1291903.920960] task: ffff88366f40d140 ti: ffff883671be8000 task.ti: ffff883671be8000
    [1291903.921045] RIP: 0010:[<ffffffff95c1eb15>]  [<ffffffff95c1eb15>] sk_wait_data+0xc5/0x120
    [1291903.921096] RSP: 0018:ffff883671bebb10  EFLAGS: 00010246
    [1291903.921112] RAX: 0000000000000000 RBX: ffff88366f6c1740 RCX: ffff883671bebfd8
    [1291903.921132] RDX: ffff883671bebfd8 RSI: 0000000000000200 RDI: ffffffff95c1d5f5
    [1291903.921152] RBP: ffff883671bebb60 R08: ffff883671be8000 R09: 0000000000000001
    [1291903.921171] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    [1291903.921191] R13: 0000000000000000 R14: ffff88366f6c17d0 R15: 0000000000000000
    [1291903.921250] FS:  0000000000000000(0000) GS:ffff8836ffc00000(0000) knlGS:0000000000000000
    [1291903.921338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [1291903.922380] CR2: 0000000000000008 CR3: 0000000002854000 CR4: 00000000001607f0
    [1291903.923106] Call Trace:
    [1291903.923799]  [<ffffffff956c2d00>] ? wake_up_atomic_t+0x30/0x30
    [1291903.924490]  [<ffffffff95c91bd3>] tcp_recvmsg+0x6d3/0xb30
    [1291903.925183]  [<ffffffff95cc00e0>] inet_recvmsg+0x80/0xb0
    [1291903.925862]  [<ffffffff95c194f5>] sock_recvmsg+0xc5/0x100
    [1291903.926545]  [<ffffffff956d2eb2>] ? check_preempt_curr+0x92/0xa0
    [1291903.927252]  [<ffffffff956d2ed9>] ? ttwu_do_wakeup+0x19/0xe0
    [1291903.927952]  [<ffffffff956e011c>] ? update_curr+0x14c/0x1e0
    [1291903.928632]  [<ffffffff95c1956a>] kernel_recvmsg+0x3a/0x50
    [1291903.929338]  [<ffffffffc0808045>] cifs_readv_from_socket+0x205/0x310 [cifs]
    [1291903.930040]  [<ffffffff956d64f0>] ? try_to_wake_up+0x190/0x390
    [1291903.930749]  [<ffffffffc080855c>] cifs_demultiplex_thread+0x11c/0xaa0 [cifs]
    [1291903.931462]  [<ffffffffc0808440>] ? cifs_handle_standard+0x1b0/0x1b0 [cifs]
    [1291903.932185]  [<ffffffff956c1c31>] kthread+0xd1/0xe0
    [1291903.932895]  [<ffffffff956c1b60>] ? insert_kthread_work+0x40/0x40
    [1291903.933622]  [<ffffffff95d74c37>] ret_from_fork_nospec_begin+0x21/0x21
    [1291903.934348]  [<ffffffff956c1b60>] ? insert_kthread_work+0x40/0x40
    [1291903.935078] Code: 44 c2 49 39 c5 74 5f 31 f6 48 89 df e8 b5 ea ff ff 4c 8b a3 98 00 00 00 b8 00 00 00 00 4d 39 e6 4c 0f 44 e0 48 8b 83 48 02 00 00 <3e> 80 60 08 fd 48 8b bb e0 00 00 00 48 8d 75 b0 e8 d6 3d aa ff 
    [1291903.936655] RIP  [<ffffffff95c1eb15>] sk_wait_data+0xc5/0x120
    [1291903.937433]  RSP <ffff883671bebb10>
    [1291903.938234] CR2: 0000000000000008
    
  • A similar panic has been seen with a ppc64le kernel: an SMB 2.1 client on 4.14.0-49.6.1.el7a.ppc64le repeatedly crashes in the cifsd recvmsg code path while accessing an orphaned sock structure.

    [202312.365627] Status code returned 0xc000000d STATUS_INVALID_PARAMETER
    [202312.365676] CIFS VFS: disabling echoes and oplocks
    [202312.365703] CIFS VFS: Send error in read = -22
    [202316.242199] CIFS VFS: Free previous auth_key.response = c000003fdc51ea00
    [202316.243656] Status code returned 0xc00000dc STATUS_INVALID_SERVER_STATE
    [202316.243710] CIFS VFS: Send error in read = -5
    [203412.081965] Status code returned 0xc000000d STATUS_INVALID_PARAMETER
    [203412.082026] CIFS VFS: Send error in read = -22
    [203412.296701] Status code returned 0xc000000d STATUS_INVALID_PARAMETER
    [203412.296744] CIFS VFS: disabling echoes and oplocks
    [203412.296770] CIFS VFS: Send error in read = -22
    [203414.797934] CIFS VFS: Free previous auth_key.response = c0002039710bba00
    [203414.799723] Status code returned 0xc00000dc STATUS_INVALID_SERVER_STATE
    [203414.799769] CIFS VFS: Send error in read = -5
    [267244.949246] CIFS VFS: disabling echoes and oplocks
    [267244.951363] Unable to handle kernel paging request for data at address 0x00000000
    [267244.951411] Faulting instruction address: 0xc000000000c68674
    [267244.951449] Oops: Kernel access of bad area, sig: 11 [#1]
    [267244.951477] LE SMP NR_CPUS=2048 NUMA PowerNV
    [267244.951509] Modules linked in: nvidia_uvm(POE) arc4 md4 nls_utf8 cifs ccm sctp_diag sctp libcrc32c tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nfsv3 nfs_acl rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) cxl mlx4_en(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) mlx_compat(OE) devlink i2c_dev dm_mirror dm_region_hash dm_log dm_mod at24 ofpart sg powernv_flash shpchp mtd opal_prd ipmi_powernv uio_pdrv_genirq uio ibmpowernv i2c_opal auth_rpcgss knem(OE) sunrpc binfmt_misc tcp_htcp ip_tables ext4 mbcache jbd2 raid1 sd_mod nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) ast i2c_algo_bit drm_kms_helper ttm syscopyarea sysfillrect
    [267244.951918]  sysimgblt fb_sys_fops drm ahci libahci libata tg3 ipmi_devintf i2c_core ipmi_msghandler ptp pps_core [last unloaded: devlink]
    [267244.951991] CPU: 112 PID: 18188 Comm: cifsd Kdump: loaded Tainted: P           OE  ------------   4.14.0-49.6.1.el7a.ppc64le #1
    [267244.952053] task: c000003f7eb5b500 task.stack: c000003fd6a04000
    [267244.952088] NIP:  c000000000c68674 LR: c0000000001b5178 CTR: c0000000001920d0
    [267244.952130] REGS: c000003fd6a07600 TRAP: 0300   Tainted: P           OE  ------------    (4.14.0-49.6.1.el7a.ppc64le)
    [267244.952184] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28182b42  XER: 20040000
    [267244.952230] CFAR: c0000000001b5174 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 0 
                    GPR00: c0000000001b5178 c000003fd6a07880 c0000000014c8200 0000000000000000 
                    GPR04: c000003fd6a07948 0000000000000001 0000000028188b42 0000000002000000 
                    GPR08: 0000000000000000 0000000000000000 0000000080000070 0000000000000001 
                    GPR12: 0000000000002000 c000000007a5d000 0000000000000000 c000203944461800 
                    GPR16: 0000000000000000 0000000000000000 c0002039444618c8 c000203944461d1c 
                    GPR20: 0000000000000000 0000000000000000 c000000001420a08 0000000000000000 
                    GPR24: 0000000000000000 0000000000000000 c000003fd6a07a30 c0002039444618c8 
                    GPR28: 0000000000000000 c000203944461890 0000000000000000 0000000000000001 
    [267244.952603] NIP [c000000000c68674] _raw_spin_lock_irqsave+0x44/0x100
    [267244.952641] LR [c0000000001b5178] remove_wait_queue+0x38/0xc0
    [267244.952675] Call Trace:
    [267244.952691] [c000003fd6a07880] [c000003fd6a07a30] 0xc000003fd6a07a30 (unreliable)
    [267244.952734] [c000003fd6a078c0] [c0000000001b5178] remove_wait_queue+0x38/0xc0
    [267244.952789] [c000003fd6a07900] [c000000000a34134] sk_wait_data+0x1f4/0x310
    [267244.952873] [c000003fd6a079a0] [c000000000b04b7c] tcp_recvmsg+0x67c/0xa70
    [267244.952955] [c000003fd6a07b00] [c000000000b4a840] inet_recvmsg+0x80/0x120
    [267244.953006] [c000003fd6a07b60] [c000000000a2561c] sock_recvmsg+0x7c/0xa0
    [267244.953055] [c000003fd6a07ba0] [c00800000da09b58] cifs_readv_from_socket+0x78/0x2e0 [cifs]
    [267244.953105] [c000003fd6a07c30] [c00800000da09e24] cifs_read_from_socket+0x64/0x80 [cifs]
    [267244.953154] [c000003fd6a07ce0] [c00800000da0a42c] cifs_demultiplex_thread+0x17c/0xd40 [cifs]
    [267244.953203] [c000003fd6a07dc0] [c000000000171ce8] kthread+0x168/0x1b0
    [267244.953240] [c000003fd6a07e30] [c00000000000b628] ret_from_kernel_thread+0x5c/0xb4
    [267244.953282] Instruction dump:
    [267244.953304] fbe1fff8 f8010010 f821ffc1 7c7e1b78 60000000 60000000 39200000 8bed028a 
    [267244.953347] 992d028a 39400000 994d028c 814d0008 <7d20f029> 2c090000 40c20010 7d40f12d 
    [267244.953393] ---[ end trace da6e0ba5208827e0 ]---
    [267245.962125] 
    [267245.962197] Sending IPI to other CPUs
    [267245.981920] IPI complete
    [267248.043801] kexec: Starting switchover sequence.
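
Both traces above share the same shape: sk_wait_data() appears in the call trace, reached from cifs_readv_from_socket() in the cifs module. As a rough triage aid, the sketch below checks a saved console or vmcore-dmesg log for that shape. It is a minimal illustration, not a Red Hat tool: the helper names (trace_symbols, looks_like_this_panic) and the regular expression are assumptions based only on the log formats shown in this article, and a match does not confirm the root cause.

```python
import re

# Matches "symbol+0xOFF/0xSIZE" with an optional " [module]" suffix, as printed
# in the x86_64 and ppc64le call traces above. Assumption: other kernels may
# format frames differently.
FRAME = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def trace_symbols(log_text):
    """Return (function, module) pairs; module is None for core kernel symbols."""
    return [(m.group(1), m.group(2)) for m in FRAME.finditer(log_text)]


def looks_like_this_panic(log_text):
    """True if sk_wait_data and cifs_readv_from_socket [cifs] both appear."""
    symbols = trace_symbols(log_text)
    funcs = {func for func, _ in symbols}
    cifs_funcs = {func for func, mod in symbols if mod == "cifs"}
    return "sk_wait_data" in funcs and "cifs_readv_from_socket" in cifs_funcs


# A few lines copied from the x86_64 trace in this article.
SAMPLE = """\
[1291903.919996] IP: [<ffffffff95c1eb15>] sk_wait_data+0xc5/0x120
[1291903.923799]  [<ffffffff956c2d00>] ? wake_up_atomic_t+0x30/0x30
[1291903.924490]  [<ffffffff95c91bd3>] tcp_recvmsg+0x6d3/0xb30
[1291903.929338]  [<ffffffffc0808045>] cifs_readv_from_socket+0x205/0x310 [cifs]
"""

if __name__ == "__main__":
    print(looks_like_this_panic(SAMPLE))
```

To check a real log, read the file contents into a string and pass it to looks_like_this_panic(). The check is deliberately loose, so treat a positive result as a prompt to compare the full backtrace against the ones above.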
    

Environment

  • Red Hat Enterprise Linux 7
    • seen on 3.10.0-957.3.1.el7.x86_64, 3.10.0-957.1.5.el7.x86_64, and 3.10.0-957.10.1.el7.x86_64
  • Red Hat Enterprise Linux for Power
    • seen on 4.14.0-49.6.1.el7a.ppc64le
  • A network or other disruption that causes the CIFS client to reconnect
