RHEL 7 panics in sk_wait_data() with the cifs module in the backtrace because sock.sk_wq is NULL and sock.sk_flags contains SOCK_DEAD
Issue
- The system panics in sk_wait_data() with a backtrace similar to the following:
[1291903.919901] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[1291903.919996] IP: [<ffffffff95c1eb15>] sk_wait_data+0xc5/0x120
[1291903.920082] PGD 0
[1291903.920139] Oops: 0002 [#1] SMP
[1291903.920209] Modules linked in: lp tcp_diag udp_diag inet_diag bridge stp llc cmac arc4 md4 nls_utf8 cifs ccm dns_resolver vmw_vsock_vmci_transport vsock ppdev sb_edac iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel vmw_balloon lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr sg vmw_vmci i2c_piix4 parport_pc parport auth_rpcgss sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_pclmul crct10dif_common crc32c_intel serio_raw vmxnet3 vmw_pvscsi vmwgfx ata_generic pata_acpi floppy drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
[1291903.920769] CPU: 0 PID: 5033 Comm: cifsd Kdump: loaded Not tainted 3.10.0-957.1.3.el7.x86_64 #1
[1291903.920860] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[1291903.920960] task: ffff88366f40d140 ti: ffff883671be8000 task.ti: ffff883671be8000
[1291903.921045] RIP: 0010:[<ffffffff95c1eb15>] [<ffffffff95c1eb15>] sk_wait_data+0xc5/0x120
[1291903.921096] RSP: 0018:ffff883671bebb10 EFLAGS: 00010246
[1291903.921112] RAX: 0000000000000000 RBX: ffff88366f6c1740 RCX: ffff883671bebfd8
[1291903.921132] RDX: ffff883671bebfd8 RSI: 0000000000000200 RDI: ffffffff95c1d5f5
[1291903.921152] RBP: ffff883671bebb60 R08: ffff883671be8000 R09: 0000000000000001
[1291903.921171] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[1291903.921191] R13: 0000000000000000 R14: ffff88366f6c17d0 R15: 0000000000000000
[1291903.921250] FS: 0000000000000000(0000) GS:ffff8836ffc00000(0000) knlGS:0000000000000000
[1291903.921338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1291903.922380] CR2: 0000000000000008 CR3: 0000000002854000 CR4: 00000000001607f0
[1291903.923106] Call Trace:
[1291903.923799] [<ffffffff956c2d00>] ? wake_up_atomic_t+0x30/0x30
[1291903.924490] [<ffffffff95c91bd3>] tcp_recvmsg+0x6d3/0xb30
[1291903.925183] [<ffffffff95cc00e0>] inet_recvmsg+0x80/0xb0
[1291903.925862] [<ffffffff95c194f5>] sock_recvmsg+0xc5/0x100
[1291903.926545] [<ffffffff956d2eb2>] ? check_preempt_curr+0x92/0xa0
[1291903.927252] [<ffffffff956d2ed9>] ? ttwu_do_wakeup+0x19/0xe0
[1291903.927952] [<ffffffff956e011c>] ? update_curr+0x14c/0x1e0
[1291903.928632] [<ffffffff95c1956a>] kernel_recvmsg+0x3a/0x50
[1291903.929338] [<ffffffffc0808045>] cifs_readv_from_socket+0x205/0x310 [cifs]
[1291903.930040] [<ffffffff956d64f0>] ? try_to_wake_up+0x190/0x390
[1291903.930749] [<ffffffffc080855c>] cifs_demultiplex_thread+0x11c/0xaa0 [cifs]
[1291903.931462] [<ffffffffc0808440>] ? cifs_handle_standard+0x1b0/0x1b0 [cifs]
[1291903.932185] [<ffffffff956c1c31>] kthread+0xd1/0xe0
[1291903.932895] [<ffffffff956c1b60>] ? insert_kthread_work+0x40/0x40
[1291903.933622] [<ffffffff95d74c37>] ret_from_fork_nospec_begin+0x21/0x21
[1291903.934348] [<ffffffff956c1b60>] ? insert_kthread_work+0x40/0x40
[1291903.935078] Code: 44 c2 49 39 c5 74 5f 31 f6 48 89 df e8 b5 ea ff ff 4c 8b a3 98 00 00 00 b8 00 00 00 00 4d 39 e6 4c 0f 44 e0 48 8b 83 48 02 00 00 <3e> 80 60 08 fd 48 8b bb e0 00 00 00 48 8d 75 b0 e8 d6 3d aa ff
[1291903.936655] RIP [<ffffffff95c1eb15>] sk_wait_data+0xc5/0x120
[1291903.937433] RSP <ffff883671bebb10>
[1291903.938234] CR2: 0000000000000008
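When triaging a log like the one above, it can help to pull out the faulting symbol, the fault address (CR2), and RAX in one step. The following is a minimal sketch, not a supported tool: the function name `summarize_oops` and the `null_base` heuristic are assumptions made for illustration, and the excerpt contains only lines copied from the oops above. The faulting bytes `<3e> 80 60 08 fd` appear to decode to a byte-wide AND at offset 8 from RAX, which is why RAX is the register inspected here.

```python
import re

# Lines copied from the x86_64 oops above.
OOPS = """\
[1291903.919901] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[1291903.921045] RIP: 0010:[<ffffffff95c1eb15>] [<ffffffff95c1eb15>] sk_wait_data+0xc5/0x120
[1291903.921112] RAX: 0000000000000000 RBX: ffff88366f6c1740 RCX: ffff883671bebfd8
[1291903.922380] CR2: 0000000000000008 CR3: 0000000002854000 CR4: 00000000001607f0
"""


def summarize_oops(text):
    """Extract the faulting symbol, CR2, and RAX from an x86_64 oops (hypothetical helper)."""
    # The symbol is the last space-separated token on the RIP line, before "+0x".
    symbol = re.search(r"RIP: .* (\w+)\+0x", text).group(1)
    fault_addr = int(re.search(r"CR2: ([0-9a-f]+)", text).group(1), 16)
    rax = int(re.search(r"RAX: ([0-9a-f]+)", text).group(1), 16)
    return {
        "symbol": symbol,
        "fault_addr": fault_addr,
        "rax": rax,
        # Heuristic (an assumption): a zero base register plus a fault address
        # inside the first page suggests a field read through a NULL pointer.
        "null_base": rax == 0 and fault_addr < 4096,
    }


print(summarize_oops(OOPS))
```

For this excerpt the fault address is 0x8 and RAX is 0, which is consistent with a structure member at offset 8 being accessed through a NULL pointer inside sk_wait_data().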
- Example of a similar panic seen with a ppc64le kernel:
- An SMB 2.1 client running 4.14.0-49.6.1.el7a.ppc64le repeatedly crashes in the cifsd recvmsg code path because it accesses an orphaned sock structure:
[202312.365627] Status code returned 0xc000000d STATUS_INVALID_PARAMETER
[202312.365676] CIFS VFS: disabling echoes and oplocks
[202312.365703] CIFS VFS: Send error in read = -22
[202316.242199] CIFS VFS: Free previous auth_key.response = c000003fdc51ea00
[202316.243656] Status code returned 0xc00000dc STATUS_INVALID_SERVER_STATE
[202316.243710] CIFS VFS: Send error in read = -5
[203412.081965] Status code returned 0xc000000d STATUS_INVALID_PARAMETER
[203412.082026] CIFS VFS: Send error in read = -22
[203412.296701] Status code returned 0xc000000d STATUS_INVALID_PARAMETER
[203412.296744] CIFS VFS: disabling echoes and oplocks
[203412.296770] CIFS VFS: Send error in read = -22
[203414.797934] CIFS VFS: Free previous auth_key.response = c0002039710bba00
[203414.799723] Status code returned 0xc00000dc STATUS_INVALID_SERVER_STATE
[203414.799769] CIFS VFS: Send error in read = -5
[267244.949246] CIFS VFS: disabling echoes and oplocks
[267244.951363] Unable to handle kernel paging request for data at address 0x00000000
[267244.951411] Faulting instruction address: 0xc000000000c68674
[267244.951449] Oops: Kernel access of bad area, sig: 11 [#1]
[267244.951477] LE SMP NR_CPUS=2048 NUMA PowerNV
[267244.951509] Modules linked in: nvidia_uvm(POE) arc4 md4 nls_utf8 cifs ccm sctp_diag sctp libcrc32c tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nfsv3 nfs_acl rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) cxl mlx4_en(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) mlx_compat(OE) devlink i2c_dev dm_mirror dm_region_hash dm_log dm_mod at24 ofpart sg powernv_flash shpchp mtd opal_prd ipmi_powernv uio_pdrv_genirq uio ibmpowernv i2c_opal auth_rpcgss knem(OE) sunrpc binfmt_misc tcp_htcp ip_tables ext4 mbcache jbd2 raid1 sd_mod nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) ast i2c_algo_bit drm_kms_helper ttm syscopyarea sysfillrect
[267244.951918] sysimgblt fb_sys_fops drm ahci libahci libata tg3 ipmi_devintf i2c_core ipmi_msghandler ptp pps_core [last unloaded: devlink]
[267244.951991] CPU: 112 PID: 18188 Comm: cifsd Kdump: loaded Tainted: P OE ------------ 4.14.0-49.6.1.el7a.ppc64le #1
[267244.952053] task: c000003f7eb5b500 task.stack: c000003fd6a04000
[267244.952088] NIP: c000000000c68674 LR: c0000000001b5178 CTR: c0000000001920d0
[267244.952130] REGS: c000003fd6a07600 TRAP: 0300 Tainted: P OE ------------ (4.14.0-49.6.1.el7a.ppc64le)
[267244.952184] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28182b42 XER: 20040000
[267244.952230] CFAR: c0000000001b5174 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 0
GPR00: c0000000001b5178 c000003fd6a07880 c0000000014c8200 0000000000000000
GPR04: c000003fd6a07948 0000000000000001 0000000028188b42 0000000002000000
GPR08: 0000000000000000 0000000000000000 0000000080000070 0000000000000001
GPR12: 0000000000002000 c000000007a5d000 0000000000000000 c000203944461800
GPR16: 0000000000000000 0000000000000000 c0002039444618c8 c000203944461d1c
GPR20: 0000000000000000 0000000000000000 c000000001420a08 0000000000000000
GPR24: 0000000000000000 0000000000000000 c000003fd6a07a30 c0002039444618c8
GPR28: 0000000000000000 c000203944461890 0000000000000000 0000000000000001
[267244.952603] NIP [c000000000c68674] _raw_spin_lock_irqsave+0x44/0x100
[267244.952641] LR [c0000000001b5178] remove_wait_queue+0x38/0xc0
[267244.952675] Call Trace:
[267244.952691] [c000003fd6a07880] [c000003fd6a07a30] 0xc000003fd6a07a30 (unreliable)
[267244.952734] [c000003fd6a078c0] [c0000000001b5178] remove_wait_queue+0x38/0xc0
[267244.952789] [c000003fd6a07900] [c000000000a34134] sk_wait_data+0x1f4/0x310
[267244.952873] [c000003fd6a079a0] [c000000000b04b7c] tcp_recvmsg+0x67c/0xa70
[267244.952955] [c000003fd6a07b00] [c000000000b4a840] inet_recvmsg+0x80/0x120
[267244.953006] [c000003fd6a07b60] [c000000000a2561c] sock_recvmsg+0x7c/0xa0
[267244.953055] [c000003fd6a07ba0] [c00800000da09b58] cifs_readv_from_socket+0x78/0x2e0 [cifs]
[267244.953105] [c000003fd6a07c30] [c00800000da09e24] cifs_read_from_socket+0x64/0x80 [cifs]
[267244.953154] [c000003fd6a07ce0] [c00800000da0a42c] cifs_demultiplex_thread+0x17c/0xd40 [cifs]
[267244.953203] [c000003fd6a07dc0] [c000000000171ce8] kthread+0x168/0x1b0
[267244.953240] [c000003fd6a07e30] [c00000000000b628] ret_from_kernel_thread+0x5c/0xb4
[267244.953282] Instruction dump:
[267244.953304] fbe1fff8 f8010010 f821ffc1 7c7e1b78 60000000 60000000 39200000 8bed028a
[267244.953347] 992d028a 39400000 994d028c 814d0008 <7d20f029> 2c090000 40c20010 7d40f12d
[267244.953393] ---[ end trace da6e0ba5208827e0 ]---
[267245.962125]
[267245.962197] Sending IPI to other CPUs
[267245.981920] IPI complete
[267248.043801] kexec: Starting switchover sequence.
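The condition named in the title (sk_wq is NULL and sk_flags contains SOCK_DEAD) can be tested mechanically once the two fields have been read from a vmcore. In the upstream kernel source, sock_orphan() sets SOCK_DEAD and clears sk_wq, and SOCK_DEAD is the first entry of `enum sock_flags`, so it corresponds to bit 0. The sketch below assumes that layout; the helper name `looks_orphaned` and the sample values are hypothetical and are not taken from the crashes described here.

```python
# Bit position of SOCK_DEAD in sock.sk_flags. It is the first entry of
# enum sock_flags in the kernel's include/net/sock.h, hence bit 0.
SOCK_DEAD_BIT = 0


def looks_orphaned(sk_flags, sk_wq):
    """Return True when SOCK_DEAD is set and sk_wq is NULL (hypothetical helper)."""
    dead = bool(sk_flags & (1 << SOCK_DEAD_BIT))
    return dead and sk_wq == 0


# Hypothetical field values for illustration only.
print(looks_orphaned(0x1, 0x0))                   # True: SOCK_DEAD set, sk_wq NULL
print(looks_orphaned(0x300, 0xffff880000001000))  # False: SOCK_DEAD clear, sk_wq set
```

A sock that matches both checks while cifsd is still inside the recvmsg path is consistent with the orphaned sock structure described above.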
Environment
- Red Hat Enterprise Linux 7
  - Seen on kernels 3.10.0-957.3.1.el7.x86_64, 3.10.0-957.1.5.el7.x86_64, and 3.10.0-957.10.1.el7.x86_64
- Red Hat Enterprise Linux for Power
  - Seen on kernel 4.14.0-49.6.1.el7a.ppc64le
- A network or other disruption that causes the CIFS client to reconnect