System crash on RHEL 8 due to fnic driver double completions or request list corruption
Issue
- System crash on RHEL 8 due to fnic driver double completions or request list corruption.
- Example 1:
crash> bt
PID: 1611328 TASK: ffff9bb46d5b8000 CPU: 0 COMMAND: "multipathd"
#0 [ffffae57e25e7840] machine_kexec at ffffffffaa85bf3e
#1 [ffffae57e25e7898] __crash_kexec at ffffffffaa96072d
#2 [ffffae57e25e7960] crash_kexec at ffffffffaa96160d
#3 [ffffae57e25e7978] oops_end at ffffffffaa822d4d
#4 [ffffae57e25e7998] no_context at ffffffffaa86ba9e
#5 [ffffae57e25e79f0] do_page_fault at ffffffffaa86c5c2
#6 [ffffae57e25e7a20] page_fault at ffffffffab20122e
[exception RIP: blk_mq_dispatch_rq_list+262]
RIP: ffffffffaac09616 RSP: ffffae57e25e7ad0 RFLAGS: 00010246
RAX: ffff9bbe9fc50000 RBX: ffffae57e25e7b78 RCX: ffff9bbb9f69cad8
RDX: 0000000000000000 RSI: ffffae57e25e7b78 RDI: ffff9bbe9fc50000
RBP: 0000000000000000 R8: ffff9bbb9f4e9408 R9: 0000000000000001
R10: ffff9b9180277d40 R11: ffff9bbb9e945d00 R12: ffff9bbb9f69ca90
R13: ffff9bbb9f69cad8 R14: 0000000000000000 R15: ffff9bbe9fc50000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffae57e25e7b70] __blk_mq_sched_dispatch_requests at ffffffffaac0f05a
#8 [ffffae57e25e7bb8] blk_mq_sched_dispatch_requests at ffffffffaac0f120
#9 [ffffae57e25e7bc8] __blk_mq_run_hw_queue at ffffffffaac07301
#10 [ffffae57e25e7be0] __blk_mq_delay_run_hw_queue at ffffffffaac07bc1
#11 [ffffae57e25e7c08] blk_mq_sched_insert_request at ffffffffaac0f32c
#12 [ffffae57e25e7c60] blk_execute_rq at ffffffffaac0343b
#13 [ffffae57e25e7c98] sg_io at ffffffffaac1adf5
#14 [ffffae57e25e7d20] scsi_cmd_ioctl at ffffffffaac1b280
#15 [ffffae57e25e7df0] sd_ioctl at ffffffffc04457ec [sd_mod]
#16 [ffffae57e25e7e20] blkdev_ioctl at ffffffffaac10367
#17 [ffffae57e25e7e78] block_ioctl at ffffffffaab19e59
#18 [ffffae57e25e7e80] do_vfs_ioctl at ffffffffaaaef614
....
- Example 2:
crash> bt
PID: 0 TASK: ffff94a3c16e4740 CPU: 41 COMMAND: "swapper/41"
#0 [ffff94c2bfbc3b50] machine_kexec at ffffffffafa59a5e
#1 [ffff94c2bfbc3ba8] __crash_kexec at ffffffffafb591fd
#2 [ffff94c2bfbc3c70] crash_kexec at ffffffffafb5a0dd
#3 [ffff94c2bfbc3c88] oops_end at ffffffffafa21edd
#4 [ffff94c2bfbc3ca8] do_trap at ffffffffafa1e75c
#5 [ffff94c2bfbc3cf0] do_invalid_op at ffffffffafa1f006
#6 [ffff94c2bfbc3d10] invalid_op at ffffffffb0400d84
[exception RIP: __list_add_valid+65]
RIP: ffffffffafe2d751 RSP: ffff94c2bfbc3dc8 RFLAGS: 00010046
RAX: 0000000000000058 RBX: ffff94c2bfbe7ba0 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffff94c2bfbd6a00
RBP: ffff94bdcf8fe4c0 R8: 5d3632353330392e R9: 0000000000000069
R10: 0000000000000000 R11: ffff94c2bfbc3c78 R12: ffff94bdcf8fe518
R13: 0000000000000082 R14: ffff94bdcf8fe518 R15: 000000000000008c
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff94c2bfbc3dc0] __list_add_valid at ffffffffafe2d751
#8 [ffff94c2bfbc3dc8] __blk_complete_request at ffffffffafdeca44
#9 [ffff94c2bfbc3df8] blk_mq_complete_request at ffffffffafdedd86
#10 [ffff94c2bfbc3e20] scsi_mq_done at ffffffffaffb49b7
#11 [ffff94c2bfbc3e38] fnic_cleanup_io at ffffffffc08f68c5 [fnic]
#12 [ffff94c2bfbc3e88] fnic_fcpio_cmpl_handler at ffffffffc08f822f [fnic]
#13 [ffff94c2bfbc3ec0] fnic_wq_copy_cmpl_handler at ffffffffc08f8c86 [fnic]
#14 [ffff94c2bfbc3f00] fnic_isr_msix_wq_copy at ffffffffc08ef0cd [fnic]
#15 [ffff94c2bfbc3f10] __handle_irq_event_percpu at ffffffffafb1a6c0
#16 [ffff94c2bfbc3f50] handle_irq_event_percpu at ffffffffafb1a830
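When triaging a different vmcore, it can help to compare its `bt` output against the two examples above. The following is a minimal sketch, not a Red Hat tool: the function name `summarize_bt` and the constant `EXAMPLE_RIP_SYMBOLS` are hypothetical, and the regular expressions assume the `bt` line layout shown in this article. It extracts the exception RIP symbol and lists any frames tagged with the `[fnic]` module.

```python
import re

# Symbols at the exception RIP in the two example backtraces in this article.
EXAMPLE_RIP_SYMBOLS = {"blk_mq_dispatch_rq_list", "__list_add_valid"}

# Matches e.g. "[exception RIP: __list_add_valid+65]".
RIP_RE = re.compile(r"\[exception RIP: (?P<symbol>[^\s+\]]+)(?:\+\d+)?\]")

# Matches e.g. "#11 [ffff94c2bfbc3e38] fnic_cleanup_io at ffffffffc08f68c5 [fnic]".
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+\[[0-9a-f]+\]\s+(?P<symbol>\S+)\s+at\s+[0-9a-f]+"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def summarize_bt(text):
    """Summarize `bt` output: exception RIP symbol and fnic module frames."""
    rip = RIP_RE.search(text)
    fnic_frames = []
    # Parse line by line so a frame never absorbs text from the next line.
    for line in text.splitlines():
        frame = FRAME_RE.match(line.strip())
        if frame and frame.group("module") == "fnic":
            fnic_frames.append(frame.group("symbol"))
    symbol = rip.group("symbol") if rip else None
    return {
        "rip_symbol": symbol,
        "fnic_frames": fnic_frames,
        "rip_matches_example": symbol in EXAMPLE_RIP_SYMBOLS,
    }
```

A matching RIP symbol is only a hint, not a diagnosis. Note also that the visible part of Example 1 contains no `[fnic]` frames, so their absence from a backtrace does not rule this issue out.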
Environment
- Red Hat Enterprise Linux 8.2 and 8.3
- kernel-4.18.0-193.14.3.el8_2
- kernel-4.18.0-240.1.1.el8_3
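To check whether a host runs one of the kernel builds listed above, the release string reported by `uname -r` can be compared against them. This is a minimal sketch under stated assumptions: the helper names `strip_arch` and `is_listed_kernel` are hypothetical, and the set of architecture suffixes is an assumption about what `uname -r` appends. A match only means the kernel is one of those named here. Other builds are not ruled out.

```python
import platform

# Kernel builds named in the Environment section of this article.
LISTED_KERNELS = {"4.18.0-193.14.3.el8_2", "4.18.0-240.1.1.el8_3"}

# Architecture suffixes assumed to follow the release string in `uname -r`.
ARCH_SUFFIXES = {"x86_64", "aarch64", "ppc64le", "s390x"}


def strip_arch(release):
    """Drop a trailing architecture suffix such as '.x86_64', if present."""
    base, _, last = release.rpartition(".")
    return base if last in ARCH_SUFFIXES else release


def is_listed_kernel(release=None):
    """Return True if the release is one of the kernels listed in this article."""
    if release is None:
        # platform.release() returns the same string as `uname -r` on Linux.
        release = platform.release()
    return strip_arch(release) in LISTED_KERNELS
```

Calling `is_listed_kernel()` with no argument checks the running kernel. Passing a string checks a release taken from elsewhere, for example from a vmcore's `sys` output.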