[RHEL 7.9] Crash in scsi_softirq_done() because of NULL rq->special pointer in an already freed struct request

Solution Verified - Updated -

Issue

  • The system crashes on a NULL pointer dereference in the scsi_softirq_done() function.
  • The console log shortly before the crash shows power-on or device resets for SCSI tape devices, which indicates that SCSI error recovery routines were active. Any device type (disk, changer, etc.) that causes the driver's error recovery routines to run is exposed to this issue.
...
[5486367.793116] scsi 2:0:5:0: Sequential-Access HP       Ultrium 7-SCSI   M571 PQ: 0 ANSI: 6
[5486367.898989] st 2:0:5:0: Attached scsi tape st13
[5486367.898998] st 2:0:5:0: st13: try direct i/o: yes (alignment 8 B)
[5486367.899306] st 2:0:5:0: Attached scsi generic sg143 type 1
[5486367.958463] st 2:0:5:0: Power-on or device reset occurred
[5486371.329141] scsi 2:0:7:0: Sequential-Access HP       Ultrium 7-SCSI   M571 PQ: 0 ANSI: 6
[5486371.467668] st 2:0:7:0: Attached scsi tape st15
[5486371.467678] st 2:0:7:0: st15: try direct i/o: yes (alignment 8 B)
[5486371.468010] st 2:0:7:0: Attached scsi generic sg145 type 1
[5486371.541896] st 2:0:7:0: Power-on or device reset occurred
[5486371.560687] st 2:0:7:0: Unexpected response from lun 1 while scanning, scan aborted
[5486409.415754]  rport-2:0-8: blocked FC remote port time out: removing target and saving binding
[5486434.258422] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c4
[5486434.259520] IP: [<ffffffffaaeed0e2>] scsi_softirq_done+0x22/0x160
[5486434.260649] PGD 0 
[5486434.261718] Oops: 0000 [#1] SMP 
  • The kernel panic stack trace looks like:
crash> bt
PID: 39748  TASK: ffff9a5928b46300  CPU: 8   COMMAND: "ssh"
 #0 [ffff9a592fc03b40] machine_kexec at ffffffffaaa662c4
 #1 [ffff9a592fc03ba0] __crash_kexec at ffffffffaab22842
 #2 [ffff9a592fc03c70] crash_kexec at ffffffffaab22930
 #3 [ffff9a592fc03c88] oops_end at ffffffffab18d798
 #4 [ffff9a592fc03cb0] no_context at ffffffffaaa75d14
 #5 [ffff9a592fc03d00] __bad_area_nosemaphore at ffffffffaaa75fe2
 #6 [ffff9a592fc03d50] bad_area_nosemaphore at ffffffffaaa76104
 #7 [ffff9a592fc03d60] __do_page_fault at ffffffffab190750
 #8 [ffff9a592fc03dd0] do_page_fault at ffffffffab190975
 #9 [ffff9a592fc03e00] page_fault at ffffffffab18c778
    [exception RIP: scsi_softirq_done+0x22]
    RIP: ffffffffaaeed0e2  RSP: ffff9a592fc03eb0  RFLAGS: 00010246
    RAX: 0000000000000018  RBX: 0000000000000000  RCX: dead000000000200
    RDX: ffff9a592fc03ee0  RSI: ffff9a592fc16380  RDI: ffff9a55162f8600
    RBP: ffff9a592fc03ed0   R8: ffff9a55162f8680   R9: 0000000039aa30ff
    R10: ffffffffab67a480  R11: 000000000000b7a9  R12: ffff9a55162f8600
    R13: 0000000000000000  R14: 00007f705849c000  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff9a592fc03ed8] blk_done_softirq at ffffffffaad5d796
#11 [ffff9a592fc03f18] __do_softirq at ffffffffaaaa4b35
#12 [ffff9a592fc03f88] call_softirq at ffffffffab1994ec
#13 [ffff9a592fc03fa0] do_softirq at ffffffffaaa2f715
#14 [ffff9a592fc03fc0] irq_exit at ffffffffaaaa4eb5
#15 [ffff9a592fc03fd8] smp_apic_timer_interrupt at ffffffffab19aa88
#16 [ffff9a592fc03ff0] apic_timer_interrupt at ffffffffab196fba
--- <IRQ stack> ---
#17 [ffff9a5603953b48] apic_timer_interrupt at ffffffffab196fba
    [exception RIP: __mem_cgroup_uncharge_common+0x1b1]
    RIP: ffffffffaac3d531  RSP: ffff9a5603953bf8  RFLAGS: 00000286
    RAX: ffff9a59a7ffec00  RBX: ffffffffaac3d0ce  RCX: 0000000000000001
    RDX: 0000000000000001  RSI: ffffe17c9f4fdd80  RDI: 000000002430591e
    RBP: ffff9a5603953c00   R8: 0000000000000000   R9: 00003ffffffff000
    R10: ffff9a59a3fb5b00  R11: ffff9a554d05bc00  R12: ffff9a567fd81400
    R13: 0000000000000000  R14: ffff9a59a7ffec00  R15: ffffe17c89f61140
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
#18 [ffff9a5603953c08] mem_cgroup_uncharge_page at ffffffffaac41a2a
#19 [ffff9a5603953c18] page_remove_rmap at ffffffffaac01459
#20 [ffff9a5603953c50] unmap_page_range at ffffffffaabf0ed8
#21 [ffff9a5603953d30] unmap_single_vma at ffffffffaabf14a1
#22 [ffff9a5603953d68] unmap_vmas at ffffffffaabf2ed9
#23 [ffff9a5603953da0] exit_mmap at ffffffffaabfcf1c
#24 [ffff9a5603953e58] mmput at ffffffffaaa97ac7
#25 [ffff9a5603953e78] do_exit at ffffffffaaaa1848
#26 [ffff9a5603953f10] do_group_exit at ffffffffaaaa206f
#27 [ffff9a5603953f40] sys_exit_group at ffffffffaaaa20e4
#28 [ffff9a5603953f50] system_call_fastpath at ffffffffab195f92
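
When comparing a console log against the signature above, it can help to extract the fault address and the faulting instruction from the two oops lines. The following is a minimal sketch, not a supported tool: the regular expressions and the parse_oops() helper are illustrative names invented here, and they assume the RHEL 7 oops line format exactly as shown in the log excerpt.

```python
import re

# Console lines copied from the log excerpt above.
OOPS_LINES = [
    "[5486434.258422] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c4",
    "[5486434.259520] IP: [<ffffffffaaeed0e2>] scsi_softirq_done+0x22/0x160",
]

# Hypothetical patterns, written to match the two line formats shown above.
BUG_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
IP_RE = re.compile(r"IP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_oops(lines):
    """Collect the fault address and faulting instruction location, if present."""
    result = {}
    for line in lines:
        m = BUG_RE.search(line)
        if m:
            result["fault_addr"] = int(m.group(1), 16)
        m = IP_RE.search(line)
        if m:
            result["ip"] = int(m.group(1), 16)
            result["function"] = m.group(2)
            result["offset"] = int(m.group(3), 16)
            result["size"] = int(m.group(4), 16)
    return result


info = parse_oops(OOPS_LINES)
print("function:     ", info["function"])
print("offset:       ", hex(info["offset"]))
print("fault address:", hex(info["fault_addr"]))
# The function start is the instruction pointer minus the offset.
print("function base:", hex(info["ip"] - info["offset"]))
```

A fault address as small as 0xc4 is consistent with reading a structure member through a NULL base pointer, which fits the NULL rq->special pointer named in the title. The sketch only extracts the values; it does not identify which member was being read.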

Environment

  • Red Hat Enterprise Linux 7.9
  • kernel 3.10.0-1160.24.1.el7.x86_64 and earlier 7.9 kernels
    • The patch that introduced this bug was added in 7.8, so 7.8 kernels can also be exposed to this issue.
  • Fibre Channel interfaces controlled by the QLogic / Marvell qla2xxx driver
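
The kernel range above can be checked against a running system with a short script. This is a rough sketch under stated assumptions: the lower bound 3.10.0-1127.el7 is assumed here to be the first 7.8 kernel build and is not stated in this article, the upper bound is simply the newest kernel the article names, and release_key() and in_reported_range() are hypothetical helper names. A match only means the kernel falls inside the reported range; exposure also depends on using the qla2xxx driver, and a kernel outside the range is not thereby confirmed as fixed.

```python
import platform

# ASSUMPTION: 3.10.0-1127 is taken as the first RHEL 7.8 kernel build.
LOWER = "3.10.0-1127.el7.x86_64"
# Newest kernel named in the Environment list above.
UPPER = "3.10.0-1160.24.1.el7.x86_64"


def leading_ints(text):
    """Return the leading dot-separated numeric fields of text as a tuple."""
    nums = []
    for part in text.split("."):
        if not part.isdigit():
            break  # stop at 'el7', the architecture, or any other suffix
        nums.append(int(part))
    return tuple(nums)


def release_key(release):
    """Turn '3.10.0-1160.24.1.el7.x86_64' into ((3, 10, 0), (1160, 24, 1))."""
    version, _, build = release.partition("-")
    return leading_ints(version), leading_ints(build)


def in_reported_range(release):
    """True if release sorts between LOWER and UPPER, inclusive."""
    return release_key(LOWER) <= release_key(release) <= release_key(UPPER)


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    print(running, "in reported range:", in_reported_range(running))
```

Comparing tuples of integers rather than raw strings avoids misordering builds such as 1160.6.1 and 1160.24.1, which sort incorrectly as text.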
