Oops in scsi_dma_unmap due to memory corruption
Issue
The system crashes when it dereferences a bad pointer in scsi_dma_unmap().
The kernel messages show:
[1887417.520241] Unable to handle kernel paging request for data at address 0x48027979e8410028
[1887417.520272] Faulting instruction address: 0xc00000000069eb50
[1887417.520279] Oops: Kernel access of bad area, sig: 11 [#1]
[1887417.520282] SMP NR_CPUS=2048 NUMA pSeries
[1887417.520290] Modules linked in: dm_mod mmfs26(OE) mmfslinux(OE) tracedev(OE) mptctl mptbase rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) nx_crypto pseries_rng bonding mlx5_ib(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) sunrpc shpchp binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common mpt2sas ipr mlx5_core(OE) libata raid_class tg3 scsi_transport_sas mlxfw(OE) devlink mlx_compat(OE) ptp pps_core sg [last unloaded: tracedev]
[1887417.520343] CPU: 144 PID: 0 Comm: swapper/144 Kdump: loaded Tainted: G OE ------------ 3.10.0-862.14.4.el7.ppc64 #1
[1887417.520348] task: c000003e68062bc0 ti: c000003e7fb10000 task.ti: c000003e6809c000
[1887417.520352] NIP: c00000000069eb50 LR: d00000001c443bf0 CTR: c00000000069eae0
[1887417.520357] REGS: c000003e7fb13710 TRAP: 0300 Tainted: G OE ------------ (3.10.0-862.14.4.el7.ppc64)
[1887417.520361] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 28000044 XER: 20000000
[1887417.520372] CFAR: c0000000000093ec DAR: 48027979e8410028 DSISR: 40000000 SOFTE: 0
GPR00: d00000001c443bf0 c000003e7fb13990 c00000000151b500 c000003e3ab00000
GPR04: c0000024ae1b6000 0000000000000021 0000000000000002 0000000000000000
GPR08: d000000024f5be80 c000003e3ab00000 48027979e8410028 d00000001c45c598
GPR12: c00000000069eae0 c000000007b51000 c000003e6809ff90 000000001eea3300
GPR16: 0000000003000000 000000000b000000 0000000000000000 0000000003000000
GPR20: 000000000b000000 0000000000000008 c000003e62523a40 0000000000000001
GPR24: ffffffffffffffff c000003e62e06700 0000000000003340 0000000000000000
GPR28: 0000000000000000 00000000000000ce c000003e62580760 c000002e47b91c00
[1887417.520427] NIP [c00000000069eb50] .scsi_dma_unmap+0x70/0xd0
[1887417.520439] LR [d00000001c443bf0] ._scsih_io_done+0x4e0/0xca0 [mpt2sas]
[1887417.520442] Call Trace:
[1887417.520446] [c000003e7fb13990] [c000003e6252c240] 0xc000003e6252c240 (unreliable)
[1887417.520455] [c000003e7fb13a00] [d00000001c443bf0] ._scsih_io_done+0x4e0/0xca0 [mpt2sas]
[1887417.520463] [c000003e7fb13b20] [d00000001c427eb8] ._base_interrupt+0x358/0xe60 [mpt2sas]
[1887417.520471] [c000003e7fb13ca0] [c0000000001df654] .__handle_irq_event_percpu+0x94/0x2e0
[1887417.520475] [c000003e7fb13d80] [c0000000001df980] .handle_irq_event+0x60/0x100
[1887417.520481] [c000003e7fb13e10] [c0000000001e4e2c] .handle_fasteoi_irq+0xcc/0x230
[1887417.520486] [c000003e7fb13e90] [c0000000001de89c] .generic_handle_irq+0x4c/0x80
[1887417.520492] [c000003e7fb13f10] [c000000000014944] .__do_irq+0x84/0x1a0
[1887417.520497] [c000003e7fb13f90] [c000000000028580] .call_do_irq+0x14/0x24
[1887417.520502] [c000003e6809f820] [c000000000014aec] .do_IRQ+0x8c/0x100
[1887417.520507] [c000003e6809f8c0] [c000000000002a94] hardware_interrupt_common+0x114/0x180
The kernel panic stack trace is:
crash> bt
PID: 0 TASK: c000003e68062bc0 CPU: 144 COMMAND: "swapper/144"
#0 [c000003e7fb13990] .scsi_dma_unmap at c00000000069eb50
#1 [c000003e7fb13a00] ._scsih_io_done at d00000001c443bf0 [mpt2sas]
#2 [c000003e7fb13b20] ._base_interrupt at d00000001c427eb8 [mpt2sas]
#3 [c000003e7fb13ca0] .__handle_irq_event_percpu at c0000000001df654
#4 [c000003e7fb13d80] .handle_irq_event at c0000000001df980
#5 [c000003e7fb13e10] .handle_fasteoi_irq at c0000000001e4e2c
#6 [c000003e7fb13e90] .generic_handle_irq at c0000000001de89c
#7 [c000003e7fb13f10] .__do_irq at c000000000014944
#8 [c000003e7fb13f90] .call_do_irq at c000000000028580
#9 [c000003e6809f820] .do_IRQ at c000000000014aec
#10 [c000003e6809f8c0] hardware_interrupt_common at c000000000002a94
Hardware Interrupt [501] exception frame:
R0: c0000000007cfb40 R1: c000003e6809fbb0 R2: c00000000151b500
R3: 0000000000000001 R4: c0000000014122c0 R5: 0000000000000000
R6: 0038940dc1000000 R7: 0000000000000018 R8: 0000000000000808
R9: 000389f5ec7b9415 R10: c00000000191b500 R11: 000389f5ec1c7856
R12: 0000000044000088 R13: c000000007b51000
NIP: c0000000007d212c MSR: 8000000100009032 OR3: c0000000007d2130
CTR: c0000000007d2070 LR: c0000000007d20b0 XER: 0000000000000000
CCR: 0000000024000084 MQ: 0000000000000001 DAR: c000003e6809c000
DSISR: c000000000013ee0 Syscall Result: 0000000000000000
[NIP : .snooze_loop+0xbc]
[LR : .snooze_loop+0x40]
#11 [c000003e6809fbb0] .snooze_loop at c0000000007d212c (unreliable)
#12 [c000003e6809fc50] .cpuidle_idle_call at c0000000007cfb40
#13 [c000003e6809fd30] .pseries_lpar_idle at c00000000009df30
#14 [c000003e6809fda0] .arch_cpu_idle at c00000000001bb34
#15 [c000003e6809fe20] .cpu_startup_entry at c0000000001701e0
#16 [c000003e6809fed0] .start_secondary at c00000000004fbf0
#17 [c000003e6809ff90] start_secondary_prolog at c000000000009b6c
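As a rough aid for reading the register dump above, the following Python sketch checks which general-purpose register holds the faulting data address (DAR) and whether that address looks like a kernel address at all. The helper names (parse_gprs, parse_dar, region) are hypothetical, and the region check is an assumption: it relies on the conventional ppc64 layout for this kernel generation, in which kernel linear-map addresses start with 0xc and vmalloc/module addresses start with 0xd, as the kernel and mpt2sas addresses in the trace suggest.

```python
import re

# Excerpt copied verbatim from the oops output above.
OOPS = """\
CFAR: c0000000000093ec DAR: 48027979e8410028 DSISR: 40000000 SOFTE: 0
GPR00: d00000001c443bf0 c000003e7fb13990 c00000000151b500 c000003e3ab00000
GPR04: c0000024ae1b6000 0000000000000021 0000000000000002 0000000000000000
GPR08: d000000024f5be80 c000003e3ab00000 48027979e8410028 d00000001c45c598
"""


def parse_gprs(text):
    """Return {register_number: value} from 'GPRnn: v v v v' lines."""
    gprs = {}
    for m in re.finditer(r"GPR(\d+):((?:\s+[0-9a-f]{16})+)", text):
        base = int(m.group(1))
        for i, val in enumerate(m.group(2).split()):
            gprs[base + i] = int(val, 16)
    return gprs


def parse_dar(text):
    """Return the faulting data address (DAR) as an integer."""
    return int(re.search(r"DAR:\s*([0-9a-f]{16})", text).group(1), 16)


def region(addr):
    """Classify an address by its top nibble (assumed ppc64 layout)."""
    names = {0xC: "kernel linear mapping", 0xD: "vmalloc/module"}
    return names.get(addr >> 60, "not a kernel address")


gprs = parse_gprs(OOPS)
dar = parse_dar(OOPS)
holders = [n for n, v in sorted(gprs.items()) if v == dar]

print("DAR = 0x%016x (%s)" % (dar, region(dar)))
print("held in GPR:", holders)
```

Run against the excerpt, the sketch reports that GPR10 holds the same value as DAR and that the value falls outside both assumed kernel regions. This is consistent with scsi_dma_unmap() loading a corrupted pointer and then dereferencing it, but it does not identify what overwrote the pointer.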
Environment
- Red Hat Enterprise Linux 7
- Kernel 3.10.0-862.14.4.el7.ppc64
- IBM PowerPC architecture
- SCSI controller driven by the mpt2sas driver
- Possible involvement of the Mellanox InfiniBand mlx5_ib driver