Oops in scsi_dma_unmap due to memory corruption

Solution Unverified

Issue

The system crashes when the kernel dereferences a bad pointer in scsi_dma_unmap().
The kernel messages show:

[1887417.520241] Unable to handle kernel paging request for data at address 0x48027979e8410028
[1887417.520272] Faulting instruction address: 0xc00000000069eb50
[1887417.520279] Oops: Kernel access of bad area, sig: 11 [#1]
[1887417.520282] SMP NR_CPUS=2048 NUMA pSeries
[1887417.520290] Modules linked in: dm_mod mmfs26(OE) mmfslinux(OE) tracedev(OE) mptctl mptbase rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) nx_crypto pseries_rng bonding mlx5_ib(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) sunrpc shpchp binfmt_misc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common mpt2sas ipr mlx5_core(OE) libata raid_class tg3 scsi_transport_sas mlxfw(OE) devlink mlx_compat(OE) ptp pps_core sg [last unloaded: tracedev]
[1887417.520343] CPU: 144 PID: 0 Comm: swapper/144 Kdump: loaded Tainted: G           OE  ------------   3.10.0-862.14.4.el7.ppc64 #1
[1887417.520348] task: c000003e68062bc0 ti: c000003e7fb10000 task.ti: c000003e6809c000
[1887417.520352] NIP: c00000000069eb50 LR: d00000001c443bf0 CTR: c00000000069eae0
[1887417.520357] REGS: c000003e7fb13710 TRAP: 0300   Tainted: G           OE  ------------    (3.10.0-862.14.4.el7.ppc64)
[1887417.520361] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 28000044  XER: 20000000
[1887417.520372] CFAR: c0000000000093ec DAR: 48027979e8410028 DSISR: 40000000 SOFTE: 0
                 GPR00: d00000001c443bf0 c000003e7fb13990 c00000000151b500 c000003e3ab00000
                 GPR04: c0000024ae1b6000 0000000000000021 0000000000000002 0000000000000000
                 GPR08: d000000024f5be80 c000003e3ab00000 48027979e8410028 d00000001c45c598
                 GPR12: c00000000069eae0 c000000007b51000 c000003e6809ff90 000000001eea3300
                 GPR16: 0000000003000000 000000000b000000 0000000000000000 0000000003000000
                 GPR20: 000000000b000000 0000000000000008 c000003e62523a40 0000000000000001
                 GPR24: ffffffffffffffff c000003e62e06700 0000000000003340 0000000000000000
                 GPR28: 0000000000000000 00000000000000ce c000003e62580760 c000002e47b91c00
[1887417.520427] NIP [c00000000069eb50] .scsi_dma_unmap+0x70/0xd0
[1887417.520439] LR [d00000001c443bf0] ._scsih_io_done+0x4e0/0xca0 [mpt2sas]
[1887417.520442] Call Trace:
[1887417.520446] [c000003e7fb13990] [c000003e6252c240] 0xc000003e6252c240 (unreliable)
[1887417.520455] [c000003e7fb13a00] [d00000001c443bf0] ._scsih_io_done+0x4e0/0xca0 [mpt2sas]
[1887417.520463] [c000003e7fb13b20] [d00000001c427eb8] ._base_interrupt+0x358/0xe60 [mpt2sas]
[1887417.520471] [c000003e7fb13ca0] [c0000000001df654] .__handle_irq_event_percpu+0x94/0x2e0
[1887417.520475] [c000003e7fb13d80] [c0000000001df980] .handle_irq_event+0x60/0x100
[1887417.520481] [c000003e7fb13e10] [c0000000001e4e2c] .handle_fasteoi_irq+0xcc/0x230
[1887417.520486] [c000003e7fb13e90] [c0000000001de89c] .generic_handle_irq+0x4c/0x80
[1887417.520492] [c000003e7fb13f10] [c000000000014944] .__do_irq+0x84/0x1a0
[1887417.520497] [c000003e7fb13f90] [c000000000028580] .call_do_irq+0x14/0x24
[1887417.520502] [c000003e6809f820] [c000000000014aec] .do_IRQ+0x8c/0x100
[1887417.520507] [c000003e6809f8c0] [c000000000002a94] hardware_interrupt_common+0x114/0x180
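
The register dump above can be checked mechanically. The following is a minimal sketch, not part of the original analysis, that extracts the DAR (the faulting data address) from an excerpt of the oops, lists which general-purpose registers hold that value, and tests whether the value resembles a kernel address. The function names are hypothetical. The address test rests on an assumption drawn from this log only: kernel text appears at 0xc... and the mpt2sas module at 0xd..., so any other top nibble is treated as suspect.

```python
import re

# Excerpt copied from the oops above (only the lines this sketch needs).
OOPS = """\
CFAR: c0000000000093ec DAR: 48027979e8410028 DSISR: 40000000 SOFTE: 0
GPR00: d00000001c443bf0 c000003e7fb13990 c00000000151b500 c000003e3ab00000
GPR04: c0000024ae1b6000 0000000000000021 0000000000000002 0000000000000000
GPR08: d000000024f5be80 c000003e3ab00000 48027979e8410028 d00000001c45c598
GPR12: c00000000069eae0 c000000007b51000 c000003e6809ff90 000000001eea3300
"""


def parse_dar(text):
    """Return the DAR value as an int, or None if the line is absent."""
    m = re.search(r"DAR:\s*([0-9a-f]{16})", text)
    return int(m.group(1), 16) if m else None


def parse_gprs(text):
    """Map register number -> value for every 'GPRnn:' row in the dump."""
    regs = {}
    for m in re.finditer(r"GPR(\d+):((?:[ \t]+[0-9a-f]{16})+)", text):
        base = int(m.group(1))  # each row starts at GPR00, GPR04, ...
        for i, val in enumerate(m.group(2).split()):
            regs[base + i] = int(val, 16)
    return regs


def looks_like_kernel_address(addr):
    # Assumption (from this log, not a general rule): valid kernel
    # pointers here start with 0xc (kernel text/data) or 0xd (modules).
    return (addr >> 60) in (0xC, 0xD)


def registers_holding(text, addr):
    """Return the sorted list of GPR numbers whose value equals addr."""
    return sorted(n for n, v in parse_gprs(text).items() if v == addr)


if __name__ == "__main__":
    dar = parse_dar(OOPS)
    print("DAR:", hex(dar))
    print("GPRs holding DAR:", registers_holding(OOPS, dar))
    print("Looks like a kernel address:", looks_like_kernel_address(dar))
```

For this oops the sketch reports that GPR10 holds the faulting value and that the value does not fall in either range, which is consistent with a corrupted pointer rather than a stale but once-valid one.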

The crash utility shows the following backtrace for the panicking task:

crash> bt
PID: 0      TASK: c000003e68062bc0  CPU: 144  COMMAND: "swapper/144"
 #0 [c000003e7fb13990] .scsi_dma_unmap at c00000000069eb50
 #1 [c000003e7fb13a00] ._scsih_io_done at d00000001c443bf0 [mpt2sas]
 #2 [c000003e7fb13b20] ._base_interrupt at d00000001c427eb8 [mpt2sas]
 #3 [c000003e7fb13ca0] .__handle_irq_event_percpu at c0000000001df654
 #4 [c000003e7fb13d80] .handle_irq_event at c0000000001df980
 #5 [c000003e7fb13e10] .handle_fasteoi_irq at c0000000001e4e2c
 #6 [c000003e7fb13e90] .generic_handle_irq at c0000000001de89c
 #7 [c000003e7fb13f10] .__do_irq at c000000000014944
 #8 [c000003e7fb13f90] .call_do_irq at c000000000028580
 #9 [c000003e6809f820] .do_IRQ at c000000000014aec
#10 [c000003e6809f8c0] hardware_interrupt_common at c000000000002a94
 Hardware Interrupt [501] exception frame:
 R0:  c0000000007cfb40    R1:  c000003e6809fbb0    R2:  c00000000151b500   
 R3:  0000000000000001    R4:  c0000000014122c0    R5:  0000000000000000   
 R6:  0038940dc1000000    R7:  0000000000000018    R8:  0000000000000808   
 R9:  000389f5ec7b9415    R10: c00000000191b500    R11: 000389f5ec1c7856   
 R12: 0000000044000088    R13: c000000007b51000   
 NIP: c0000000007d212c    MSR: 8000000100009032    OR3: c0000000007d2130
 CTR: c0000000007d2070    LR:  c0000000007d20b0    XER: 0000000000000000
 CCR: 0000000024000084    MQ:  0000000000000001    DAR: c000003e6809c000
 DSISR: c000000000013ee0     Syscall Result: 0000000000000000
 [NIP  : .snooze_loop+0xbc]
 [LR   : .snooze_loop+0x40]
#11 [c000003e6809fbb0] .snooze_loop at c0000000007d212c  (unreliable)
#12 [c000003e6809fc50] .cpuidle_idle_call at c0000000007cfb40
#13 [c000003e6809fd30] .pseries_lpar_idle at c00000000009df30
#14 [c000003e6809fda0] .arch_cpu_idle at c00000000001bb34
#15 [c000003e6809fe20] .cpu_startup_entry at c0000000001701e0
#16 [c000003e6809fed0] .start_secondary at c00000000004fbf0
#17 [c000003e6809ff90] start_secondary_prolog at c000000000009b6c

Environment

  • Red Hat Enterprise Linux 7
  • Kernel 3.10.0-862.14.4.el7.ppc64
  • IBM PowerPC architecture
  • SCSI host adapter driven by the mpt2sas driver
  • Possible involvement of the Mellanox InfiniBand mlx5_ib driver
