RHEL 7.4 crash in write_msi_msg() during bootup following fresh install
Issue
The system crashes during boot with the following kernel messages:
[ 28.493144] Unable to handle kernel paging request for data at address 0x00000030
[ 28.493959] Faulting instruction address: 0xc00000000057f830
[ 28.494176] Oops: Kernel access of bad area, sig: 11 [#1]
[ 28.494416] SMP NR_CPUS=2048 NUMA PowerNV
[ 28.494588] Modules linked in: bnx2x(+) tg3 ptp pps_core nvme mdio nvme_core sunrpc xts lrw gf128mul dm_crypt dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq libcrc32c async_xor xor async_tx raid1 raid0 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs
[ 28.496343] CPU: 12 PID: 2513 Comm: kworker/u769:3 Not tainted 3.10.0-693.el7.ppc64le #1
[ 28.496610] Workqueue: nvme nvme_reset_work [nvme]
[ 28.496805] task: c00001fd27b85580 ti: c00001fd27c10000 task.ti: c00001fd27c10000
[ 28.497071] NIP: c00000000057f830 LR: c00000000057f820 CTR: 0000000000000001
[ 28.497332] REGS: c00001fd27c13620 TRAP: 0300 Not tainted (3.10.0-693.el7.ppc64le)
[ 28.497627] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24088022 XER: 20000000
[ 28.497996] CFAR: c000000000009368 DAR: 0000000000000030 DSISR: 40000000 SOFTE: 1
GPR00: c00000000057f820 c00001fd27c138a0 c00000000120ee00 0000000000000000
GPR04: 00000000000001f0 0000000000000000 0000000000000001 c000000001269ce0
GPR08: 0000000000000002 00000000fffffffa 00000000000001a0 9000000100001003
GPR12: c00000000008256c c00000000fb86c00 c00000000011ee68 c00001fd27c0fb40
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: c000000001591000 d000080083b23000 0000000000000008 c00001fd32ecd040
GPR24: 0000000000000000 c000000005a50050 c00001fffe296880 c00001fffe296000
GPR28: 0000000000000008 00000000000001f0 c00001fd27c13960 0000000000000000
[ 28.501289] NIP [c00000000057f830] write_msi_msg+0x40/0x220
[ 28.501482] LR [c00000000057f820] write_msi_msg+0x30/0x220
[ 28.501627] Call Trace:
[ 28.501755] [c00001fd27c138a0] [c00000000057f820] write_msi_msg+0x30/0x220 (unreliable)
[ 28.501990] [c00001fd27c13900] [c00000000008c0f8] pnv_setup_msi_irqs+0x158/0x290
[ 28.502328] [c00001fd27c139b0] [c000000000058880] arch_setup_msi_irqs+0x50/0xd0
[ 28.502624] [c00001fd27c13a20] [c0000000005818dc] pci_enable_msix_range+0x34c/0x660
[ 28.502880] [c00001fd27c13af0] [d000000313143dc8] nvme_reset_work+0xb18/0x1374 [nvme]
[ 28.503150] [c00001fd27c13c40] [c00000000011294c] process_one_work+0x1dc/0x680
[ 28.503381] [c00001fd27c13ce0] [c000000000112f90] worker_thread+0x1a0/0x520
[ 28.503618] [c00001fd27c13d80] [c00000000011ef4c] kthread+0xec/0x100
[ 28.503880] [c00001fd27c13e30] [c00000000000a4b8] ret_from_kernel_thread+0x5c/0xa4
[ 28.504112] Instruction dump:
[ 28.504197] fb61ffd8 7c9e2378 fb81ffe0 fba1ffe8 fbe1fff8 f8010010 f821ffa1 4bc608fd
[ 28.504474] 60000000 2c230000 418201d8 ebe30038 <ebbf0030> 813d0078 2f890000 409e0054
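The oops can be read as a NULL pointer dereference by decoding the instruction word marked with angle brackets in the instruction dump. The following is a minimal sketch, not a general disassembler: it handles only the PowerPC DS-form `ld` encoding, and the helper name `decode_ds_form_load` is hypothetical. The constants are copied from the log above (the faulting word, GPR31 from the GPR28 row, and DAR).

```python
def decode_ds_form_load(word):
    """Decode a 32-bit PowerPC DS-form word; return None unless it is `ld`."""
    opcode = (word >> 26) & 0x3F   # primary opcode, top 6 bits
    rt = (word >> 21) & 0x1F       # target register
    ra = (word >> 16) & 0x1F       # base register
    disp = word & 0xFFFC           # displacement with the low 2 bits cleared
    xo = word & 0x3                # extended opcode: 0 selects `ld`
    if opcode != 58 or xo != 0:
        return None
    if disp & 0x8000:              # sign-extend the 16-bit displacement
        disp -= 0x10000
    return {"mnemonic": "ld", "rt": rt, "ra": ra, "disp": disp}


# Values copied from the oops above.
FAULTING_WORD = 0xEBBF0030  # the word shown as <ebbf0030> in the instruction dump
GPR31 = 0x0                 # last value on the GPR28 row
DAR = 0x30                  # faulting data address

insn = decode_ds_form_load(FAULTING_WORD)
effective_address = (GPR31 + insn["disp"]) & 0xFFFFFFFFFFFFFFFF

print("%s r%d,%d(r%d)" % (insn["mnemonic"], insn["rt"], insn["disp"], insn["ra"]))
print(hex(effective_address), effective_address == DAR)
```

The word decodes to `ld r29,48(r31)`. With GPR31 equal to zero, the load targets address 0x30, which matches both the DAR value and the address in the "Unable to handle kernel paging request" message. In other words, write_msi_msg() read a field at offset 0x30 through a NULL pointer.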
Environment
- Red Hat Enterprise Linux 7
- PowerPC64-LE architecture
- Kernel version 3.10.0-693.el7.ppc64le (where the crash was detected; it has also been seen on other RHEL 7 kernel versions)
- NVMe disk drives (the nvme driver exposes the bug in the MSI code)
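As a rough aid, the environment list above can be turned into a quick check. This is a sketch under stated assumptions: the function name `matches_listed_environment` is hypothetical, an `.el7` tag in the kernel release string is taken to indicate RHEL 7, and the presence of `/dev/nvme*` device nodes is taken to indicate NVMe drives. A match only means the system resembles the listed environment. It does not prove the system is affected.

```python
import glob
import platform
import re


def matches_listed_environment(release, machine, has_nvme):
    """Return True if the inputs resemble the environment listed above."""
    # Assumption: RHEL 7 kernel release strings carry an ".el7" tag,
    # as in "3.10.0-693.el7.ppc64le".
    is_el7 = re.search(r"\.el7(?![0-9a-z])", release) is not None
    is_ppc64le = machine == "ppc64le"
    return is_el7 and is_ppc64le and has_nvme


if __name__ == "__main__":
    # Assumption: NVMe controllers and namespaces show up as /dev/nvme* nodes.
    nvme_present = bool(glob.glob("/dev/nvme*"))
    print(matches_listed_environment(platform.release(),
                                     platform.machine(),
                                     nvme_present))
```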