Updating to RHEL 6.8 with changes to the InfiniBand SRP code in ib_srp causes a system panic during SRP host connection and device discovery
Issue
- After updating to one of the RHEL 6.8 kernels 2.6.32-642.el6 to 2.6.32-642.3.1.el6, the system panics as soon as the srp_daemon is started. After booting back into the RHEL 6.7 kernel, everything works as expected.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<(null)>] (null)
PGD 0
Oops: 0010 [#1] SMP
last sysfs file: /sys/devices/pci0000:80/0000:80:02.0/0000:83:00.0/infiniband_srp/srp-mlx4_0-1/add_target
CPU 19
Modules linked in: ib_srp scsi_transport_srp rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_sa ib_mad ib_core ib_addr nfs fscache auth_rpcgss nfs_acl 8021q garp stp llc autofs4 lockd sunrpc dm_multipath microcode iTCO_wdt iTCO_vendor_support dcdbas ipmi_devintf joydev sb_edac edac_core lpc_ich mfd_core shpchp i40e power_meter acpi_ipmi ipmi_si ipmi_msghandler sg tcp_htcp ext3 jbd mbcache sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt ipv6 mlx4_en ptp pps_core mlx4_core ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_transport_srp]
Pid: 0, comm: swapper Not tainted 2.6.32-642.3.1.el6.x86_64 #1 Dell Inc. PowerEdge R730xd/072T6D
RIP: 0010:[<0000000000000000>] [<(null)>] (null)
RSP: 0018:ffff8820f0d23c80 EFLAGS: 00010092
RAX: 0000000000000092 RBX: ffff8840322dd5e0 RCX: 0000000000000003
RDX: ffff8840322dd600 RSI: 0000000000000092 RDI: ffff88403a4962c0
RBP: ffff8820f0d23d08 R08: ffff8840322dd5f0 R09: ffff88205334fe48
R10: 00003ef3ae1ab48a R11: 0000000000000001 R12: ffff88404efc0f00
R13: ffff88402d070000 R14: ffff8820f0d23cb8 R15: ffff88403a4962c0
FS: 0000000000000000(0000) GS:ffff8820f0d20000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a8d000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88205334c000, task ffff882053347520)
Stack:
ffffffffa0690c3c ffff8820f0d23cc8 ffff884031f9a000 ffff8820f0d303a0
ffff88404ffc24d8 ffffe8e000000000 ffff88404fd4d8c8 0000000000000092
ffff88404fd4d928 00000000f0d23d08 0000000100000001 0000000000000000
Call Trace:
<IRQ>
[<ffffffffa0690c3c>] ? srp_handle_recv+0x22c/0x4e0 [ib_srp]
[<ffffffffa0690f32>] srp_recv_completion+0x42/0x80 [ib_srp]
[<ffffffffa06023e7>] mlx4_ib_cq_comp+0x17/0x20 [mlx4_ib]
[<ffffffffa00812b2>] mlx4_cq_completion+0x42/0x90 [mlx4_core]
[<ffffffffa0082898>] mlx4_eq_int+0x578/0xd60 [mlx4_core]
[<ffffffff8103813d>] ? lapic_next_event+0x1d/0x30
[<ffffffff81014b19>] ? read_tsc+0x9/0x10
[<ffffffffa008136d>] ? mlx4_cq_tasklet_cb+0x6d/0x130 [mlx4_core]
[<ffffffff81085755>] ? tasklet_action+0xe5/0x120
[<ffffffffa0083094>] mlx4_msi_x_interrupt+0x14/0x20 [mlx4_core]
[<ffffffff810f36a0>] handle_IRQ_event+0x60/0x170
[<ffffffff81014b19>] ? read_tsc+0x9/0x10
[<ffffffff810f600e>] handle_edge_irq+0xde/0x180
[<ffffffff8100fd29>] handle_irq+0x49/0xa0
[<ffffffff815515cc>] do_IRQ+0x6c/0xf0
[<ffffffff8100ba53>] ret_from_intr+0x0/0x11
<EOI>
[<ffffffff812faa7e>] ? intel_idle+0xfe/0x1b0
[<ffffffff812faa61>] ? intel_idle+0xe1/0x1b0
[<ffffffff814406ca>] cpuidle_idle_call+0x7a/0xe0
[<ffffffff81009fe6>] cpu_idle+0xb6/0x110
[<ffffffff815408f9>] start_secondary+0x2c0/0x316
Code: Bad RIP value.
RIP [<(null)>] (null)
RSP <ffff8820f0d23c80>
CR2: 0000000000000000
Environment
- Red Hat Enterprise Linux 6.8
- Kernel versions 2.6.32-642.el6 to 2.6.32-642.3.1.el6
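To check whether a host is running a kernel in the range listed above, the comparison can be sketched in a few lines of Python. This is a minimal illustration, not a Red Hat tool: the function names `parse_el6_release` and `is_affected` are hypothetical, and the range boundaries are taken only from the kernel versions named in this article. A match means the kernel falls in the listed range, not that the panic will necessarily occur.

```python
import re

# Range boundaries as listed in the Environment section of this article.
# The release part "642" is compared as (642,), "642.3.1" as (642, 3, 1).
FIRST_LISTED = (642,)
LAST_LISTED = (642, 3, 1)


def parse_el6_release(kernel_release):
    """Return the numeric release of a RHEL 6 kernel string as a tuple.

    Example input: "2.6.32-642.3.1.el6.x86_64" -> (642, 3, 1).
    Returns None when the string is not a 2.6.32 el6 kernel.
    """
    match = re.match(r"^2\.6\.32-(\d+(?:\.\d+)*)\.el6(?:\.|$)", kernel_release)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_affected(kernel_release):
    """True if the kernel release lies within the range listed in this article."""
    release = parse_el6_release(kernel_release)
    if release is None:
        return False
    # Tuple comparison orders (642,) before (642, 1, 1) before (642, 3, 1).
    return FIRST_LISTED <= release <= LAST_LISTED
```

On a running system the function could be called as `is_affected(platform.release())` after `import platform`, since `platform.release()` returns the same string as `uname -r`.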