Updating to RHEL 6.8, which changed the InfiniBand SRP code in ib_srp, causes a system panic during SRP host connection and device discovery

Solution Verified

Issue

  • After updating to one of the RHEL 6.8 kernels, 2.6.32-642.el6 through 2.6.32-642.3.1.el6, the system panics as soon as srp_daemon is started. After booting back into the RHEL 6.7 kernel, everything works as expected. The panic produces the following output:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<(null)>] (null)
PGD 0 
Oops: 0010 [#1] SMP 
last sysfs file: /sys/devices/pci0000:80/0000:80:02.0/0000:83:00.0/infiniband_srp/srp-mlx4_0-1/add_target
CPU 19 
Modules linked in: ib_srp scsi_transport_srp rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_sa ib_mad ib_core ib_addr nfs fscache auth_rpcgss nfs_acl 8021q garp stp llc autofs4 lockd sunrpc dm_multipath microcode iTCO_wdt iTCO_vendor_support dcdbas ipmi_devintf joydev sb_edac edac_core lpc_ich mfd_core shpchp i40e power_meter acpi_ipmi ipmi_si ipmi_msghandler sg tcp_htcp ext3 jbd mbcache sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt ipv6 mlx4_en ptp pps_core mlx4_core ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_transport_srp]

Pid: 0, comm: swapper Not tainted 2.6.32-642.3.1.el6.x86_64 #1 Dell Inc. PowerEdge R730xd/072T6D
RIP: 0010:[<0000000000000000>]  [<(null)>] (null)
RSP: 0018:ffff8820f0d23c80  EFLAGS: 00010092
RAX: 0000000000000092 RBX: ffff8840322dd5e0 RCX: 0000000000000003
RDX: ffff8840322dd600 RSI: 0000000000000092 RDI: ffff88403a4962c0
RBP: ffff8820f0d23d08 R08: ffff8840322dd5f0 R09: ffff88205334fe48
R10: 00003ef3ae1ab48a R11: 0000000000000001 R12: ffff88404efc0f00
R13: ffff88402d070000 R14: ffff8820f0d23cb8 R15: ffff88403a4962c0
FS:  0000000000000000(0000) GS:ffff8820f0d20000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a8d000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88205334c000, task ffff882053347520)
Stack:
 ffffffffa0690c3c ffff8820f0d23cc8 ffff884031f9a000 ffff8820f0d303a0
 ffff88404ffc24d8 ffffe8e000000000 ffff88404fd4d8c8 0000000000000092
 ffff88404fd4d928 00000000f0d23d08 0000000100000001 0000000000000000
Call Trace:
 <IRQ> 
 [<ffffffffa0690c3c>] ? srp_handle_recv+0x22c/0x4e0 [ib_srp]
 [<ffffffffa0690f32>] srp_recv_completion+0x42/0x80 [ib_srp]
 [<ffffffffa06023e7>] mlx4_ib_cq_comp+0x17/0x20 [mlx4_ib]
 [<ffffffffa00812b2>] mlx4_cq_completion+0x42/0x90 [mlx4_core]
 [<ffffffffa0082898>] mlx4_eq_int+0x578/0xd60 [mlx4_core]
 [<ffffffff8103813d>] ? lapic_next_event+0x1d/0x30
 [<ffffffff81014b19>] ? read_tsc+0x9/0x10
 [<ffffffffa008136d>] ? mlx4_cq_tasklet_cb+0x6d/0x130 [mlx4_core]
 [<ffffffff81085755>] ? tasklet_action+0xe5/0x120
 [<ffffffffa0083094>] mlx4_msi_x_interrupt+0x14/0x20 [mlx4_core]
 [<ffffffff810f36a0>] handle_IRQ_event+0x60/0x170
 [<ffffffff81014b19>] ? read_tsc+0x9/0x10
 [<ffffffff810f600e>] handle_edge_irq+0xde/0x180
 [<ffffffff8100fd29>] handle_irq+0x49/0xa0
 [<ffffffff815515cc>] do_IRQ+0x6c/0xf0
 [<ffffffff8100ba53>] ret_from_intr+0x0/0x11
 <EOI> 
 [<ffffffff812faa7e>] ? intel_idle+0xfe/0x1b0
 [<ffffffff812faa61>] ? intel_idle+0xe1/0x1b0
 [<ffffffff814406ca>] cpuidle_idle_call+0x7a/0xe0
 [<ffffffff81009fe6>] cpu_idle+0xb6/0x110
 [<ffffffff815408f9>] start_secondary+0x2c0/0x316
Code:  Bad RIP value.
RIP  [<(null)>] (null)
 RSP <ffff8820f0d23c80>
CR2: 0000000000000000
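
As a reading aid for the trace above, the hexadecimal value on the "Oops:" line is the x86 page-fault error code, and its bits can be decoded mechanically. The sketch below is only an illustration: the function name decode_oops_code and the wording of the flag descriptions are assumptions made for this example, not part of any kernel tool. For this trace, an instruction fetch from a not-present page in kernel mode, combined with RIP and CR2 both being 0, is consistent with a call through a NULL function pointer from the ib_srp receive path, although the trace alone does not identify which pointer.

```python
# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
# The descriptions are informal wording chosen for this example.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(line):
    """Decode the hex error code from a line such as 'Oops: 0010 [#1] SMP'."""
    code = int(line.split()[1], 16)
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


if __name__ == "__main__":
    for flag in decode_oops_code("Oops: 0010 [#1] SMP"):
        print(flag)
```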

Environment

  • Red Hat Enterprise Linux 6.8
  • Kernel versions 2.6.32-642.el6 through 2.6.32-642.3.1.el6
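
To check whether a host is running a kernel in the range listed above, the release string reported by uname -r can be compared against the two endpoints. This is a minimal sketch under stated assumptions: the helper names el6_release_tuple and is_affected are hypothetical, the script covers only the range named in this article, and it makes no claim about kernels outside that range.

```python
import platform
import re

# Endpoints of the range named in this article (the part after "2.6.32-").
AFFECTED_MIN = (642,)        # 2.6.32-642.el6
AFFECTED_MAX = (642, 3, 1)   # 2.6.32-642.3.1.el6


def el6_release_tuple(kernel_release):
    """Return the numeric release of a 2.6.32 el6 kernel as a tuple, else None."""
    match = re.match(r"^2\.6\.32-(\d+(?:\.\d+)*)\.el6", kernel_release)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_affected(kernel_release):
    """True if the release string falls inside the range listed in this article."""
    release = el6_release_tuple(kernel_release)
    return release is not None and AFFECTED_MIN <= release <= AFFECTED_MAX


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    status = "in the affected range" if is_affected(running) else "not in the affected range"
    print(running, status)
```

The comparison relies on Python's tuple ordering, so 2.6.32-642.el6 sorts before 2.6.32-642.1.1.el6, and a RHEL 6.7 kernel such as 2.6.32-573.el6 falls below the lower endpoint.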
