Kernel hangs when the mlx4 InfiniBand interfaces are reset

Solution In Progress - Updated -

Issue

  • The server hangs when the InfiniBand switch is rebooted. The kernel first logs the following warning in ib_dealloc_pd, then an oops in __kmalloc_node_track_caller:
[20152.315827] WARNING: CPU: 2 PID: 17762 at drivers/infiniband/core/verbs.c:303 ib_dealloc_pd+0x6c/0xb0 [ib_core]
..
[20152.315844] CPU: 2 PID: 17762 Comm: kworker/u32:1 Not tainted 3.10.0-693.33.1.rt56.621.el6rt.x86_64 #1
[20152.315844] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/25/2017
[20152.315852] Workqueue: mlx4 mlx4_sense_port [mlx4_core]
[20152.315853] Call Trace:
[20152.315859]  [<ffffffff81653af7>] dump_stack+0x19/0x22
[20152.315862]  [<ffffffff8106f679>] __warn+0x109/0x130
[20152.315863]  [<ffffffff8106f6bd>] warn_slowpath_null+0x1d/0x20
[20152.315867]  [<ffffffffa0307a3c>] ib_dealloc_pd+0x6c/0xb0 [ib_core]
[20152.315869]  [<ffffffffa059999e>] ib_uverbs_cleanup_ucontext+0x55e/0x5c0 [ib_uverbs]
[20152.315871]  [<ffffffffa0599bf5>] ib_uverbs_free_hw_resources+0x105/0x250 [ib_uverbs]
[20152.315873]  [<ffffffffa0599dc4>] ib_uverbs_remove_one+0x84/0xe0 [ib_uverbs]
[20152.315878]  [<ffffffffa030d7b7>] ib_unregister_device+0xc7/0x170 [ib_core]
[20152.315882]  [<ffffffffa0342ad6>] mlx4_ib_remove+0x76/0x230 [mlx4_ib]
[20152.315888]  [<ffffffffa0224008>] mlx4_remove_device+0x78/0x90 [mlx4_core]
[20152.315893]  [<ffffffffa02240ab>] mlx4_unregister_device+0x8b/0x100 [mlx4_core]
[20152.315898]  [<ffffffffa02272a0>] mlx4_change_port_types+0x70/0x160 [mlx4_core]
[20152.315903]  [<ffffffffa02381d3>] mlx4_sense_port+0xa3/0xd0 [mlx4_core]
[20152.315906]  [<ffffffff81093d57>] process_one_work+0x197/0x520
[20152.315908]  [<ffffffff81095367>] worker_thread+0x177/0x400
[20152.315909]  [<ffffffff810951f0>] ? manage_workers+0x130/0x130
[20152.315910]  [<ffffffff810951f0>] ? manage_workers+0x130/0x130
[20152.315912]  [<ffffffff8109b8d0>] kthread+0xd0/0xe0
[20152.315913]  [<ffffffff8109b800>] ? kthreadd+0x1d0/0x1d0
[20152.315916]  [<ffffffff8166110d>] ret_from_fork+0x5d/0xb0
[20152.315917]  [<ffffffff8109b800>] ? kthreadd+0x1d0/0x1d0
<..>
[20154.532304] BUG: unable to handle kernel paging request at 00000000ffffffff
[20154.532308] IP: [<ffffffff811b9c06>] __kmalloc_node_track_caller+0xf6/0x2e0
[20154.532308] PGD 0 
[20154.532309] Oops: 0000 [#1] PREEMPT SMP 
[20154.532325] Modules linked in: bonding autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm ext2 vfat fat dm_round_robin dm_multipath iTCO_wdt iTCO_vendor_support pcspkr joydev i2c_i801 tg3 lpc_ich hpilo hpwdt ioatdma ixgbe dca mdio ipmi_si ipmi_msghandler sg acpi_power_meter hwmon ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif crct10dif_common ahci libahci mlx4_ib ib_core ipv6 mlx4_en ptp pps_core mlx4_core devlink qla2xxx scsi_transport_fc hpsa scsi_transport_sas wmi mgag200 ttm drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[20154.532327] CPU: 9 PID: 16349 Comm: udevd Tainted: G        W      ------------   3.10.0-693.33.1.rt56.621.el6rt.x86_64 #1
[20154.532328] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/25/2017
[20154.532328] task: ffff881f686619e0 ti: ffff881f51ab8000 task.ti: ffff881f51ab8000
[20154.532330] RIP: 0010:[<ffffffff811b9c06>]  [<ffffffff811b9c06>] __kmalloc_node_track_caller+0xf6/0x2e0
[20154.532331] RSP: 0018:ffff881f51abb8d8  EFLAGS: 00010246
[20154.532331] RAX: 0000000000000000 RBX: ffff881fffd5c360 RCX: 00000000027c5509
[20154.532332] RDX: 00000000027c5409 RSI: 0000000000000000 RDI: 00000000ffffffff
[20154.532332] RBP: ffff881f51abb948 R08: 000000000001c360 R09: ffff881fff803500
[20154.532332] R10: ffff881fff803500 R11: ffff881f51ab8010 R12: 0000000000000240
[20154.532333] R13: 00000000ffffffff R14: 00000000000106d0 R15: ffff881f51ab8000
[20154.532333] FS:  00007ff0febd67a0(0000) GS:ffff881fffd40000(0000) knlGS:0000000000000000
[20154.532334] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20154.532334] CR2: 00000000ffffffff CR3: 0000001f581ac000 CR4: 00000000001607e0
[20154.532335] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20154.532335] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[20154.532336] Call Trace:
[20154.532340]  [<ffffffff815532c4>] ? __alloc_skb+0x94/0x1e0
[20154.532341]  [<ffffffff81552c6b>] __kmalloc_reserve+0x3b/0xa0
[20154.532343]  [<ffffffff815532c4>] __alloc_skb+0x94/0x1e0
[20154.532344]  [<ffffffff81553834>] alloc_skb_with_frags+0x74/0x1d0
[20154.532346]  [<ffffffff8154d5ea>] sock_alloc_send_pskb+0x1da/0x280
[20154.532348]  [<ffffffff816310c9>] unix_dgram_sendmsg+0x1a9/0x940
[20154.532349]  [<ffffffff8154a3ec>] sock_sendmsg+0xac/0xe0
[20154.532352]  [<ffffffff81547749>] ? copy_from_user+0x9/0x10
[20154.532353]  [<ffffffff81547a1e>] ? move_addr_to_kernel+0x4e/0x90
[20154.532355]  [<ffffffff81559289>] ? verify_iovec+0x69/0xd0
[20154.532355]  [<ffffffff8154b1b7>] ___sys_sendmsg+0x3f7/0x420
[20154.532359]  [<ffffffff810adb93>] ? migrate_disable+0xc3/0x110
[20154.532360]  [<ffffffff811bc154>] ? kmem_cache_alloc+0x114/0x250
[20154.532363]  [<ffffffff8127d5ac>] ? security_file_alloc+0x1c/0x20
[20154.532364]  [<ffffffff810ad993>] ? migrate_enable+0xf3/0x230
[20154.532365]  [<ffffffff8154b3e9>] __sys_sendmsg+0x49/0x90
[20154.532366]  [<ffffffff8154b449>] SyS_sendmsg+0x19/0x20
[20154.532369]  [<ffffffff816612d8>] system_call_fastpath+0x1c/0x21
[20154.532378] Code: 89 4d b8 4c 89 55 b0 e8 c9 fe ff ff 4c 8b 4d b8 49 89 c7 4c 8b 55 b0 eb 34 0f 1f 40 00 49 63 42 20 4d 8b 02 48 8d 8a 00 01 00 00 <48> 8b 1c 07 48 89 f8 65 49 0f c7 08 0f 94 c0 3c 01 0f 85 5a ff 
[20154.532379] RIP  [<ffffffff811b9c06>] __kmalloc_node_track_caller+0xf6/0x2e0
[20154.532380]  RSP <ffff881f51abb8d8>
[20154.532380] CR2: 00000000ffffffff
[20154.540148] ---[ end trace 0000000000000004 ]---

Environment

  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise MRG Realtime
  • kernel-3.10.0-693.33.1.rt56.621.el6rt.x86_64
  • InfiniBand (IB)
