Kernel hangs when the mlx4 InfiniBand interfaces are reset
Issue
- The server hangs when the InfiniBand switch is rebooted. The kernel logs the following warning, followed by a paging-request oops:
[20152.315827] WARNING: CPU: 2 PID: 17762 at drivers/infiniband/core/verbs.c:303 ib_dealloc_pd+0x6c/0xb0 [ib_core]
..
[20152.315844] CPU: 2 PID: 17762 Comm: kworker/u32:1 Not tainted 3.10.0-693.33.1.rt56.621.el6rt.x86_64 #1
[20152.315844] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/25/2017
[20152.315852] Workqueue: mlx4 mlx4_sense_port [mlx4_core]
[20152.315853] Call Trace:
[20152.315859] [<ffffffff81653af7>] dump_stack+0x19/0x22
[20152.315862] [<ffffffff8106f679>] __warn+0x109/0x130
[20152.315863] [<ffffffff8106f6bd>] warn_slowpath_null+0x1d/0x20
[20152.315867] [<ffffffffa0307a3c>] ib_dealloc_pd+0x6c/0xb0 [ib_core]
[20152.315869] [<ffffffffa059999e>] ib_uverbs_cleanup_ucontext+0x55e/0x5c0 [ib_uverbs]
[20152.315871] [<ffffffffa0599bf5>] ib_uverbs_free_hw_resources+0x105/0x250 [ib_uverbs]
[20152.315873] [<ffffffffa0599dc4>] ib_uverbs_remove_one+0x84/0xe0 [ib_uverbs]
[20152.315878] [<ffffffffa030d7b7>] ib_unregister_device+0xc7/0x170 [ib_core]
[20152.315882] [<ffffffffa0342ad6>] mlx4_ib_remove+0x76/0x230 [mlx4_ib]
[20152.315888] [<ffffffffa0224008>] mlx4_remove_device+0x78/0x90 [mlx4_core]
[20152.315893] [<ffffffffa02240ab>] mlx4_unregister_device+0x8b/0x100 [mlx4_core]
[20152.315898] [<ffffffffa02272a0>] mlx4_change_port_types+0x70/0x160 [mlx4_core]
[20152.315903] [<ffffffffa02381d3>] mlx4_sense_port+0xa3/0xd0 [mlx4_core]
[20152.315906] [<ffffffff81093d57>] process_one_work+0x197/0x520
[20152.315908] [<ffffffff81095367>] worker_thread+0x177/0x400
[20152.315909] [<ffffffff810951f0>] ? manage_workers+0x130/0x130
[20152.315910] [<ffffffff810951f0>] ? manage_workers+0x130/0x130
[20152.315912] [<ffffffff8109b8d0>] kthread+0xd0/0xe0
[20152.315913] [<ffffffff8109b800>] ? kthreadd+0x1d0/0x1d0
[20152.315916] [<ffffffff8166110d>] ret_from_fork+0x5d/0xb0
[20152.315917] [<ffffffff8109b800>] ? kthreadd+0x1d0/0x1d0
<..>
[20154.532304] BUG: unable to handle kernel paging request at 00000000ffffffff
[20154.532308] IP: [<ffffffff811b9c06>] __kmalloc_node_track_caller+0xf6/0x2e0
[20154.532308] PGD 0
[20154.532309] Oops: 0000 [#1] PREEMPT SMP
[20154.532325] Modules linked in: bonding autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm ext2 vfat fat dm_round_robin dm_multipath iTCO_wdt iTCO_vendor_support pcspkr joydev i2c_i801 tg3 lpc_ich hpilo hpwdt ioatdma ixgbe dca mdio ipmi_si ipmi_msghandler sg acpi_power_meter hwmon ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif crct10dif_common ahci libahci mlx4_ib ib_core ipv6 mlx4_en ptp pps_core mlx4_core devlink qla2xxx scsi_transport_fc hpsa scsi_transport_sas wmi mgag200 ttm drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[20154.532327] CPU: 9 PID: 16349 Comm: udevd Tainted: G W ------------ 3.10.0-693.33.1.rt56.621.el6rt.x86_64 #1
[20154.532328] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/25/2017
[20154.532328] task: ffff881f686619e0 ti: ffff881f51ab8000 task.ti: ffff881f51ab8000
[20154.532330] RIP: 0010:[<ffffffff811b9c06>] [<ffffffff811b9c06>] __kmalloc_node_track_caller+0xf6/0x2e0
[20154.532331] RSP: 0018:ffff881f51abb8d8 EFLAGS: 00010246
[20154.532331] RAX: 0000000000000000 RBX: ffff881fffd5c360 RCX: 00000000027c5509
[20154.532332] RDX: 00000000027c5409 RSI: 0000000000000000 RDI: 00000000ffffffff
[20154.532332] RBP: ffff881f51abb948 R08: 000000000001c360 R09: ffff881fff803500
[20154.532332] R10: ffff881fff803500 R11: ffff881f51ab8010 R12: 0000000000000240
[20154.532333] R13: 00000000ffffffff R14: 00000000000106d0 R15: ffff881f51ab8000
[20154.532333] FS: 00007ff0febd67a0(0000) GS:ffff881fffd40000(0000) knlGS:0000000000000000
[20154.532334] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20154.532334] CR2: 00000000ffffffff CR3: 0000001f581ac000 CR4: 00000000001607e0
[20154.532335] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20154.532335] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[20154.532336] Call Trace:
[20154.532340] [<ffffffff815532c4>] ? __alloc_skb+0x94/0x1e0
[20154.532341] [<ffffffff81552c6b>] __kmalloc_reserve+0x3b/0xa0
[20154.532343] [<ffffffff815532c4>] __alloc_skb+0x94/0x1e0
[20154.532344] [<ffffffff81553834>] alloc_skb_with_frags+0x74/0x1d0
[20154.532346] [<ffffffff8154d5ea>] sock_alloc_send_pskb+0x1da/0x280
[20154.532348] [<ffffffff816310c9>] unix_dgram_sendmsg+0x1a9/0x940
[20154.532349] [<ffffffff8154a3ec>] sock_sendmsg+0xac/0xe0
[20154.532352] [<ffffffff81547749>] ? copy_from_user+0x9/0x10
[20154.532353] [<ffffffff81547a1e>] ? move_addr_to_kernel+0x4e/0x90
[20154.532355] [<ffffffff81559289>] ? verify_iovec+0x69/0xd0
[20154.532355] [<ffffffff8154b1b7>] ___sys_sendmsg+0x3f7/0x420
[20154.532359] [<ffffffff810adb93>] ? migrate_disable+0xc3/0x110
[20154.532360] [<ffffffff811bc154>] ? kmem_cache_alloc+0x114/0x250
[20154.532363] [<ffffffff8127d5ac>] ? security_file_alloc+0x1c/0x20
[20154.532364] [<ffffffff810ad993>] ? migrate_enable+0xf3/0x230
[20154.532365] [<ffffffff8154b3e9>] __sys_sendmsg+0x49/0x90
[20154.532366] [<ffffffff8154b449>] SyS_sendmsg+0x19/0x20
[20154.532369] [<ffffffff816612d8>] system_call_fastpath+0x1c/0x21
[20154.532378] Code: 89 4d b8 4c 89 55 b0 e8 c9 fe ff ff 4c 8b 4d b8 49 89 c7 4c 8b 55 b0 eb 34 0f 1f 40 00 49 63 42 20 4d 8b 02 48 8d 8a 00 01 00 00 <48> 8b 1c 07 48 89 f8 65 49 0f c7 08 0f 94 c0 3c 01 0f 85 5a ff
[20154.532379] RIP [<ffffffff811b9c06>] __kmalloc_node_track_caller+0xf6/0x2e0
[20154.532380] RSP <ffff881f51abb8d8>
[20154.532380] CR2: 00000000ffffffff
[20154.540148] ---[ end trace 0000000000000004 ]---
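When comparing a trace like the one above against another system's log, it can help to reduce each call trace to its function and module names. The following is a minimal sketch only, not part of the original report: the helper names `parse_frames` and `modules_in_trace` are hypothetical, and the regular expression assumes the frame layout shown in this log (`[timestamp] [<address>] function+0xoffset/0xsize [module]`).

```python
import re

# Assumed frame layout, taken from the log above, e.g.:
#   [20152.315867] [<ffffffffa0307a3c>] ib_dealloc_pd+0x6c/0xb0 [ib_core]
# A leading '?' before the function marks a frame the unwinder is unsure of.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+"                      # dmesg timestamp
    r"\[<[0-9a-f]+>\]\s+"                      # return address
    r"(?P<unsure>\?\s+)?"                      # optional '?' marker
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"             # optional module name
)


def parse_frames(log_text):
    """Return (function, module, reliable) for each call-trace frame.

    module is None for functions built into the kernel image.
    Lines that are not frames (RIP, registers, headers) are skipped.
    """
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(
                (m.group("func"), m.group("module"), m.group("unsure") is None)
            )
    return frames


def modules_in_trace(log_text):
    """Return the modules seen in the trace, in order of first appearance."""
    seen = []
    for _func, module, _reliable in parse_frames(log_text):
        if module is not None and module not in seen:
            seen.append(module)
    return seen


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): dmesg | python this_script.py
    for func, module, reliable in parse_frames(sys.stdin.read()):
        flag = "" if reliable else "? "
        print(f"{flag}{func} [{module or 'kernel'}]")
```

Applied to the first trace in this article, the module list makes the path through `mlx4_core`, `mlx4_ib`, `ib_uverbs`, and `ib_core` easier to see at a glance.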
Environment
- Red Hat Enterprise Linux 6
- MRG
- kernel-3.10.0-693.33.1.rt56.621.el6rt.x86_64
- InfiniBand (IB)