System hangs with many blocked tasks and high load when using the onload module
Issue
- System hangs with many blocked tasks that remain hung for a very long period
- Onload module work items cause worker threads in the pool to hang
LOAD AVERAGE: 2816.06, 2815.92, 2815.21
crash> ps -S
RU: 66
IN: 1751
UN: 2815 <------
WA: 1
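On Linux the load average counts runnable tasks plus tasks in uninterruptible sleep, so the UN count above accounts for almost all of the reported load. The following is a minimal Python sketch of that comparison; the helper name `parse_ps_summary` is hypothetical, and the input is simply the `ps -S` text and the 1-minute load average shown above.

```python
import re


def parse_ps_summary(text):
    """Parse crash `ps -S` lines such as 'UN: 2815' into a {state: count} dict."""
    counts = {}
    for line in text.splitlines():
        # Two-letter state code, a colon, then the task count; trailing
        # annotations such as '<------' are ignored.
        match = re.match(r"\s*([A-Z]{2}):\s*(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


summary = parse_ps_summary("""
RU: 66
IN: 1751
UN: 2815 <------
WA: 1
""")

one_minute_load = 2816.06  # first value of the LOAD AVERAGE line above
un_share = summary["UN"] / one_minute_load
print(f"UN tasks: {summary['UN']}, share of 1-minute load: {un_share:.1%}")
```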
Blocked tasks hung for over 1 day and 22 hours:
crash> ps -m | grep UN | tail -n 7
[ 1 22:39:33.366] [UN] PID: 53025 TASK: ffff8801e29f9fa0 CPU: 19 COMMAND: "java"
[ 1 22:39:35.748] [UN] PID: 103482 TASK: ffff887b5f6c1fa0 CPU: 27 COMMAND: "java"
[ 1 22:39:46.882] [UN] PID: 2702 TASK: ffff883f7d4e1fa0 CPU: 7 COMMAND: "vnetd"
[ 1 22:40:00.330] [UN] PID: 103997 TASK: ffff880322ff2f70 CPU: 4 COMMAND: "kworker/4:1"
[ 1 22:40:02.696] [UN] PID: 58449 TASK: ffff884179a03f40 CPU: 50 COMMAND: "sshd"
[ 1 22:40:12.046] [UN] PID: 39413 TASK: ffff8843ae710000 CPU: 49 COMMAND: "kworker/49:2"
[ 1 22:40:12.202] [UN] PID: 2463 TASK: ffff887f6e683f40 CPU: 48 COMMAND: "snmpd"
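The bracketed field in `ps -m` output is the time since the task last ran on a CPU, printed as days followed by hh:mm:ss.msec. As a rough sketch, assuming lines shaped exactly like the ones above, the field can be converted into a duration as follows; `parse_ps_m` and the returned dictionary keys are hypothetical names used only for illustration.

```python
import re
from datetime import timedelta

# Matches e.g.: [ 1 22:40:12.202] [UN] PID: 2463 TASK: ... COMMAND: "snmpd"
PS_M_LINE = re.compile(
    r"\[\s*(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\]\s+\[(\w+)\]\s+PID:\s*(\d+)"
    r'.*COMMAND:\s*"([^"]*)"'
)


def parse_ps_m(line):
    """Return state, PID, command and time since last run for one `ps -m` line."""
    match = PS_M_LINE.search(line)
    if match is None:
        return None
    days, hours, minutes, seconds, msec = (int(g) for g in match.groups()[:5])
    return {
        "state": match.group(6),
        "pid": int(match.group(7)),
        "command": match.group(8),
        "since_last_run": timedelta(
            days=days, hours=hours, minutes=minutes,
            seconds=seconds, milliseconds=msec,
        ),
    }


record = parse_ps_m(
    '[ 1 22:40:12.202] [UN] PID: 2463 TASK: ffff887f6e683f40 CPU: 48 COMMAND: "snmpd"'
)
print(record["command"], record["state"], record["since_last_run"])
```

For the snmpd line this gives a duration of 1 day, 22 hours, 40 minutes and 12.202 seconds.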
crash> bt 39413
PID: 39413 TASK: ffff8843ae710000 CPU: 49 COMMAND: "kworker/49:2"
#0 [ffff880c422a7178] __schedule at ffffffff816ab0dc
#1 [ffff880c422a7208] schedule at ffffffff816ab6d9
#2 [ffff880c422a7218] schedule_timeout at ffffffff816a90e9
#3 [ffff880c422a72c0] wait_for_completion at ffffffff816aba8d
#4 [ffff880c422a7320] xfs_buf_submit_wait at ffffffffc07d00c6 [xfs]
#5 [ffff880c422a7348] xfs_bwrite at ffffffffc07d04d4 [xfs]
#6 [ffff880c422a7368] xfs_reclaim_inode at ffffffffc07d8bb1 [xfs]
#7 [ffff880c422a73b8] xfs_reclaim_inodes_ag at ffffffffc07d8e47 [xfs]
#8 [ffff880c422a7550] xfs_reclaim_inodes_nr at ffffffffc07d9e33 [xfs]
#9 [ffff880c422a7570] xfs_fs_free_cached_objects at ffffffffc07e9735 [xfs]
#10 [ffff880c422a7580] prune_super at ffffffff81205648
#11 [ffff880c422a75b8] shrink_slab at ffffffff81197133
#12 [ffff880c422a7658] do_try_to_free_pages at ffffffff8119a292
#13 [ffff880c422a76d0] try_to_free_pages at ffffffff8119a4ac
#14 [ffff880c422a7768] __alloc_pages_slowpath at ffffffff816a1c1b
#15 [ffff880c422a7858] __alloc_pages_nodemask at ffffffff8118eaa5
#16 [ffff880c422a7908] kmalloc_large_node at ffffffff816a2cf4
#17 [ffff880c422a7918] __kmalloc_node_track_caller at ffffffff811e41c7
#18 [ffff880c422a7970] __kmalloc_reserve at ffffffff81574851
#19 [ffff880c422a79b0] __alloc_skb at ffffffff815759ad
#20 [ffff880c422a7a00] netlink_alloc_skb at ffffffff815bce7b
#21 [ffff880c422a7a38] netlink_dump at ffffffff815bd0b3
#22 [ffff880c422a7a68] netlink_recvmsg at ffffffff815bd505
#23 [ffff880c422a7af8] sock_recvmsg at ffffffff8156c88f
#24 [ffff880c422a7c60] kernel_recvmsg at ffffffff8156c90a
#25 [ffff880c422a7c80] netlink_read.constprop.23 at ffffffffc052423d [onload_cplane]
#26 [ffff880c422a7d10] read_rtnl_response at ffffffffc052434e [onload_cplane]
#27 [ffff880c422a7d58] cicpos_dump_tables at ffffffffc05247ec [onload_cplane]
#28 [ffff880c422a7df8] cicpos_worker at ffffffffc0524c1e [onload_cplane]
#29 [ffff880c422a7e20] process_one_work at ffffffff810aa3ba
#30 [ffff880c422a7e68] worker_thread at ffffffff810ab086
#31 [ffff880c422a7ec8] kthread at ffffffff810b252f
#32 [ffff880c422a7f50] ret_from_fork at ffffffff816b8798
Environment
- Red Hat Enterprise Linux 7
- Third-party module Solarflare OpenOnload version 201606-u1.3 and earlier
  - onload module version: 201606-u1.3
  - onload_cplane module version: 201606-u1.3