System hang with many blocked tasks and high load when using the Onload module

Solution Unverified - Updated -

Issue

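  • System hangs, with many tasks blocked in uninterruptible (UN) state for a very long period
  • Work items queued by the onload_cplane module hang kworker threads in the workqueue pool
  • The load average of about 2816 is driven by the 2815 tasks in UN state, because Linux counts uninterruptible tasks toward the load average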
      LOAD AVERAGE: 2816.06, 2815.92, 2815.21
      crash> ps -S
        RU: 66
        IN: 1751
        UN: 2815 <------
        WA: 1 

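The longest-blocked tasks have not run for over 1 day 22 hours (crash's ps -m prints the time since each task last ran as [days hh:mm:ss.ms]):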
crash> ps -m | grep UN | tail -n 7
[ 1 22:39:33.366] [UN]  PID: 53025   TASK: ffff8801e29f9fa0  CPU: 19  COMMAND: "java" 
[ 1 22:39:35.748] [UN]  PID: 103482  TASK: ffff887b5f6c1fa0  CPU: 27  COMMAND: "java" 
[ 1 22:39:46.882] [UN]  PID: 2702    TASK: ffff883f7d4e1fa0  CPU: 7   COMMAND: "vnetd"
[ 1 22:40:00.330] [UN]  PID: 103997  TASK: ffff880322ff2f70  CPU: 4   COMMAND: "kworker/4:1"
[ 1 22:40:02.696] [UN]  PID: 58449   TASK: ffff884179a03f40  CPU: 50  COMMAND: "sshd"
[ 1 22:40:12.046] [UN]  PID: 39413   TASK: ffff8843ae710000  CPU: 49  COMMAND: "kworker/49:2" 
[ 1 22:40:12.202] [UN]  PID: 2463    TASK: ffff887f6e683f40  CPU: 48  COMMAND: "snmpd"
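The leading field of each ps -m line is easy to misread as a day count. The following Python sketch converts it into a timedelta and picks the UN task that has waited longest. It is a minimal parser written for the line layout shown above; the regular expression and helper names (parse_ps_m, longest_blocked) are illustrative assumptions, not part of crash.

```python
import re
from datetime import timedelta

# Leading part of a crash "ps -m" line: "[days hh:mm:ss.ms] [STATE]  PID: n"
# (layout assumed from the sample output above).
PS_M_LINE = re.compile(
    r"^\[\s*(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\]\s+\[(\w+)\]\s+PID:\s+(\d+)"
)


def parse_ps_m(line):
    """Return (pid, state, time since last run) for one "ps -m" line, or None."""
    m = PS_M_LINE.match(line.strip())
    if m is None:
        return None
    days, hours, minutes, seconds, millis, state, pid = m.groups()
    since = timedelta(days=int(days), hours=int(hours), minutes=int(minutes),
                      seconds=int(seconds), milliseconds=int(millis))
    return int(pid), state, since


def longest_blocked(lines):
    """Return the UN (uninterruptible) entry that has waited longest, or None."""
    entries = [e for e in map(parse_ps_m, lines) if e is not None and e[1] == "UN"]
    return max(entries, key=lambda e: e[2], default=None)


sample = [
    '[ 1 22:39:33.366] [UN]  PID: 53025   TASK: ffff8801e29f9fa0  CPU: 19  COMMAND: "java"',
    '[ 1 22:40:02.696] [UN]  PID: 58449   TASK: ffff884179a03f40  CPU: 50  COMMAND: "sshd"',
    '[ 1 22:40:12.202] [UN]  PID: 2463    TASK: ffff887f6e683f40  CPU: 48  COMMAND: "snmpd"',
]
pid, state, since = longest_blocked(sample)
print(pid, state, since)                       # 2463 UN 1 day, 22:40:12.202000
print(round(since.total_seconds() / 3600, 1))  # 46.7
```

Applied to the tail of the listing above, the longest wait is roughly 46.7 hours, that is, just under two days.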

crash> bt 39413
PID: 39413  TASK: ffff8843ae710000  CPU: 49  COMMAND: "kworker/49:2"
 #0 [ffff880c422a7178] __schedule at ffffffff816ab0dc
 #1 [ffff880c422a7208] schedule at ffffffff816ab6d9
 #2 [ffff880c422a7218] schedule_timeout at ffffffff816a90e9
 #3 [ffff880c422a72c0] wait_for_completion at ffffffff816aba8d
 #4 [ffff880c422a7320] xfs_buf_submit_wait at ffffffffc07d00c6 [xfs]
 #5 [ffff880c422a7348] xfs_bwrite at ffffffffc07d04d4 [xfs]
 #6 [ffff880c422a7368] xfs_reclaim_inode at ffffffffc07d8bb1 [xfs]
 #7 [ffff880c422a73b8] xfs_reclaim_inodes_ag at ffffffffc07d8e47 [xfs]
 #8 [ffff880c422a7550] xfs_reclaim_inodes_nr at ffffffffc07d9e33 [xfs]
 #9 [ffff880c422a7570] xfs_fs_free_cached_objects at ffffffffc07e9735 [xfs]
#10 [ffff880c422a7580] prune_super at ffffffff81205648
#11 [ffff880c422a75b8] shrink_slab at ffffffff81197133
#12 [ffff880c422a7658] do_try_to_free_pages at ffffffff8119a292
#13 [ffff880c422a76d0] try_to_free_pages at ffffffff8119a4ac
#14 [ffff880c422a7768] __alloc_pages_slowpath at ffffffff816a1c1b
#15 [ffff880c422a7858] __alloc_pages_nodemask at ffffffff8118eaa5
#16 [ffff880c422a7908] kmalloc_large_node at ffffffff816a2cf4
#17 [ffff880c422a7918] __kmalloc_node_track_caller at ffffffff811e41c7
#18 [ffff880c422a7970] __kmalloc_reserve at ffffffff81574851
#19 [ffff880c422a79b0] __alloc_skb at ffffffff815759ad
#20 [ffff880c422a7a00] netlink_alloc_skb at ffffffff815bce7b
#21 [ffff880c422a7a38] netlink_dump at ffffffff815bd0b3
#22 [ffff880c422a7a68] netlink_recvmsg at ffffffff815bd505
#23 [ffff880c422a7af8] sock_recvmsg at ffffffff8156c88f
#24 [ffff880c422a7c60] kernel_recvmsg at ffffffff8156c90a
#25 [ffff880c422a7c80] netlink_read.constprop.23 at ffffffffc052423d [onload_cplane]
#26 [ffff880c422a7d10] read_rtnl_response at ffffffffc052434e [onload_cplane]
#27 [ffff880c422a7d58] cicpos_dump_tables at ffffffffc05247ec [onload_cplane]
#28 [ffff880c422a7df8] cicpos_worker at ffffffffc0524c1e [onload_cplane]  
#29 [ffff880c422a7e20] process_one_work at ffffffff810aa3ba
#30 [ffff880c422a7e68] worker_thread at ffffffff810ab086
#31 [ffff880c422a7ec8] kthread at ffffffff810b252f
#32 [ffff880c422a7f50] ret_from_fork at ffffffff816b8798
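Reading the backtrace from the bottom up: the kworker is running the onload_cplane work item cicpos_worker, which (judging by the function names cicpos_dump_tables and read_rtnl_response) reads an rtnetlink table dump via kernel_recvmsg. netlink_dump needs a large skb (kmalloc_large_node), the page allocator falls into direct reclaim (__alloc_pages_slowpath, try_to_free_pages, shrink_slab), the slab shrinker calls into XFS inode reclaim, and the worker then waits synchronously for an XFS buffer write to complete (xfs_bwrite, xfs_buf_submit_wait, wait_for_completion). The Python sketch below automates that triage for a crash bt listing: it lists the loadable modules on the stack, flags direct reclaim, and reports the outermost module frame. The frame regex and the set of reclaim function names are assumptions chosen for this example, not a crash feature.

```python
import re

# One crash "bt" frame, e.g.
# " #25 [ffff880c422a7c80] netlink_read.constprop.23 at ffffffffc052423d [onload_cplane]"
# The trailing [module] appears only for functions in a loadable module.
FRAME = re.compile(
    r"^\s*#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)(?:\s+\[([^\]]+)\])?"
)

# Function names (assumed list) indicating the task entered direct page reclaim.
RECLAIM_FRAMES = {"__alloc_pages_slowpath", "try_to_free_pages", "do_try_to_free_pages"}


def parse_bt(text):
    """Return (frame_no, function, module_or_None) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(3), m.group(5)))
    return frames


def summarize(frames):
    """Report modules on the stack, direct reclaim, and the outermost module frame."""
    modules = []
    for _, _, mod in frames:
        if mod and mod not in modules:
            modules.append(mod)
    in_reclaim = any(func in RECLAIM_FRAMES for _, func, _ in frames)
    # The outermost frame belonging to a module is where the call chain entered it.
    origin = next(((n, f, mod) for n, f, mod in reversed(frames) if mod), None)
    return {"modules": modules, "direct_reclaim": in_reclaim, "origin": origin}


# A trimmed excerpt of the backtrace above.
sample_bt = """\
 #4 [ffff880c422a7320] xfs_buf_submit_wait at ffffffffc07d00c6 [xfs]
 #6 [ffff880c422a7368] xfs_reclaim_inode at ffffffffc07d8bb1 [xfs]
#14 [ffff880c422a7768] __alloc_pages_slowpath at ffffffff816a1c1b
#25 [ffff880c422a7c80] netlink_read.constprop.23 at ffffffffc052423d [onload_cplane]
#28 [ffff880c422a7df8] cicpos_worker at ffffffffc0524c1e [onload_cplane]
#29 [ffff880c422a7e20] process_one_work at ffffffff810aa3ba
"""

report = summarize(parse_bt(sample_bt))
print(report["modules"])         # ['xfs', 'onload_cplane']
print(report["direct_reclaim"])  # True
print(report["origin"])          # (28, 'cicpos_worker', 'onload_cplane')
```

Running the same summary over other hung kworker backtraces is one way to check whether they share this onload_cplane to direct-reclaim pattern.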

Environment

  • Red Hat Enterprise Linux 7
  • Third-party Solarflare OpenOnload modules, version 201606-u1.3 and earlier
    onload module version: 201606-u1.3
    onload_cplane module version: 201606-u1.3
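To check an installed version string against the "201606-u1.3 and earlier" boundary, the Python sketch below orders versions of the form YYYYMM, YYYYMM-uN, or YYYYMM-uN.M. That format and its ordering are assumptions based only on the version string quoted above; other Onload naming schemes are rejected rather than guessed at, and the sketch does not establish whether any later release is affected. The example inputs are hypothetical.

```python
import re

# Assumed format: YYYYMM, optionally followed by -uN or -uN.M
# (e.g. "201606", "201606-u1", "201606-u1.3").
VERSION = re.compile(r"^(\d{6})(?:-u(\d+)(?:\.(\d+))?)?$")

# Highest version the Environment section lists as affected.
LAST_AFFECTED = "201606-u1.3"


def version_key(version):
    """Turn a version string into a comparable (release, update, patch) tuple."""
    m = VERSION.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognised Onload version string: {version!r}")
    release, update, patch = m.groups()
    return int(release), int(update or 0), int(patch or 0)


def is_affected(version):
    """True if version sorts at or below LAST_AFFECTED under the assumed ordering."""
    return version_key(version) <= version_key(LAST_AFFECTED)


for v in ("201606-u1.3", "201606", "201710"):  # hypothetical inputs
    print(v, is_affected(v))
```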
