RHEL6.8: High CPU consumption after updating from a RHEL6.7 kernel; many processes reading /proc/stat contend on the sparse_irq_lock spin_lock

Issue

  • High CPU usage after updating the kernel from 2.6.32-573.26.1.el6 to 2.6.32-642.1.1.el6.
    • CPU usage was about 0% on the 2.6.32-573.26.1 kernel (99.9% idle) and about 46% on the 2.6.32-642.1.1 kernel (54.4% idle).
  • Before the upgrade

    # uname -a
    Linux foo.example.com 2.6.32-573.26.1.el6.x86_64 #1 SMP Tue Apr 12 01:47:01 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
    
    # top
    top - 23:01:40 up 8 min,  1 user,  load average: 1.69, 1.07, 0.53
    Tasks: 1650 total,   1 running, 1648 sleeping,   0 stopped,   1 zombie
    Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    ...
    
  • After the upgrade

    # uname -a
    Linux foo.example.com 2.6.32-642.1.1.el6.x86_64 #1 SMP Fri May 6 14:54:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
    
    # top
    top - 22:45:15 up 2 days,  5:27,  1 user,  load average: 54.24, 50.71, 50.24
    Tasks: 1684 total,  26 running, 1657 sleeping,   0 stopped,   1 zombie
    Cpu(s):  1.3%us, 44.3%sy,  0.0%ni, 54.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    ...
    
  • High CPU usage, with many processes showing the following backtrace: spinning in _spin_lock_irqsave, called from kstat_irqs_usr while reading /proc/stat (a sketch for confirming this on a live system follows the backtrace)

    crash> bt 53441
    PID: 53441  TASK: ffff8830669fe040  CPU: 75  COMMAND: "Agent Heartbeat"
     #0 [ffff8830b8bc6e90] crash_nmi_callback at ffffffff810366e6
     #1 [ffff8830b8bc6ea0] notifier_call_chain at ffffffff8154dd45
     #2 [ffff8830b8bc6ee0] atomic_notifier_call_chain at ffffffff8154ddaa
     #3 [ffff8830b8bc6ef0] notify_die at ffffffff810aceae
     #4 [ffff8830b8bc6f20] do_nmi at ffffffff8154b9c3
     #5 [ffff8830b8bc6f50] nmi at ffffffff8154b283
        [exception RIP: _spin_lock_irqsave+0x2f]
        RIP: ffffffff8154a97f  RSP: ffff8830669dbc78  RFLAGS: 00200083
        RAX: 0000000000000206  RBX: 00000000000006ad  RCX: 00000000000001fd
        RDX: 0000000000200286  RSI: 0000000000000001  RDI: ffffffff81f17c88
        RBP: ffff8830669dbc78   R8: 00000000fffffffb   R9: 00000000fffffffe
        R10: 0000000000000000  R11: 0000000000000014  R12: 00000000000006ae
        R13: ffff8840631264c0  R14: 000000000006fd3b  R15: 0000000000000001
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    --- <NMI exception stack> ---
     #6 [ffff8830669dbc78] _spin_lock_irqsave at ffffffff8154a97f
     #7 [ffff8830669dbc80] kstat_irqs_usr at ffffffff810f3894
     #8 [ffff8830669dbca0] show_stat at ffffffff8120f7e4
     #9 [ffff8830669dbe20] seq_read at ffffffff811bfe12
    #10 [ffff8830669dbea0] proc_reg_read at ffffffff81205a1e
    #11 [ffff8830669dbef0] vfs_read at ffffffff8119a585
    #12 [ffff8830669dbf30] sys_read at ffffffff8119a8d1
    #13 [ffff8830669dbf80] system_call_fastpath at ffffffff8100b0d2
    
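  • On a live system, a sketch for confirming where the system time is going, assuming the perf and lsof packages are installed: perf top should show a large share of cycles in _spin_lock_irqsave and kstat_irqs_usr, and lsof lists the processes that have /proc/stat open at that moment.

    # perf top
    # lsof /proc/stat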

Environment

  • Red Hat Enterprise Linux 6.7
    • kernels later than the 2.6.32-573*.el6 series
  • Red Hat Enterprise Linux 6.8
    • Seen on various kernel-2.6.32-642*.el6 builds
  • Many processes reading /proc/stat in parallel (a minimal reproducer sketch follows this list)
    • Seen with performance monitoring tools
    • Example: CA Introscope application monitoring
  • Hardware
    • Seen on IBM BladeCenter Hx5 -[7873AC1]-/Node 1, System Card, BIOS -[HIE179AUS-1.79]- 04/23/2013
    • 80 CPUs
    • The 'intr' line of /proc/stat has over 2,000 fields, indicating a very large number of IRQs in the system (a one-liner to check this is shown after this list)
    • Hyper-Threading of CPUs can increase the severity of this issue
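
  • The width of the 'intr' line referenced above can be checked with a one-liner; this sketch counts the per-IRQ counters that show_stat has to produce on every read of /proc/stat (the first two fields are the 'intr' label and the grand total):

    # awk '/^intr/ {print NF-2, "per-IRQ counters"}' /proc/stat

  • A minimal reproducer sketch for the parallel-reader pattern above (the reader count of 80 and the 60-second run time are arbitrary choices, roughly one reader per CPU, not values from the original report); on an affected kernel, %sy in top climbs sharply while the readers run:

    # for i in $(seq 80); do ( while true; do cat /proc/stat > /dev/null; done ) & done
    # sleep 60; kill $(jobs -p)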
