RHEL 6.8: High CPU consumption after updating from the RHEL 6.7 kernel; many processes contending on the sparse_irq_lock spin_lock while reading /proc/stat
Issue
- High CPU usage after updating the kernel from 2.6.32-573.26.1.el6 to 2.6.32-642.1.1.el6.
- CPU usage was roughly 0% on the 2.6.32-573.26.1 kernel (99.9% idle) and roughly 46% on the 2.6.32-642.1.1 kernel (54.4% idle).
- Before the upgrade:

  # uname -a
  Linux foo.example.com 2.6.32-573.26.1.el6.x86_64 #1 SMP Tue Apr 12 01:47:01 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
  # top
  top - 23:01:40 up 8 min,  1 user,  load average: 1.69, 1.07, 0.53
  Tasks: 1650 total,   1 running, 1648 sleeping,   0 stopped,   1 zombie
  Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
  ...
- After the upgrade:

  Linux foo.example.com 2.6.32-642.1.1.el6.x86_64 #1 SMP Fri May 6 14:54:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

  top - 22:45:15 up 2 days, 5:27,  1 user,  load average: 54.24, 50.71, 50.24
  Tasks: 1684 total,  26 running, 1657 sleeping,   0 stopped,   1 zombie
  Cpu(s):  1.3%us, 44.3%sy,  0.0%ni, 54.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
  ...
- High CPU usage, with many processes showing the following backtrace in _spin_lock_irqsave called from kstat_irqs_usr:

  crash> bt 53441
  PID: 53441  TASK: ffff8830669fe040  CPU: 75  COMMAND: "Agent Heartbeat"
   #0 [ffff8830b8bc6e90] crash_nmi_callback at ffffffff810366e6
   #1 [ffff8830b8bc6ea0] notifier_call_chain at ffffffff8154dd45
   #2 [ffff8830b8bc6ee0] atomic_notifier_call_chain at ffffffff8154ddaa
   #3 [ffff8830b8bc6ef0] notify_die at ffffffff810aceae
   #4 [ffff8830b8bc6f20] do_nmi at ffffffff8154b9c3
   #5 [ffff8830b8bc6f50] nmi at ffffffff8154b283
      [exception RIP: _spin_lock_irqsave+0x2f]
      RIP: ffffffff8154a97f  RSP: ffff8830669dbc78  RFLAGS: 00200083
      RAX: 0000000000000206  RBX: 00000000000006ad  RCX: 00000000000001fd
      RDX: 0000000000200286  RSI: 0000000000000001  RDI: ffffffff81f17c88
      RBP: ffff8830669dbc78   R8: 00000000fffffffb   R9: 00000000fffffffe
      R10: 0000000000000000  R11: 0000000000000014  R12: 00000000000006ae
      R13: ffff8840631264c0  R14: 000000000006fd3b  R15: 0000000000000001
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
   #6 [ffff8830669dbc78] _spin_lock_irqsave at ffffffff8154a97f
   #7 [ffff8830669dbc80] kstat_irqs_usr at ffffffff810f3894
   #8 [ffff8830669dbca0] show_stat at ffffffff8120f7e4
   #9 [ffff8830669dbe20] seq_read at ffffffff811bfe12
  #10 [ffff8830669dbea0] proc_reg_read at ffffffff81205a1e
  #11 [ffff8830669dbef0] vfs_read at ffffffff8119a585
  #12 [ffff8830669dbf30] sys_read at ffffffff8119a8d1
  #13 [ffff8830669dbf80] system_call_fastpath at ffffffff8100b0d2
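- If a vmcore is available, counting the tasks whose stacks include kstat_irqs_usr gives a sense of how widespread the contention is. This is an illustrative check only (it pipes crash output through grep; the exact count will vary from system to system):

  crash> foreach bt | grep -c kstat_irqs_usr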
Environment
- Red Hat Enterprise Linux 6.7
- kernels greater than 2.6.32-573*.el6
- Red Hat Enterprise Linux 6.8
- Seen on various kernel-2.6.32-642*.el6 builds
- Many processes reading /proc/stat in parallel (a rough reproducer sketch follows this list)
- seen with performance monitoring tools
- Example: CA Introscope application monitoring
- Hardware
- Seen on IBM BladeCenter Hx5 -[7873AC1]-/Node 1, System Card, BIOS -[HIE179AUS-1.79]- 04/23/2013
- 80 CPUs
- The 'intr' line of /proc/stat contains over 2,000 counters, indicating a very large number of IRQs in the system (see the example after this list)
- Hyper-Threading of CPUs can increase the severity of this issue
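The size of the 'intr' line can be checked directly on a running system. The one-liner below is an illustration, not a command from the original report; it prints the number of counters on the line (the first counter is the running total of all interrupts, the remaining ones are per-IRQ counts):

  # awk '/^intr/ {print NF-1}' /proc/stat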
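As referenced in the /proc/stat bullet above, the contention pattern can be approximated without any monitoring product by starting many parallel readers of /proc/stat. This is a rough reproducer sketch for a disposable test system only; the choice of 80 readers is an assumption that simply mirrors the 80-CPU machine described above:

  # for i in $(seq 1 80); do (while true; do cat /proc/stat > /dev/null; done) & done
  # top                 # system CPU time (%sy) rises as the readers contend for the IRQ accounting lock
  # kill $(jobs -p)     # stop the background readers when finished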
