RHEL6.8: High CPU consumption after updating from RHEL6.7 kernel, many processes contending on sparse_irq_lock spin_lock, many processes reading /proc/stat
Issue
- High CPU usage after updating the kernel from 2.6.32-573.26.1.el6 to 2.6.32-642.1.1.el6.
- CPU usage was 0% on 2.6.32-573.26.1 kernel (99.9% idle) and 46% on the 2.6.32-642.1.1 kernel (54.4% idle).
- Before the upgrade:
    # uname -a
    Linux foo.example.com 2.6.32-573.26.1.el6.x86_64 #1 SMP Tue Apr 12 01:47:01 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
    # top
    top - 23:01:40 up 8 min,  1 user,  load average: 1.69, 1.07, 0.53
    Tasks: 1650 total,   1 running, 1648 sleeping,   0 stopped,   1 zombie
    Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    ...
- After the upgrade:
    Linux foo.example.com 2.6.32-642.1.1.el6.x86_64 #1 SMP Fri May 6 14:54:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
    top - 22:45:15 up 2 days,  5:27,  1 user,  load average: 54.24, 50.71, 50.24
    Tasks: 1684 total,  26 running, 1657 sleeping,   0 stopped,   1 zombie
    Cpu(s):  1.3%us, 44.3%sy,  0.0%ni, 54.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    ...
- High CPU usage with many processes showing the following backtrace, spinning in _spin_lock_irqsave called from kstat_irqs_usr:
    crash> bt 53441
    PID: 53441  TASK: ffff8830669fe040  CPU: 75  COMMAND: "Agent Heartbeat"
     #0 [ffff8830b8bc6e90] crash_nmi_callback at ffffffff810366e6
     #1 [ffff8830b8bc6ea0] notifier_call_chain at ffffffff8154dd45
     #2 [ffff8830b8bc6ee0] atomic_notifier_call_chain at ffffffff8154ddaa
     #3 [ffff8830b8bc6ef0] notify_die at ffffffff810aceae
     #4 [ffff8830b8bc6f20] do_nmi at ffffffff8154b9c3
     #5 [ffff8830b8bc6f50] nmi at ffffffff8154b283
        [exception RIP: _spin_lock_irqsave+0x2f]
        RIP: ffffffff8154a97f  RSP: ffff8830669dbc78  RFLAGS: 00200083
        RAX: 0000000000000206  RBX: 00000000000006ad  RCX: 00000000000001fd
        RDX: 0000000000200286  RSI: 0000000000000001  RDI: ffffffff81f17c88
        RBP: ffff8830669dbc78   R8: 00000000fffffffb   R9: 00000000fffffffe
        R10: 0000000000000000  R11: 0000000000000014  R12: 00000000000006ae
        R13: ffff8840631264c0  R14: 000000000006fd3b  R15: 0000000000000001
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    --- <NMI exception stack> ---
     #6 [ffff8830669dbc78] _spin_lock_irqsave at ffffffff8154a97f
     #7 [ffff8830669dbc80] kstat_irqs_usr at ffffffff810f3894
     #8 [ffff8830669dbca0] show_stat at ffffffff8120f7e4
     #9 [ffff8830669dbe20] seq_read at ffffffff811bfe12
    #10 [ffff8830669dbea0] proc_reg_read at ffffffff81205a1e
    #11 [ffff8830669dbef0] vfs_read at ffffffff8119a585
    #12 [ffff8830669dbf30] sys_read at ffffffff8119a8d1
    #13 [ffff8830669dbf80] system_call_fastpath at ffffffff8100b0d2
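The contention can be approximated with a minimal sketch (not part of the original report): several background readers each read /proc/stat a fixed number of times, mimicking parallel monitoring agents. Each read walks every IRQ via kstat_irqs_usr(), which takes sparse_irq_lock, so on an affected kernel the %sy column in top climbs as readers are added. READERS and ITERATIONS are arbitrary illustrative values.

```shell
#!/bin/sh
# Hypothetical reproducer: parallel readers of /proc/stat.
# READERS and ITERATIONS are illustrative, not from the report.
READERS=8
ITERATIONS=100

for i in $(seq 1 "$READERS"); do
    (
        j=0
        while [ "$j" -lt "$ITERATIONS" ]; do
            # Every read of /proc/stat iterates over all IRQs,
            # taking sparse_irq_lock in kstat_irqs_usr().
            cat /proc/stat > /dev/null
            j=$((j + 1))
        done
    ) &
done
wait
echo "all readers finished"
```

Watching top in another terminal while this runs shows whether system time scales with the number of readers.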
Environment
- Red Hat Enterprise Linux 6.7
- kernels later than 2.6.32-573*.el6
- Red Hat Enterprise Linux 6.8
- Seen on various kernel-2.6.32-642*.el6 kernels
- Many processes reading /proc/stat in parallel
- Seen with performance monitoring tools
- Example: CA Introscope Application Monitoring
- Hardware
- Seen on IBM BladeCenter Hx5 -[7873AC1]-/Node 1, System Card, BIOS -[HIE179AUS-1.79]- 04/23/2013
- 80 CPUs
- the 'intr' line of /proc/stat has over 2,000 numbers on it, indicating a very high number of IRQs in the system
- Hyper-Threading of CPUs can increase the severity of this issue
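To check how many counters the 'intr' line carries on a given system, one option (an illustrative command, not from the original report) is to count its fields with awk. The first number after the 'intr' keyword is the grand total of all interrupts, followed by one counter per IRQ; on the affected 80-CPU system this prints well over 2,000.

```shell
# Count the numbers on the 'intr' line of /proc/stat
# (NF includes the 'intr' keyword itself, so subtract 1).
awk '/^intr / { print NF - 1 }' /proc/stat
```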