Why do my RHEV hosts experience a very high load and system CPU usage?
Issue
- My RHEV hosts experience a very high load (120+) and very high system CPU usage once the number of VMs running on them exceeds a certain threshold (approximately 40 VMs):
$ uptime
11:29:45 up 2:07, 0 users, load average: 124.81, 119.18, 119.63
                                         ^^^^^^
$ dstat -at
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----system----
usr sys idl wai hiq siq| read writ| recv send| in out | int csw | date/time
10 38 53 0 0 0| 0 1080k|5206k 5434k| 0 0 | 229k 355k|17-10 12:03:16
12 50 37 0 0 0|1028k 1449k|6000k 5159k| 0 0 | 221k 316k|17-10 12:03:17
8 61 31 0 0 0| 0 2524k|6171k 6394k| 0 0 | 216k 302k|17-10 12:03:18
9 55 36 0 0 0| 0 1376k|5785k 5517k| 0 0 | 218k 309k|17-10 12:03:19
8 34 58 0 0 0| 0 1148k|6762k 8135k| 0 0 | 227k 361k|17-10 12:03:20
10 39 51 0 0 0| 0 1156k|7746k 6963k| 0 0 | 229k 360k|17-10 12:03:21
10 31 60 0 0 0| 0 1076k|6701k 7543k| 0 0 | 227k 363k|17-10 12:03:22
11 34 55 0 0 0| 0 3224k|6278k 4847k| 0 0 | 226k 364k|17-10 12:03:23
11 45 43 0 0 0|4096B 1932k|8633k 7924k| 0 0 | 225k 344k|17-10 12:03:24
9 37 54 0 0 0| 0 1448k|6105k 7732k| 0 0 | 219k 343k|17-10 12:03:25
9 33 58 0 0 0| 0 1908k|4784k 6192k| 0 0 | 228k 369k|17-10 12:03:26
13 33 55 0 0 0|1212k 936k|4842k 4397k| 0 0 | 231k 366k|17-10 12:03:27
12 43 45 0 0 0|1024k 1321k|5512k 5791k| 0 0 | 229k 345k|17-10 12:03:28
11 31 57 0 0 0| 0 1948k|5727k 5726k| 0 0 | 229k 369k|17-10 12:03:29
10 47 42 0 0 0| 0 1848k|7908k 7164k| 0 0 | 222k 324k|17-10 12:03:30
9 62 28 0 0 0| 0 356k|5903k 6670k| 0 0 | 214k 289k|17-10 12:03:31
11 47 41 0 0 0| 0 1804k|6483k 5862k| 0 0 | 224k 331k|17-10 12:03:32
8 48 43 0 0 0| 0 2800k|4783k 5233k| 0 0 | 220k 335k|17-10 12:03:33
8 36 56 0 0 0|4096B 1140k|5910k 5530k| 0 0 | 227k 365k|17-10 12:03:34
- The load and system CPU usage rise sharply once the number of VMs per host passes a threshold that has not been determined precisely. 40 VMs are enough to trigger it:
20 VMs per host --> Load: 2
30 VMs per host --> Load: 4
40 VMs per host --> Load: 120+
- We have set the virtual-host profile of tuned on those hosts, which changes the default values of the kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns scheduler tunables:
$ sysctl -a | grep granularity
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
- Setting the tuned profile to default, thus reverting kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns to their default values, causes both the load and system CPU usage to decrease back to reasonable values:
# tuned-adm profile default
Reverting to saved sysctl settings: [ OK ]
Calling '/etc/ktune.d/tunedadm.sh stop': [ OK ]
Reverting to cfq elevator: dm-0 dm-1 dm-10 dm-11 dm-12 dm-13 dm-14 dm-15 dm-16 dm-2 dm-3 dm-4 dm-5 dm-6 dm-7 dm-8 dm-9 sda sdb sdc sdd sde [ OK ]
Stopping tuned: [ OK ]
Switching to profile 'default'
Applying ktune sysctl settings:
/etc/ktune.d/tunedadm.conf: [ OK ]
Applying sysctl settings from /etc/sysctl.conf
Starting tuned: [ OK ]
#
# sysctl -a | grep granularity
kernel.sched_min_granularity_ns = 2000000
kernel.sched_wakeup_granularity_ns = 2000000
#
# dstat -at
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----system----
usr sys idl wai hiq siq| read writ| recv send| in out | int csw | date/time
17 21 62 0 0 0| 0 2508k|5582k 6018k| 0 0 | 236k 395k|17-10 12:03:35
11 8 81 0 0 0| 0 1088k|4003k 5049k| 0 0 | 232k 420k|17-10 12:03:36
11 11 78 0 0 0|1028k 1405k|5113k 4456k| 0 0 | 234k 428k|17-10 12:03:37
13 14 73 0 0 0| 0 2928k|5732k 5575k| 0 0 | 234k 418k|17-10 12:03:38
10 10 79 0 0 0| 0 1588k|9184k 6931k| 0 0 | 237k 426k|17-10 12:03:39
12 10 78 0 0 0| 0 2112k|7443k 9568k| 0 0 | 234k 423k|17-10 12:03:40
11 10 79 0 0 0| 0 720k|6871k 6967k| 0 0 | 236k 423k|17-10 12:03:41
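As a rough way to quantify the difference between the two dstat captures above, the following sketch averages the sys column of each one. The helper name avg_sys and the variable names are made up for illustration, and only the usr, sys and idl values of each sample are copied from the output above.

```shell
#!/bin/sh
# avg_sys (hypothetical helper): average the "sys" column, i.e. the second
# field, of dstat-style sample lines read from stdin. Header lines are
# skipped because their first field is not a plain number.
avg_sys() {
    awk '$1 ~ /^[0-9]+$/ { sum += $2; n++ }
         END { if (n) printf "%.1f\n", sum / n }'
}

# usr sys idl, copied from the capture taken with the virtual-host profile
before_samples='10 38 53
12 50 37
8 61 31
9 55 36
8 34 58
10 39 51
10 31 60
11 34 55
11 45 43
9 37 54
9 33 58
13 33 55
12 43 45
11 31 57
10 47 42
9 62 28
11 47 41
8 48 43
8 36 56'

# usr sys idl, copied from the capture taken with the default profile
after_samples='17 21 62
11 8 81
11 11 78
13 14 73
10 10 79
12 10 78
11 10 79'

before=$(printf '%s\n' "$before_samples" | avg_sys)
after=$(printf '%s\n' "$after_samples" | avg_sys)

echo "average sys with virtual-host profile: ${before}%"
echo "average sys with default profile:      ${after}%"
```

For the samples shown in this article, this gives an average system CPU usage of about 42% with the virtual-host profile and 12% with the default profile. On a live host, the same helper could be fed directly, for example with dstat -c 1 20 | avg_sys.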
Environment
- Red Hat Enterprise Virtualization 3.2 or later.
- tuned enabled on RHEV hosts and set to profile virtual-host (default since RHEV 3.3).
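To check whether a host matches this environment, something along the following lines may help. It is only a sketch: ns_to_ms is a hypothetical helper, and it assumes the scheduler tunables are exposed under /proc/sys/kernel, which is where the sysctl names used in this article map to but may not hold on other kernels.

```shell
#!/bin/sh
# Show the active tuned profile, if tuned-adm is installed on this host.
if command -v tuned-adm >/dev/null 2>&1; then
    tuned-adm active || true
fi

# ns_to_ms (hypothetical helper): convert nanoseconds to whole milliseconds.
ns_to_ms() {
    echo $(( $1 / 1000000 ))
}

# Print both scheduler tunables, with the value also shown in milliseconds.
# Assumption: the tunables live under /proc/sys/kernel on this kernel.
for name in sched_min_granularity_ns sched_wakeup_granularity_ns; do
    f="/proc/sys/kernel/$name"
    if [ -r "$f" ]; then
        ns=$(cat "$f")
        echo "kernel.$name = $ns ($(ns_to_ms "$ns") ms)"
    else
        echo "kernel.$name: not exposed under /proc/sys/kernel on this kernel"
    fi
done
```

With the values reported in this article, the virtual-host profile corresponds to 10 ms and 15 ms, and the default profile to 2 ms for both tunables.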
