Why do my RHEV hosts experience a very high load and system CPU usage?
Issue
- My RHEV hosts experience a very high load (120+) and very high system CPU usage when the number of VMs running on them goes beyond a certain number (approximately 40 VMs):
$ uptime
11:29:45 up 2:07, 0 users, load average: 124.81, 119.18, 119.63
$ dstat -at
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----system----
usr sys idl wai hiq siq| read writ| recv send| in out | int csw | date/time
10 38 53 0 0 0| 0 1080k|5206k 5434k| 0 0 | 229k 355k|17-10 12:03:16
12 50 37 0 0 0|1028k 1449k|6000k 5159k| 0 0 | 221k 316k|17-10 12:03:17
8 61 31 0 0 0| 0 2524k|6171k 6394k| 0 0 | 216k 302k|17-10 12:03:18
9 55 36 0 0 0| 0 1376k|5785k 5517k| 0 0 | 218k 309k|17-10 12:03:19
8 34 58 0 0 0| 0 1148k|6762k 8135k| 0 0 | 227k 361k|17-10 12:03:20
10 39 51 0 0 0| 0 1156k|7746k 6963k| 0 0 | 229k 360k|17-10 12:03:21
10 31 60 0 0 0| 0 1076k|6701k 7543k| 0 0 | 227k 363k|17-10 12:03:22
11 34 55 0 0 0| 0 3224k|6278k 4847k| 0 0 | 226k 364k|17-10 12:03:23
11 45 43 0 0 0|4096B 1932k|8633k 7924k| 0 0 | 225k 344k|17-10 12:03:24
9 37 54 0 0 0| 0 1448k|6105k 7732k| 0 0 | 219k 343k|17-10 12:03:25
9 33 58 0 0 0| 0 1908k|4784k 6192k| 0 0 | 228k 369k|17-10 12:03:26
13 33 55 0 0 0|1212k 936k|4842k 4397k| 0 0 | 231k 366k|17-10 12:03:27
12 43 45 0 0 0|1024k 1321k|5512k 5791k| 0 0 | 229k 345k|17-10 12:03:28
11 31 57 0 0 0| 0 1948k|5727k 5726k| 0 0 | 229k 369k|17-10 12:03:29
10 47 42 0 0 0| 0 1848k|7908k 7164k| 0 0 | 222k 324k|17-10 12:03:30
9 62 28 0 0 0| 0 356k|5903k 6670k| 0 0 | 214k 289k|17-10 12:03:31
11 47 41 0 0 0| 0 1804k|6483k 5862k| 0 0 | 224k 331k|17-10 12:03:32
8 48 43 0 0 0| 0 2800k|4783k 5233k| 0 0 | 220k 335k|17-10 12:03:33
8 36 56 0 0 0|4096B 1140k|5910k 5530k| 0 0 | 227k 365k|17-10 12:03:34
- The load and system CPU usage explode when the number of VMs exceeds an undetermined value. 40 VMs are enough to trigger it:
20 VMs per host --> Load: 2
30 VMs per host --> Load: 4
40 VMs per host --> Load: 120+
- We have set the virtual-host profile of tuned on those hosts, which changes the default values of the kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns scheduler tunables:
$ sysctl -a | grep granularity
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
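The nanosecond values are easier to compare when expressed in milliseconds. The following is a minimal sketch, not part of tuned or sysctl: granularity_ms is a hypothetical helper name, and it assumes the "name = value" format shown in the sysctl output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of tuned or sysctl): reads sysctl-style
# "name = value" lines on stdin and prints the granularity tunables in ms.
granularity_ms() {
    awk -F' = ' '/granularity_ns/ { printf "%s = %d ms\n", $1, $2 / 1000000 }'
}

# Values quoted above for the virtual-host profile.
printf '%s\n' \
    'kernel.sched_min_granularity_ns = 10000000' \
    'kernel.sched_wakeup_granularity_ns = 15000000' | granularity_ms
```

On a live host, the same helper could presumably be fed from sysctl -a, provided the kernel exposes these tunables through sysctl as it does on the hosts described here.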
- Setting the tuned profile to default, thus reverting kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns to their default values, causes both the load and the system CPU usage to drop back to reasonable values:
# tuned-adm profile default
Reverting to saved sysctl settings: [ OK ]
Calling '/etc/ktune.d/tunedadm.sh stop': [ OK ]
Reverting to cfq elevator: dm-0 dm-1 dm-10 dm-11 dm-12 dm-13 dm-14 dm-15 dm-16 dm-2 dm-3 dm-4 dm-5 dm-6 dm-7 dm-8 dm-9 sda sdb sdc sdd sde [ OK ]
Stopping tuned: [ OK ]
Switching to profile 'default'
Applying ktune sysctl settings:
/etc/ktune.d/tunedadm.conf: [ OK ]
Applying sysctl settings from /etc/sysctl.conf
Starting tuned: [ OK ]
#
# sysctl -a | grep granularity
kernel.sched_min_granularity_ns = 2000000
kernel.sched_wakeup_granularity_ns = 2000000
#
# dstat -at
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----system----
usr sys idl wai hiq siq| read writ| recv send| in out | int csw | date/time
17 21 62 0 0 0| 0 2508k|5582k 6018k| 0 0 | 236k 395k|17-10 12:03:35
11 8 81 0 0 0| 0 1088k|4003k 5049k| 0 0 | 232k 420k|17-10 12:03:36
11 11 78 0 0 0|1028k 1405k|5113k 4456k| 0 0 | 234k 428k|17-10 12:03:37
13 14 73 0 0 0| 0 2928k|5732k 5575k| 0 0 | 234k 418k|17-10 12:03:38
10 10 79 0 0 0| 0 1588k|9184k 6931k| 0 0 | 237k 426k|17-10 12:03:39
12 10 78 0 0 0| 0 2112k|7443k 9568k| 0 0 | 234k 423k|17-10 12:03:40
11 10 79 0 0 0| 0 720k|6871k 6967k| 0 0 | 236k 423k|17-10 12:03:41
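To put a number on the difference between the two dstat captures, the sys column can be averaged per capture. This is a rough sketch only: avg_sys is a hypothetical helper name, and it assumes the dstat -at column layout shown above, where sys is the second whitespace-separated field. The input lines are copied from the first three samples of each capture.

```shell
#!/bin/sh
# Hypothetical helper: averages the "sys" column (second field) of
# dstat sample lines read from stdin, ignoring non-numeric header lines.
avg_sys() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { sum += $2; n++ }
         END { if (n) printf "%.1f\n", sum / n }'
}

# First three samples taken with the virtual-host profile active.
avg_sys <<'EOF'
usr sys idl wai hiq siq| read writ| recv send| in out | int csw | date/time
10 38 53 0 0 0| 0 1080k|5206k 5434k| 0 0 | 229k 355k|17-10 12:03:16
12 50 37 0 0 0|1028k 1449k|6000k 5159k| 0 0 | 221k 316k|17-10 12:03:17
8 61 31 0 0 0| 0 2524k|6171k 6394k| 0 0 | 216k 302k|17-10 12:03:18
EOF

# First three samples taken after switching to the default profile.
avg_sys <<'EOF'
17 21 62 0 0 0| 0 2508k|5582k 6018k| 0 0 | 236k 395k|17-10 12:03:35
11 8 81 0 0 0| 0 1088k|4003k 5049k| 0 0 | 232k 420k|17-10 12:03:36
11 11 78 0 0 0|1028k 1405k|5113k 4456k| 0 0 | 234k 428k|17-10 12:03:37
EOF
```

Three samples per capture are too few for a reliable figure; piping a longer dstat run into the helper would give a steadier average.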
Environment
- Red Hat Enterprise Virtualization 3.2 or later.
- tuned enabled on RHEV hosts and set to profile virtual-host (default since RHEV 3.3).