Why do my RHEV hosts experience a very high load and system CPU usage?
Issue
- My RHEV hosts experience a very high load (120+) and very high system CPU usage when the number of VMs running on them exceeds a certain threshold (approximately 40 VMs):
 
$ uptime
 11:29:45 up  2:07,  0 users,  load average: 124.81, 119.18, 119.63
$                                            ^^^^^^
$ dstat -at
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----system----
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw |  date/time
 10  38  53   0   0   0|   0  1080k|5206k 5434k|   0     0 | 229k  355k|17-10 12:03:16
 12  50  37   0   0   0|1028k 1449k|6000k 5159k|   0     0 | 221k  316k|17-10 12:03:17
  8  61  31   0   0   0|   0  2524k|6171k 6394k|   0     0 | 216k  302k|17-10 12:03:18
  9  55  36   0   0   0|   0  1376k|5785k 5517k|   0     0 | 218k  309k|17-10 12:03:19
  8  34  58   0   0   0|   0  1148k|6762k 8135k|   0     0 | 227k  361k|17-10 12:03:20
 10  39  51   0   0   0|   0  1156k|7746k 6963k|   0     0 | 229k  360k|17-10 12:03:21
 10  31  60   0   0   0|   0  1076k|6701k 7543k|   0     0 | 227k  363k|17-10 12:03:22
 11  34  55   0   0   0|   0  3224k|6278k 4847k|   0     0 | 226k  364k|17-10 12:03:23
 11  45  43   0   0   0|4096B 1932k|8633k 7924k|   0     0 | 225k  344k|17-10 12:03:24
  9  37  54   0   0   0|   0  1448k|6105k 7732k|   0     0 | 219k  343k|17-10 12:03:25
  9  33  58   0   0   0|   0  1908k|4784k 6192k|   0     0 | 228k  369k|17-10 12:03:26
 13  33  55   0   0   0|1212k  936k|4842k 4397k|   0     0 | 231k  366k|17-10 12:03:27
 12  43  45   0   0   0|1024k 1321k|5512k 5791k|   0     0 | 229k  345k|17-10 12:03:28
 11  31  57   0   0   0|   0  1948k|5727k 5726k|   0     0 | 229k  369k|17-10 12:03:29
 10  47  42   0   0   0|   0  1848k|7908k 7164k|   0     0 | 222k  324k|17-10 12:03:30
  9  62  28   0   0   0|   0   356k|5903k 6670k|   0     0 | 214k  289k|17-10 12:03:31
 11  47  41   0   0   0|   0  1804k|6483k 5862k|   0     0 | 224k  331k|17-10 12:03:32
  8  48  43   0   0   0|   0  2800k|4783k 5233k|   0     0 | 220k  335k|17-10 12:03:33
  8  36  56   0   0   0|4096B 1140k|5910k 5530k|   0     0 | 227k  365k|17-10 12:03:34
- The load and system CPU usage climb sharply once the number of VMs exceeds a threshold that has not been pinned down exactly. 40 VMs are enough to trigger it:
 
20 VMs per host --> Load: 2
30 VMs per host --> Load: 4
40 VMs per host --> Load: 120+
- We have set the virtual-host profile of tuned on those hosts, which changes the kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns scheduler tunables from their default values:
$ sysctl -a | grep granularity
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
- Setting the tuned profile to default, which reverts kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns to their default values, brings both the load and the system CPU usage back down to reasonable values:
# tuned-adm profile default
Reverting to saved sysctl settings:                        [  OK  ]
Calling '/etc/ktune.d/tunedadm.sh stop':                   [  OK  ]
Reverting to cfq elevator: dm-0 dm-1 dm-10 dm-11 dm-12 dm-13 dm-14 dm-15 dm-16 dm-2 dm-3 dm-4 dm-5 dm-6 dm-7 dm-8 dm-9 sda sdb sdc sdd sde                                                   [  OK  ]
Stopping tuned:                                            [  OK  ]
Switching to profile 'default'
Applying ktune sysctl settings:
/etc/ktune.d/tunedadm.conf:                                [  OK  ]
Applying sysctl settings from /etc/sysctl.conf
Starting tuned:                                            [  OK  ]
#
# sysctl -a | grep granularity
kernel.sched_min_granularity_ns = 2000000
kernel.sched_wakeup_granularity_ns = 2000000
#
# dstat -at
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----system----
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw |  date/time
 17  21  62   0   0   0|   0  2508k|5582k 6018k|   0     0 | 236k  395k|17-10 12:03:35
 11   8  81   0   0   0|   0  1088k|4003k 5049k|   0     0 | 232k  420k|17-10 12:03:36
 11  11  78   0   0   0|1028k 1405k|5113k 4456k|   0     0 | 234k  428k|17-10 12:03:37
 13  14  73   0   0   0|   0  2928k|5732k 5575k|   0     0 | 234k  418k|17-10 12:03:38
 10  10  79   0   0   0|   0  1588k|9184k 6931k|   0     0 | 237k  426k|17-10 12:03:39
 12  10  78   0   0   0|   0  2112k|7443k 9568k|   0     0 | 234k  423k|17-10 12:03:40
 11  10  79   0   0   0|   0   720k|6871k 6967k|   0     0 | 236k  423k|17-10 12:03:41
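To put numbers on the before/after comparison, the sys column of saved dstat -at output can be averaged with a small awk filter. This is a minimal sketch, assuming the column layout shown above (usr first, sys second, header lines not starting with a digit); avg_sys is a hypothetical helper name, not part of dstat.

```shell
#!/bin/bash
# Hypothetical helper: average the "sys" column of dstat -at output read on stdin.
# Assumes the layout shown above: data lines begin with the usr and sys values,
# while the two header lines begin with "-" or "usr" and are therefore skipped.
avg_sys() {
    awk '$1 ~ /^[0-9]+$/ { sum += $2; n++ }
         END { if (n > 0) printf "%d samples, average sys %.1f%%\n", n, sum / n }'
}
```

For example, avg_sys < dstat-virtual-host.txt, where the file name is a placeholder for a saved capture. Applied to the two captures above, it reports 19 samples averaging 42.3% sys under the virtual-host profile and 7 samples averaging 12.0% sys after switching to the default profile.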
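Switching the whole profile also changed other settings (the output above shows the elevator being reverted to cfq and other sysctl settings being re-applied), so the profile switch alone does not prove that the two granularity tunables are responsible. One way to isolate them, sketched here under the assumption that you can run sysctl -w as root on a test host, is to keep the virtual-host profile active and change only those two values at runtime. set_sched_granularity is a hypothetical helper, and the change is not written to /etc/sysctl.conf or to any tuned profile.

```shell
#!/bin/bash
# Hypothetical helper: set both CFS granularity tunables at runtime (requires root).
# Values are in nanoseconds; nothing is persisted to /etc/sysctl.conf or tuned.
set_sched_granularity() {
    local v
    # Reject missing or non-numeric arguments before touching the kernel.
    for v in "$1" "$2"; do
        case "$v" in
            ''|*[!0-9]*)
                echo "usage: set_sched_granularity MIN_NS WAKEUP_NS" >&2
                return 2 ;;
        esac
    done
    sysctl -w kernel.sched_min_granularity_ns="$1" &&
        sysctl -w kernel.sched_wakeup_granularity_ns="$2"
}
```

For example, set_sched_granularity 2000000 2000000 applies the values the default profile produced on this host, and set_sched_granularity 10000000 15000000 restores the virtual-host values; watching dstat -at in between would show whether the sys column follows the two tunables on their own.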
Environment
- Red Hat Enterprise Virtualization 3.2 or later.
- tuned enabled on RHEV hosts and set to the virtual-host profile (the default since RHEV 3.3).
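To confirm that a host matches this environment, a short check such as the following can be run on the hypervisor. It is a sketch: report_sched_env is a hypothetical helper, it relies on tuned-adm active (which prints the currently active tuned profile), and it reads the two tunables from /proc/sys/kernel, the files behind the sysctl output shown earlier.

```shell
#!/bin/bash
# Hypothetical helper: show the active tuned profile and the two CFS tunables.
report_sched_env() {
    if command -v tuned-adm >/dev/null 2>&1; then
        tuned-adm active || true   # do not abort if the tuned service is stopped
    else
        echo "tuned-adm not found"
    fi
    local f
    for f in sched_min_granularity_ns sched_wakeup_granularity_ns; do
        if [ -r "/proc/sys/kernel/$f" ]; then
            echo "kernel.$f = $(cat "/proc/sys/kernel/$f")"
        else
            # Assumption: much newer kernels may not expose these under /proc/sys.
            echo "kernel.$f: not present in /proc/sys on this kernel"
        fi
    done
}
```

On an affected host running the virtual-host profile, the two kernel.* lines would be expected to match the 10000000 and 15000000 values shown in the Issue section.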