Why do my RHEV hosts experience a very high load and system CPU usage?

Issue

My RHEV hosts experience a very high load (120+) and very high system CPU usage when the number of VMs running on them exceeds a certain threshold (approximately 40 VMs):

$ uptime
 11:29:45 up  2:07,  0 users,  load average: 124.81, 119.18, 119.63
$                                            ^^^^^^
$ dstat -at
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----system----
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw |  date/time
 10  38  53   0   0   0|   0  1080k|5206k 5434k|   0     0 | 229k  355k|17-10 12:03:16
 12  50  37   0   0   0|1028k 1449k|6000k 5159k|   0     0 | 221k  316k|17-10 12:03:17
  8  61  31   0   0   0|   0  2524k|6171k 6394k|   0     0 | 216k  302k|17-10 12:03:18
  9  55  36   0   0   0|   0  1376k|5785k 5517k|   0     0 | 218k  309k|17-10 12:03:19
  8  34  58   0   0   0|   0  1148k|6762k 8135k|   0     0 | 227k  361k|17-10 12:03:20
 10  39  51   0   0   0|   0  1156k|7746k 6963k|   0     0 | 229k  360k|17-10 12:03:21
 10  31  60   0   0   0|   0  1076k|6701k 7543k|   0     0 | 227k  363k|17-10 12:03:22
 11  34  55   0   0   0|   0  3224k|6278k 4847k|   0     0 | 226k  364k|17-10 12:03:23
 11  45  43   0   0   0|4096B 1932k|8633k 7924k|   0     0 | 225k  344k|17-10 12:03:24
  9  37  54   0   0   0|   0  1448k|6105k 7732k|   0     0 | 219k  343k|17-10 12:03:25
  9  33  58   0   0   0|   0  1908k|4784k 6192k|   0     0 | 228k  369k|17-10 12:03:26
 13  33  55   0   0   0|1212k  936k|4842k 4397k|   0     0 | 231k  366k|17-10 12:03:27
 12  43  45   0   0   0|1024k 1321k|5512k 5791k|   0     0 | 229k  345k|17-10 12:03:28
 11  31  57   0   0   0|   0  1948k|5727k 5726k|   0     0 | 229k  369k|17-10 12:03:29
 10  47  42   0   0   0|   0  1848k|7908k 7164k|   0     0 | 222k  324k|17-10 12:03:30
  9  62  28   0   0   0|   0   356k|5903k 6670k|   0     0 | 214k  289k|17-10 12:03:31
 11  47  41   0   0   0|   0  1804k|6483k 5862k|   0     0 | 224k  331k|17-10 12:03:32
  8  48  43   0   0   0|   0  2800k|4783k 5233k|   0     0 | 220k  335k|17-10 12:03:33
  8  36  56   0   0   0|4096B 1140k|5910k 5530k|   0     0 | 227k  365k|17-10 12:03:34
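To put a number on the `sys` share in a capture like this, the `total-cpu-usage` group can be pulled out of each row. The following is a minimal Python sketch, assuming the `dstat -at` column layout shown above (CPU group first, groups separated by `|`). `parse_cpu_columns` and `column_average` are illustrative names, and the sample rows are the first three from the capture above.

```python
# Column names of dstat's total-cpu-usage group, in the order printed above.
CPU_FIELDS = ("usr", "sys", "idl", "wai", "hiq", "siq")

# First three data rows of the dstat -at capture above.
SAMPLE_ROWS = [
    " 10  38  53   0   0   0|   0  1080k|5206k 5434k|   0     0 | 229k  355k|17-10 12:03:16",
    " 12  50  37   0   0   0|1028k 1449k|6000k 5159k|   0     0 | 221k  316k|17-10 12:03:17",
    "  8  61  31   0   0   0|   0  2524k|6171k 6394k|   0     0 | 216k  302k|17-10 12:03:18",
]


def parse_cpu_columns(row: str) -> dict:
    """Return the six CPU percentages of one data row as ints.

    Header lines raise ValueError, because their fields are not integers.
    """
    cpu_group = row.split("|", 1)[0]  # text before the first '|'
    values = [int(field) for field in cpu_group.split()]
    if len(values) != len(CPU_FIELDS):
        raise ValueError(f"unexpected CPU columns in: {row!r}")
    return dict(zip(CPU_FIELDS, values))


def column_average(rows, field: str) -> float:
    """Arithmetic mean of one CPU column across several data rows."""
    values = [parse_cpu_columns(row)[field] for row in rows]
    return sum(values) / len(values)


if __name__ == "__main__":
    for name in ("usr", "sys", "idl"):
        print(f"{name}: {column_average(SAMPLE_ROWS, name):.1f}")
```

For these three rows it prints `usr: 10.0`, `sys: 49.7` and `idl: 40.3`; feeding it the full capture gives the average over the whole sampling window.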

The load and system CPU usage rise sharply once the number of VMs exceeds a threshold that has not been determined exactly; 40 VMs are enough to trigger it:

20 VMs per host --> Load: 2
30 VMs per host --> Load: 4
40 VMs per host --> Load: 120+
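On Linux the load average counts tasks that are runnable or in uninterruptible sleep, so its magnitude is easiest to judge relative to the number of CPUs. The following minimal Python sketch (not a Red Hat tool) extracts the three figures from an `uptime` line like the one above and divides the 1-minute value by a CPU count. The CPU count of the affected hosts is not stated in this article, so the example uses `os.cpu_count()` of whichever machine runs it; treat that as a placeholder for the real host value.

```python
import os
import re
from typing import Tuple

# Captures the 1-, 5- and 15-minute figures that follow "load average:".
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")

# The uptime line from the affected host, as shown above.
UPTIME_SAMPLE = " 11:29:45 up  2:07,  0 users,  load average: 124.81, 119.18, 119.63"


def parse_load_average(line: str) -> Tuple[float, float, float]:
    """Return (1-min, 5-min, 15-min) load averages from an uptime line."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no load average in: {line!r}")
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen


def load_per_cpu(load: float, cpus: int) -> float:
    """Normalise a load figure by the number of CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus


if __name__ == "__main__":
    one_min, _, _ = parse_load_average(UPTIME_SAMPLE)
    cpus = os.cpu_count() or 1  # placeholder: substitute the RHEV host's CPU count
    print(f"1-min load {one_min} / {cpus} CPUs = {load_per_cpu(one_min, cpus):.2f} per CPU")
```

On a live Unix host, `os.getloadavg()` returns the same three figures without any parsing.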

We have applied the tuned virtual-host profile on these hosts. This profile changes the default values of the kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns scheduler tunables:

$ sysctl -a | grep granularity
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
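Both tunables are expressed in nanoseconds: 10000000 ns is 10 ms and 15000000 ns is 15 ms. The following minimal Python sketch reads the same two values directly from procfs, where each sysctl name maps to a file under `/proc/sys` with dots replaced by slashes, and prints them in milliseconds. `read_sysctl` and `ns_to_ms` are illustrative helper names, not part of tuned or any Red Hat tooling. These particular knobs may not be exposed under `/proc/sys` on newer kernels, so the sketch reports a missing file instead of failing.

```python
from pathlib import Path
from typing import Optional

# The two scheduler tunables discussed in this article.
TUNABLES = (
    "kernel.sched_min_granularity_ns",
    "kernel.sched_wakeup_granularity_ns",
)


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[int]:
    """Read an integer sysctl from procfs; return None if its file does not exist."""
    path = Path(root, *name.split("."))  # kernel.x -> /proc/sys/kernel/x
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def ns_to_ms(value_ns: int) -> float:
    """Convert nanoseconds to milliseconds."""
    return value_ns / 1_000_000


if __name__ == "__main__":
    for name in TUNABLES:
        value = read_sysctl(name)
        if value is None:
            print(f"{name}: not exposed under /proc/sys on this kernel")
        else:
            print(f"{name} = {value} ns ({ns_to_ms(value)} ms)")
```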

Switching the tuned profile to default, which reverts kernel.sched_min_granularity_ns and kernel.sched_wakeup_granularity_ns to their default values, brings both the load and the system CPU usage back down to reasonable values:

# tuned-adm profile default
Reverting to saved sysctl settings:                        [  OK  ]
Calling '/etc/ktune.d/tunedadm.sh stop':                   [  OK  ]
Reverting to cfq elevator: dm-0 dm-1 dm-10 dm-11 dm-12 dm-13 dm-14 dm-15 dm-16 dm-2 dm-3 dm-4 dm-5 dm-6 dm-7 dm-8 dm-9 sda sdb sdc sdd sde                                                   [  OK  ]
Stopping tuned:                                            [  OK  ]
Switching to profile 'default'
Applying ktune sysctl settings:
/etc/ktune.d/tunedadm.conf:                                [  OK  ]
Applying sysctl settings from /etc/sysctl.conf
Starting tuned:                                            [  OK  ]
#
# sysctl -a | grep granularity
kernel.sched_min_granularity_ns = 2000000
kernel.sched_wakeup_granularity_ns = 2000000
#
# dstat -at
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ----system----
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw |  date/time
 17  21  62   0   0   0|   0  2508k|5582k 6018k|   0     0 | 236k  395k|17-10 12:03:35
 11   8  81   0   0   0|   0  1088k|4003k 5049k|   0     0 | 232k  420k|17-10 12:03:36
 11  11  78   0   0   0|1028k 1405k|5113k 4456k|   0     0 | 234k  428k|17-10 12:03:37
 13  14  73   0   0   0|   0  2928k|5732k 5575k|   0     0 | 234k  418k|17-10 12:03:38
 10  10  79   0   0   0|   0  1588k|9184k 6931k|   0     0 | 237k  426k|17-10 12:03:39
 12  10  78   0   0   0|   0  2112k|7443k 9568k|   0     0 | 234k  423k|17-10 12:03:40
 11  10  79   0   0   0|   0   720k|6871k 6967k|   0     0 | 236k  423k|17-10 12:03:41
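The before/after effect can be summarised numerically from the figures in this article alone. The following minimal Python sketch hard-codes the two sysctl snapshots and the `sys` column of both dstat captures (one row per second), then computes how much larger each tunable is under virtual-host and the mean `sys` percentage in each case. The samples are short (19 s and 7 s), so treat the means as indicative rather than as a benchmark; the constant and function names are illustrative.

```python
# tunable: (value under virtual-host, value under default), from the sysctl output above.
GRANULARITY_NS = {
    "kernel.sched_min_granularity_ns": (10_000_000, 2_000_000),
    "kernel.sched_wakeup_granularity_ns": (15_000_000, 2_000_000),
}

# "sys" column of the two dstat -at captures above, one row per second.
SYS_VIRTUAL_HOST = [38, 50, 61, 55, 34, 39, 31, 34, 45, 37, 33, 33, 43, 31, 47, 62, 47, 48, 36]
SYS_DEFAULT = [21, 8, 11, 14, 10, 10, 10]


def average(values) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def granularity_ratios(table) -> dict:
    """How many times larger each tunable is under virtual-host than under default."""
    return {name: vh / default for name, (vh, default) in table.items()}


if __name__ == "__main__":
    for name, ratio in granularity_ratios(GRANULARITY_NS).items():
        print(f"{name}: {ratio:.1f}x larger under virtual-host")
    print(f"mean sys% virtual-host ({len(SYS_VIRTUAL_HOST)} s): {average(SYS_VIRTUAL_HOST):.1f}")
    print(f"mean sys% default ({len(SYS_DEFAULT)} s): {average(SYS_DEFAULT):.1f}")
```

With the numbers above it reports ratios of 5.0x and 7.5x, and a mean `sys` of about 42.3% under virtual-host versus 12.0% under default.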

Environment

Red Hat Enterprise Virtualization 3.2 or later.
tuned enabled on RHEV hosts and set to the virtual-host profile (the default since RHEV 3.3).
