Chapter 4. Analyzing RHEL-OSP 7 Benchmark Results with Rally

With the fundamentals of creating a .json file covered in Section 3.5, “Benchmarking with Rally”, this section shifts its focus to analyzing several pre-defined Rally benchmark scenarios using Rally’s HTML reporting. The Rally scenarios to be analyzed are:

  • KeystoneBasic.create_user (setup validation)
  • Authenticate.validate_nova
  • NovaServers.boot_and_list_server

The KeystoneBasic.create_user Rally scenario looks like the following:

{
    "KeystoneBasic.create_user": [
        {
            "args": {},
            "runner": {
                "type": "constant",
                "times": 100,
                "concurrency": 10
            }
        }
    ]
}

The KeystoneBasic.create_user scenario reads as follows: create users 10 at a time in parallel (concurrency) until a total of 100 users (times) have been created.

As with all scenarios discussed in this reference architecture, the key parameters concurrency and times control the amount of load placed on a RHEL-OSP environment. This is an important part of analyzing the results, as it allows for diagnosing a RHEL-OSP environment at different workload levels.

When creating a .json file, the values of concurrency and times are static values that dictate the maximum number of users to launch for a specified scenario. To overcome this limitation, a script labeled rally-wrapper.sh increments the maximum number of users to launch until the success rate is no longer satisfied. The rally-wrapper.sh script increments the values of concurrency and times by 10 as long as the success rate is met. The variable that controls the success rate within the rally-wrapper.sh script is labeled EXPECTED_SUCCESS.

The rally-wrapper.sh script complements all of the Rally scenarios discussed here, as incrementally increasing the maximum number of users or guests (depending on the scenario) makes it possible to pinpoint the load level at which errors first appear.

The contents of the rally-wrapper.sh script:

# cat rally-wrapper.sh
#
# Re-runs a Rally task, incrementing "times" and "concurrency" by
# INCREMENT after each pass until the success rate drops below
# EXPECTED_SUCCESS.
# @author Joe Talerico <jtaleric@redhat.com>
#
RALLY_JSON="ra-scaleboot-nonetworking.json"
EXPECTED_SUCCESS="100"
REPEAT=1
INCREMENT=10
TIMESTAMP=$(date +%s)

while [[ $REPEAT -gt 0 ]] ; do
 # Create a results directory for this repetition.
 mkdir -p run-${REPEAT}
 RUN=true
 while $RUN ; do
  # Read the current concurrency value out of the scenario file.
  CONCURRENCY=$(awk '/concurrency/ {print $2}' ${RALLY_JSON})
  echo "Current number of guests launching : ${CONCURRENCY}"

  # Start the Rally task and capture its console output.
  RALLY_RESULT=$(rally task start ${RALLY_JSON})

  # Extract the task UUID and the total success percentage.
  TASK=$(echo "${RALLY_RESULT}" | grep Task | grep finished | awk '{print substr($2,0,length($2)-1)}')
  RUN_RESULT=$(echo "${RALLY_RESULT}" | grep total | awk '{print $16}')
  echo "     Task : ${TASK}"
  echo "     Result : ${RUN_RESULT}"

  # Save the HTML report and raw JSON results for this run.
  rally task report ${TASK} --out run-${REPEAT}/${TASK}.html
  rally task results ${TASK} > run-${REPEAT}/${TASK}.json

  # Truncate the success rate to an integer for the comparison below.
  SUCCESS_RATE=$(echo "${RUN_RESULT}" | awk -F. '{ print $1 }')

  if [ "${SUCCESS_RATE}" -ge "${EXPECTED_SUCCESS}" ] ; then
   # Success rate met: bump times and concurrency by INCREMENT.
   NEW_CON=$(echo "$(awk '/concurrency/ {print $2}' ${RALLY_JSON})+${INCREMENT}" | bc)
   sed -i "s/\"times\"\:.*$/\"times\"\: ${NEW_CON},/g" ${RALLY_JSON}
   sed -i "s/\"concurrency\"\:.*$/\"concurrency\"\: ${NEW_CON}/g" ${RALLY_JSON}
  else
   # Success rate missed: stop iterating and reset the scenario file.
   RUN=false
   sed -i "s/\"times\"\:.*$/\"times\"\: 10,/g" ${RALLY_JSON}
   sed -i "s/\"concurrency\"\:.*$/\"concurrency\"\: 10/g" ${RALLY_JSON}
  fi
  sleep 60
 done
 let REPEAT-=1
done
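
Running the wrapper is then a matter of making the script executable and invoking it from the directory containing the scenario file (a minimal usage sketch; the file named in RALLY_JSON must exist in the working directory):

# chmod +x rally-wrapper.sh
# ./rally-wrapper.sh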

The following Rally scenario, Authenticate.validate_nova, creates 3 tenants with 5 users in each, then performs 10 authentication iterations (times) against nova, running 5 of them in parallel (concurrency), with each iteration repeating the validation twice (repetitions).

{
    "Authenticate.validate_nova": [
        {
            "args": {
                "repetitions": 2
            },
            "runner": {
                "type": "constant",
                "times": 10,
                "concurrency": 5
            },
            "context": {
                "users": {
                    "tenants": 3,
                    "users_per_tenant": 5
                }
            }
        }
    ]
}

The Rally scenario NovaServers.boot_and_list_server attempts to launch an m1.small guest with 1 vCPU, 2GB of RAM, and a 20GB storage disk (default values), then list all of the booted guest instances. In order to determine the maximum number of deployable guest instances within the OpenStack cloud, the theoretical limits of the RHEL-OSP 7 environment must be calculated. The theoretical maximum number of deployable guest instances is calculated as ((total RAM - reserved memory) * memory overcommit) / RAM per instance. Within the reference environment, the theoretical limit of 383GB of total RAM available to deploy guest instances equates to 187 guest instances deployable with the m1.small flavor.
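
As a worked example, the following shell arithmetic applies this formula to the reference environment's four compute nodes (a sketch: the 96472 MB per node comes from the hypervisor stats shown below, and the 512 MB of reserved memory per node assumes nova's default reserved_host_memory_mb):

# Theoretical max guests = ((total RAM - reserved memory) * overcommit) / flavor RAM
$ echo $(( ((96472 - 512) * 4 * 1) / 2048 ))
187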

The value of 1 is used to represent the default memory overcommit ratio (1:1) within the RHEL-OSP 7 environment. The overcommit ratio, set within the /etc/nova/nova.conf file, plays an important role in scalability and performance. As the overcommit value is increased, scalability increases while performance decreases, because more guests place additional demand on a fixed amount of resources. Conversely, as the overcommit value is decreased, performance increases because fewer guests share each physical resource, but scalability suffers because a 1:1 ratio caps the number of guests at what the physical hardware can directly back.
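
The memory overcommit knob in nova.conf is the ram_allocation_ratio option; a minimal excerpt matching the 1:1 ratio described above might look like the following:

# /etc/nova/nova.conf (excerpt)
# ram_allocation_ratio controls memory overcommit; 1.0 means no overcommit (1:1)
ram_allocation_ratio=1.0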

Using the nova hypervisor-stats command, a user can view aggregate resource statistics for the compute nodes in the environment. For example, the following are the hypervisor stats of the existing reference environment when only one compute node is available.

$ nova hypervisor-stats
+----------------------+-------+
| Property             | Value |
+----------------------+-------+
| count                | 1     |
| current_workload     | 0     |
| disk_available_least | 36949 |
| free_disk_gb         | 37001 |
| free_ram_mb          | 93912 |
| local_gb             | 37021 |
| local_gb_used        | 20    |
| memory_mb            | 96472 |
| memory_mb_used       | 2560  |
| running_vms          | 1     |
| vcpus                | 32    |
| vcpus_used           | 1     |
+----------------------+-------+

Taking a closer look at these stats, the key values to further investigate include: count, free_ram_mb, memory_mb, memory_mb_used, and vcpus.

The breakdown of these is as follows:

  • count - the number of compute nodes available
  • free_ram_mb - the amount of RAM available to launch instances, in megabytes
  • memory_mb - the total amount of RAM, in megabytes
  • memory_mb_used - the total memory used in the environment, in megabytes (includes reserved memory and the memory of existing instances)
  • vcpus - the total number of available virtual CPUs

With these statistics, the number of guest instances that can be launched is calculated by taking free_ram_mb and dividing it by the amount of memory consumed by the flavor to be deployed. In this reference environment, the m1.small flavor consumes 2GB per instance, which equates to 93912 MB / 2048 MB = 45 guest instances (rounded down).
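
The same division can be scripted directly against the hypervisor stats (a sketch that assumes the table layout shown above and a 2048 MB flavor):

$ nova hypervisor-stats | awk '/free_ram_mb/ {print int($4 / 2048)}'
45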

Note

The running_vms value of 1 is due to a VM that already existed in the environment prior to this test.

Once the theoretical maximum number of deployable guest instances within an existing RHEL-OSP 7 environment is identified, the next step is to create a .json file that attempts to reach the calculated theoretical upper boundary. The .json file used to capture the maximum number of guests that a RHEL-OSP 7 environment can launch is as follows.

{% set flavor_name = flavor_name or "m1.small" %}
{
    "NovaServers.boot_and_list_server": [
        {
            "args": {
                "flavor": {
                    "name": "{{flavor_name}}"
                },
                "nics": [{
                    "net-id": "0fd1b597-7ed0-45cf-b9e2-a5dfbee80377"
                }],
                "image": {
                    "name": "rhel-server7"
                },
                "detailed": true
            },
            "runner": {
                "concurrency": 1,
                "times": 1,
                "type": "constant"
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                },
                "quotas": {
                    "neutron": {
                        "network": -1,
                        "port": -1
                    },
                    "nova": {
                        "instances": -1,
                        "cores": -1,
                        "ram": -1
                    }
                }
            }
        }
    ]
}

The initial objective is to use the rally-wrapper.sh script with small times and concurrency values that increment slowly, in order to diagnose any errors as quickly as possible.

4.1. Initial boot-storm Rally Results

The boot-storm tests in Rally attempt to launch as many guests as the RHEL-OSP environment can handle simultaneously. The initial tests consist of boot-storm tests with 1, 2, 3, and 4 compute nodes.

When attempting to launch as many guests as the RHEL-OSP environment can handle simultaneously, one must first calculate the amount of free RAM available to launch instances. As shown in Chapter 4, Analyzing RHEL-OSP 7 Benchmark Results with Rally, the nova hypervisor-stats command is critical to this calculation. With 1 compute node, this existing reference environment is able to launch a maximum of 45 guests (92376 MB / 2048 MB, rounded down) when each guest takes 2 GB of RAM. With this information, one can then run the rally-wrapper.sh script with both the concurrency and times values set to 45, which attempts to launch as many guests as the environment can handle simultaneously.
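
Setting those values can be done by hand or scripted. For example, the following variant of the wrapper's sed approach replaces only the numeric values, so it works regardless of key order or trailing commas (a sketch; ra-scaleboot-nonetworking.json is the scenario file referenced in rally-wrapper.sh):

$ sed -i 's/"times": *[0-9]*/"times": 45/' ra-scaleboot-nonetworking.json
$ sed -i 's/"concurrency": *[0-9]*/"concurrency": 45/' ra-scaleboot-nonetworking.json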

A snippet of the initial results for one compute node are shown below.

1-compute-node-45times-45concurrency-vif-plugging-timeout-30

+------------------------------------------------------------------------------------------------+
|                                      Response Times (sec)                                      |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| action            | min    | median  | 90%ile  | 95%ile  | max     | avg     | success | count |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| nova.boot_server  | 68.79  | 183.686 | 279.841 | 296.122 | 304.14  | 178.048 | 100.0%  | 45    |
| nova.list_servers | 0.849  | 0.993   | 1.246   | 1.327   | 1.402   | 1.043   | 100.0%  | 45    |
| total             | 69.933 | 184.674 | 280.906 | 297.124 | 305.223 | 179.092 | 100.0%  | 45    |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
Load duration: 305.434197903
Full duration: 445.057275057

HINTS:
* To plot HTML graphics with this data, run:
        rally task report 80493bb8-8229-4ec2-ba7b-bdca0954dc73 --out output.html

* To generate a JUnit report, run:
        rally task report 80493bb8-8229-4ec2-ba7b-bdca0954dc73 --junit --out output.xml

* To get raw JSON output of task results, run:
        rally task results 80493bb8-8229-4ec2-ba7b-bdca0954dc73

Taking a closer look at the initial results, launching all 45 guests simultaneously achieves a success rate of 100%. While this is good news, the actual response times to boot those instances are quite high: the minimum boot time is 68.79 seconds, while the average is 178.048 seconds.

The first question one might ask is: why are the boot time values so high? The answer lies within the /etc/nova/nova.conf file of the compute node. The parameter vif_plugging_timeout determines how long the nova service waits for neutron to report that the port setup process is complete before nova continues booting an instance. The parameter vif_plugging_is_fatal determines what nova does with an instance that exceeds the assigned timeout value.

RHEL-OSP 7 ships with vif_plugging_is_fatal set to False and vif_plugging_timeout set to 30, which makes nova wait 30 seconds for each instance. To avoid this unnecessary waiting, it is recommended to keep vif_plugging_is_fatal set to False and to set vif_plugging_timeout to zero.
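
The corresponding settings on each compute node look like the following (an illustrative excerpt; after editing, the compute service must be restarted, e.g. with systemctl restart openstack-nova-compute):

# /etc/nova/nova.conf (compute node excerpt)
# Do not treat a missed neutron "VIF plugged" notification as fatal
vif_plugging_is_fatal=False
# Do not wait for the notification before proceeding with the boot
vif_plugging_timeout=0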

With the vif parameters modified, rerunning the initial boot-storm test on 1 compute node resulted in the following:

1-compute-node-45times-45concurrency-vif-plugging-timeout-0

+------------------------------------------------------------------------------------------------+
|                                      Response Times (sec)                                      |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| action            | min    | median  | 90%ile  | 95%ile  | max     | avg     | success | count |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| nova.boot_server  | 33.18  | 111.991 | 163.766 | 178.831 | 180.093 | 106.784 | 100.0%  | 45    |
| nova.list_servers | 0.843  | 1.039   | 1.262   | 1.296   | 1.41    | 1.065   | 100.0%  | 45    |
| total             | 34.023 | 112.924 | 164.718 | 179.76  | 181.259 | 107.849 | 100.0%  | 45    |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
Load duration: 181.42532897
Full duration: 323.352867126

HINTS:
* To plot HTML graphics with this data, run:
        rally task report 23529f43-798d-44f1-9c7a-7488bfe86585 --out output.html

* To generate a JUnit report, run:
        rally task report 23529f43-798d-44f1-9c7a-7488bfe86585 --junit --out output.xml

* To get raw JSON output of task results, run:
        rally task results 23529f43-798d-44f1-9c7a-7488bfe86585

Changing these values drastically improves the response times. Referencing just the minimum and average values, the minimum boot time decreases by 51.7% and the average boot time decreases by 40%.

With an understanding of how vif_plugging_is_fatal and vif_plugging_timeout affect boot-storm performance, the next step is to continue with boot-storm testing for 2, 3, and 4 compute nodes.

When an additional compute node is added to launch instances, the environment has 186800 MB of RAM available, as verified by nova hypervisor-stats. Each guest instance takes 2048 MB of RAM, which allows for a maximum concurrent launch of 91 instances (186800 MB / 2048 MB, rounded down). Below are the response time results of that run.

2-compute-node-91times-91concurrency-vif-plugging-timeout-0

+------------------------------------------------------------------------------------------------+
|                                      Response Times (sec)                                      |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| action            | min    | median  | 90%ile  | 95%ile  | max     | avg     | success | count |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| nova.boot_server  | 43.502 | 171.631 | 295.4   | 309.489 | 314.713 | 168.553 | 100.0%  | 91    |
| nova.list_servers | 1.355  | 1.961   | 2.522   | 2.565   | 3.024   | 2.031   | 94.5%   | 91    |
| total             | 45.928 | 167.625 | 274.446 | 291.547 | 307.154 | 162.119 | 94.5%   | 91    |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
Load duration: 315.15325284
Full duration: 499.308477163

HINTS:
* To plot HTML graphics with this data, run:
        rally task report 04d9e8aa-da94-4724-a904-37abca471543 --out output.html

* To generate a JUnit report, run:
        rally task report 04d9e8aa-da94-4724-a904-37abca471543 --junit --out output.xml

* To get raw JSON output of task results, run:
        rally task results 04d9e8aa-da94-4724-a904-37abca471543

With the added compute node, the RHEL-OSP environment is able to achieve a 100% success rate when booting 91 guest instances concurrently. As additional guest instances are launched simultaneously, the minimum and average boot times increase: the minimum boot time increased by 23.7% when doubling the number of compute nodes and instances launched, while the average boot time increased by 36.6%.

When taking a closer look at the Rally results, one piece of information stands out: why is nova able to boot with a 100% success rate, yet not achieve a 100% success rate when listing the servers? The reason is a bug in Rally, https://bugs.launchpad.net/rally/+bug/1510175, where it is possible to see discrepancies between the Rally HTML report and the results found within the Rally log file. To confirm whether the success rate drop relates to nova.list_servers or to nova.boot_server, refer to the HTML report's Failures tab.

To generate the Rally HTML report, run rally task report <id_of_report> --out <name>.html from the command line. An example from the above listing:

rally task report 04d9e8aa-da94-4724-a904-37abca471543 --out output.html

Figure 4.1. Rally Task Failures with 2 Compute Nodes with 91 Concurrency and Times


Within the Failures tab, 5 tasks fail, all with timeout exceptions. Each exception occurs because Rally was unable to observe a booting instance change its status from BUILD to ACTIVE. When a guest instance is first launched, it is placed in the BUILD state; a boot failure occurs when a guest instance is unable to reach the ACTIVE state. The error explicitly states that Rally timed out waiting for the guest instance to change states. Due to this, it is clear that the failures are caused not by nova.list_servers but by nova being unable to bring all the instances to an ACTIVE state.

Knowing what the failures are, the next question to answer is: why are these instances failing? To answer this question, one must turn to the Ceph nodes and verify whether there is an unusually high amount of I/O wait. By capturing the output of top while a Rally task is running, the CPU wait times can be reviewed.
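
One way to capture this data on each Ceph node for later review is to run top in batch mode for the duration of the Rally task (a minimal sketch; the interval, iteration count, and output file name are arbitrary choices):

# Snapshot top every 5 seconds, 120 times (roughly 10 minutes)
$ top -b -d 5 -n 120 > top-during-rally-$(hostname).log

The figure below displays the top output captured while attempting to launch 91 guest instances concurrently with 2 compute nodes.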

Figure 4.2. High CPU Wait during 2 Compute Node Run with 91 Concurrency and 91 Times


As depicted in the image above, the Ceph nodes cannot keep up with the volume of I/O requests generated when a large number of guest-spawning requests arrive from the Rally task. Due to this, failures occur, such as instances never reaching the ACTIVE state from their BUILD state. Another side effect of waiting on I/O is the higher boot times observed as compute nodes are added to support additional guest instances. Changing the Ceph environment to use cached journals, or adding more Ceph nodes, are potential ways to alleviate this bottleneck; however, hardware restrictions in the Systems Engineering lab prevented these changes.

Continuing with the findings, the results below show that increasing the number of guest instances launched concurrently causes higher boot times and a lower success rate due to the I/O wait bottleneck on the Ceph nodes.

3-compute-node-137times-137concurrency-vif-plugging-timeout-0

+------------------------------------------------------------------------------------------------+
|                                      Response Times (sec)                                      |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| action            | min    | median  | 90%ile  | 95%ile  | max     | avg     | success | count |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| nova.boot_server  | 61.482 | 262.091 | 324.636 | 324.899 | 326.21  | 233.9   | 100.0%  | 137   |
| nova.list_servers | 1.9    | 3.086   | 3.86    | 3.903   | 4.259   | 3.086   | 71.5%   | 137   |
| total             | 64.951 | 214.873 | 308.903 | 313.681 | 320.596 | 201.339 | 71.5%   | 137   |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
Load duration: 327.246902943
Full duration: 641.576355934

HINTS:
* To plot HTML graphics with this data, run:
        rally task report 5580c462-fe7c-4919-b549-776b2a222ec5 --out output.html

* To generate a JUnit report, run:
        rally task report 5580c462-fe7c-4919-b549-776b2a222ec5 --junit --out output.xml

* To get raw JSON output of task results, run:
        rally task results 5580c462-fe7c-4919-b549-776b2a222ec5

When running the same test with 3 compute nodes, the RHEL-OSP environment achieves a 71.5% success rate when booting 137 guest instances concurrently. As additional guest instances are launched simultaneously, the minimum boot time increases by 29.2% and the average boot time by 27.9% compared to the previous test of 2 compute nodes with 91 guest instances launched.

Note

The 71.5% success rate applies to nova.boot_server, not nova.list_servers, due to the Rally bug: https://bugs.launchpad.net/rally/+bug/1510175

4-compute-node-183times-183concurrency-vif-plugging-timeout-0

+------------------------------------------------------------------------------------------------+
|                                      Response Times (sec)                                      |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| action            | min    | median  | 90%ile  | 95%ile  | max     | avg     | success | count |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
| nova.boot_server  | 86.67  | 328.305 | 337.028 | 338.162 | 339.256 | 272.641 | 100.0%  | 183   |
| nova.list_servers | 3.337  | 4.649   | 5.227   | 5.506   | 6.123   | 4.659   | 47.5%   | 183   |
| total             | 91.243 | 208.544 | 291.381 | 326.766 | 334.988 | 209.405 | 47.5%   | 183   |
+-------------------+--------+---------+---------+---------+---------+---------+---------+-------+
Load duration: 339.872634888
Full duration: 1070.80222201

HINTS:
* To plot HTML graphics with this data, run:
        rally task report 3812352f-6c0f-4b37-8544-cd40b51f21ef --out output.html

* To generate a JUnit report, run:
        rally task report 3812352f-6c0f-4b37-8544-cd40b51f21ef --junit --out output.xml

* To get raw JSON output of task results, run:
        rally task results 3812352f-6c0f-4b37-8544-cd40b51f21ef

When running the same test with 4 compute nodes, the RHEL-OSP environment achieves a 47.5% success rate when booting 183 guest instances concurrently. As additional guest instances are launched simultaneously, the minimum boot time increases by 29.1% and the average boot time by 14.2% compared to the previous test of 3 compute nodes with 137 guest instances launched.

Note

The 47.5% success rate applies to nova.boot_server, not nova.list_servers, due to the Rally bug: https://bugs.launchpad.net/rally/+bug/1510175

4.2. Rally Max Guest Launch

While the boot-storm tests attempt to launch as many guests as the RHEL-OSP environment can handle simultaneously, the max guest test checks whether the reference environment can reach the theoretical maximum number of m1.small guests. The initial calculation divides free_ram_mb, as reported by the nova hypervisor-stats command, by the RAM of the chosen flavor to find the maximum number of guests that can be launched. The rally-wrapper.sh script performs the max guest launch test by launching 8 guest instances concurrently until reaching the maximum value of times, for 1, 2, 3, and 4 compute nodes.

nova.boot_and_list_server_one_compute

{
    "NovaServers.boot_and_list_server": [
        {
            "args": {
                "flavor": {
                    "name": "m1.small"
                },
                "nics": [{
                    "net-id": "0fd1b597-7ed0-45cf-b9e2-a5dfbee80377"
                }],
                "image": {
                    "name": "rhel-server7"
                },
                "detailed": true
            },
            "runner": {
                "concurrency": 8,
                "times": 45,
                "type": "constant"
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                },
                "quotas": {
                    "neutron": {
                        "network": -1,
                        "port": -1
                    },
                    "nova": {
                        "instances": -1,
                        "cores": -1,
                        "ram": -1
                    }
                }
            }
        }
    ]
}

1-compute-node-45times-8concurrency-vif-plugging-timeout-0

+-------------------------------------------------------------------------------------------+
|                                   Response Times (sec)                                    |
+-------------------+--------+--------+--------+--------+--------+--------+---------+-------+
| action            | min    | median | 90%ile | 95%ile | max    | avg    | success | count |
+-------------------+--------+--------+--------+--------+--------+--------+---------+-------+
| nova.boot_server  | 27.035 | 34.348 | 39.161 | 42.1   | 42.87  | 34.042 | 100.0%  | 45    |
| nova.list_servers | 0.432  | 0.852  | 1.142  | 1.158  | 1.312  | 0.824  | 100.0%  | 45    |
| total             | 28.194 | 35.25  | 39.694 | 42.576 | 43.348 | 34.866 | 100.0%  | 45    |
+-------------------+--------+--------+--------+--------+--------+--------+---------+-------+
Load duration: 210.317100048
Full duration: 347.372730017

The results above show a 100% success rate for the maximum number of guest instances (45) when running with 1 compute node. As before, the value of 45 guest instances is derived from the nova hypervisor-stats output.

When running with 2 compute nodes, 91 guest instances are supported. Below is the .json file used to launch the 91 max guest instances.

nova.boot_and_list_server_two_compute

{
    "NovaServers.boot_and_list_server": [
        {
            "args": {
                "flavor": {
                    "name": "m1.small"
                },
                "nics": [{
                    "net-id": "0fd1b597-7ed0-45cf-b9e2-a5dfbee80377"
                }],
                "image": {
                    "name": "rhel-server7"
                },
                "detailed": true
            },
            "runner": {
                "concurrency": 8,
                "times": 91,
                "type": "constant"
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 1
                },
                "quotas": {
                    "neutron": {
                        "network": -1,
                        "port": -1
                    },
                    "nova": {
                        "instances": -1,
                        "cores": -1,
                        "ram": -1
                    }
                }
            }
        }
    ]
}

2-compute-node-91times-8concurrency-vif-plugging-timeout-0

+---------------------------------------------------------------------------------------------+
|                                    Response Times (sec)                                     |
+-------------------+--------+--------+--------+---------+---------+--------+---------+-------+
| action            | min    | median | 90%ile | 95%ile  | max     | avg    | success | count |
+-------------------+--------+--------+--------+---------+---------+--------+---------+-------+
| nova.boot_server  | 24.637 | 32.123 | 45.629 | 117.858 | 152.226 | 41.814 | 100.0%  | 91    |
| nova.list_servers | 0.412  | 1.23   | 1.741  | 1.84    | 2.632   | 1.2    | 100.0%  | 91    |
| total             | 26.317 | 33.458 | 46.451 | 118.356 | 152.743 | 43.015 | 100.0%  | 91    |
+-------------------+--------+--------+--------+---------+---------+--------+---------+-------+
Load duration: 500.732126951
Full duration: 652.159901142

The results above show a 100% success rate for the maximum number of guest instances (91) when running with 2 compute nodes, with the value of 91 again derived from nova hypervisor-stats. Further investigation of the results shows that not only is the 100% success rate achieved, but the boot times are significantly lower than in the boot-storm results, because launching fewer guest instances simultaneously reduces the I/O stress placed on the Ceph nodes.

The same applies to the results for 3 and 4 compute nodes. Both achieve a 100% success rate in reaching the maximum number of guests, with lower boot times for each guest instance, due to launching a much smaller number of guest instances simultaneously.

3-compute-node-137times-8concurrency-vif-plugging-timeout-0

+--------------------------------------------------------------------------------------------+
|                                    Response Times (sec)                                    |
+-------------------+--------+--------+--------+--------+---------+--------+---------+-------+
| action            | min    | median | 90%ile | 95%ile | max     | avg    | success | count |
+-------------------+--------+--------+--------+--------+---------+--------+---------+-------+
| nova.boot_server  | 26.235 | 31.75  | 43.451 | 48.708 | 132.845 | 36.095 | 100.0%  | 137   |
| nova.list_servers | 0.391  | 1.598  | 2.481  | 2.664  | 2.826   | 1.56   | 100.0%  | 137   |
| total             | 27.269 | 33.148 | 44.328 | 49.962 | 133.556 | 37.655 | 100.0%  | 137   |
+-------------------+--------+--------+--------+--------+---------+--------+---------+-------+
Load duration: 659.672354221
Full duration: 824.834990025

4-compute-node-183times-8concurrency-vif-plugging-timeout-0

+--------------------------------------------------------------------------------------------+
|                                    Response Times (sec)                                    |
+-------------------+--------+--------+--------+--------+---------+--------+---------+-------+
| action            | min    | median | 90%ile | 95%ile | max     | avg    | success | count |
+-------------------+--------+--------+--------+--------+---------+--------+---------+-------+
| nova.boot_server  | 25.672 | 32.315 | 46.114 | 49.741 | 145.58  | 35.707 | 100.0%  | 183   |
| nova.list_servers | 0.388  | 1.884  | 3.082  | 3.236  | 5.096   | 1.933  | 100.0%  | 183   |
| total             | 27.055 | 34.53  | 47.828 | 51.143 | 146.348 | 37.642 | 100.0%  | 183   |
+-------------------+--------+--------+--------+--------+---------+--------+---------+-------+
Load duration: 870.662543058
Full duration: 1055.67080402