Chapter 5. Inventory Refresh
One of the biggest factors affecting the perceived performance of a large CloudForms installation is the time taken to update the provider inventory in the VMDB. This is known as an EMS refresh. There are two types of EMS refresh: a full refresh, where all objects are returned from the provider; and a targeted refresh, where only the details of requested components such as specific VMs or hosts are fetched and processed. In CloudForms Management Engine 5.9, the OpenShift Container Manager provider always performs a full refresh.
5.1. Refresh Overview
Whenever CloudForms is notified of a change related to a managed object, a message is queued for a full EMS refresh. There is never more than one EMS refresh operation in progress for each provider at any one time, with at most one further refresh queued.
If a new refresh is called for, the miq_queue table is first examined to see if a refresh message already exists in the "ready" state for the intended EMS. If no such message already exists, a new one is created. If a message already exists and it is for a full refresh, the new request is ignored.
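This queueing behaviour can be observed from the Rails console on a CloudForms appliance. The following is a minimal sketch, using the MiqQueue model that backs the miq_queue table, to list any refresh messages currently in the "ready" state (the fields printed are purely illustrative):
# Run from the Rails console on the appliance (vmdb; bin/rails console).
# List any EMS refresh messages still in the "ready" state.
MiqQueue.where(:method_name => "refresh", :state => "ready").each do |msg|
  puts "id: #{msg.id}, class: #{msg.class_name}, zone: #{msg.zone}"
end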
5.2. Challenges of Scale
As might be expected, the more managed objects in an OpenShift Container Platform cluster, the longer a full refresh takes to complete. Tests on an OpenShift Container Platform cluster containing 3000 pods (each with one container) and 3000 images have shown that a full refresh takes between 50 and 70 seconds, depending on master node and VMDB database activity.
The refresh time has a knock-on effect on the process or workflow that initiated the refresh. In most cases this is inconvenient but not critical, such as a delay in seeing a container’s status change in the WebUI. In other cases, however - particularly when using an automate workflow - a very long EMS refresh may cause the triggering workflow to time out and exit with an error condition.
5.3. Monitoring Refresh Performance
A refresh operation has two significant phases that each contribute to the overall performance:
Extracting and parsing the data from OpenShift Container Platform
- Network latency to the OpenShift Container Platform Master node
- Time waiting for the Master node to process the request and return data
- CPU cycles parsing the returned data
Updating the inventory in the VMDB
- Network latency to the database
- Database appliance CPU, memory and I/O resources
Fortunately, the line printed to evm.log at the completion of a refresh contains detailed timings of each stage of the operation, and these can be used to determine bottlenecks. A typical log line is as follows:
... MIQ(ManageIQ::Providers::Openshift::ContainerManager::Refresher#refresh) ⏎
EMS: [OpenShift], id: [1000000000004] Refreshing targets for EMS...Complete ⏎
- Timings { ⏎
:collect_inventory_for_targets=>4.851187705993652, ⏎
:parse_targeted_inventory=>4.859905481338501, ⏎
:save_inventory=>5.751120328903198, ⏎
:manager_refresh_post_processing=>1.8835067749023438e-05, ⏎
:ems_refresh=>15.463248014450073}
The timing values are described as follows:
- :collect_inventory_for_targets - the time taken to extract the inventory from the OpenShift Container Platform master node
- :parse_targeted_inventory - the time taken to parse the inventory data
- :save_inventory - the time taken to save or update the inventory in the VMDB
- :manager_refresh_post_processing - the time taken to perform batched post-processing actions
- :ems_refresh - the total time taken to perform the refresh
Extracting the timings[10] from the log line shown above reveals the following performance values:
Refresh timings:
  collect_inventory_for_targets:   4.851188 seconds
  parse_targeted_inventory:        4.859905 seconds
  save_inventory:                  5.751120 seconds
  manager_refresh_post_processing: 0.000019 seconds
  ems_refresh:                     15.463248 seconds
This shows that the significant time components of this operation were extracting the inventory from the OpenShift Container Platform Master node (4.85 seconds), parsing the returned data (4.86 seconds), and loading the data into the database (5.75 seconds).
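The timing counters can also be extracted directly with a few lines of Ruby. The following is a minimal sketch, assuming the single-line "Timings {...}" format shown above; the regular expression may need adjusting if the log format differs:
# Extract EMS refresh timing counters from evm.log, assuming the
# single-line "Refreshing targets for EMS...Complete - Timings {...}"
# format shown above.
File.foreach("evm.log") do |line|
  next unless line.include?("Refreshing targets for EMS...Complete")
  line.scan(/:(\w+)=>([\d.e+-]+)/) do |counter, value|
    printf("%-35s %.6f seconds\n", counter, value.to_f)
  end
end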
5.4. Identifying Refresh Problems
Refresh problems are best identified by establishing baseline timings when the managed OpenShift Container Platform cluster is least busy. To determine the relative data collection and database load times, the ':collect_inventory_for_targets' and ':save_inventory' timing counters from evm.log can be plotted. For this example, the cfme-log-parsing/ems_refresh_timings.rb script is used, as follows:
ruby ~/git/cfme-log-parsing/ems_refresh_timings.rb ⏎
-i evm.log -o ems_refresh_timings.out
grep collect_inventory_for_targets ems_refresh_timings.out | ⏎
awk '{print $2}' > collect_inventory_for_targets.txt
grep save_inventory ems_refresh_timings.out | ⏎
awk '{print $2}' > save_inventory.txt
A significant increase or wide variation in data extraction times from this baseline can indicate that the master node is experiencing high load and not responding quickly to API requests.
Some variation in database load times throughout a 24-hour period is expected, but sustained periods of long load times can indicate that the database is overloaded.
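To put a number on "significant increase or wide variation", simple summary statistics can be calculated over the extracted timing files. The following is a minimal sketch, assuming the one-value-per-line files produced by the awk commands above:
# Calculate the mean and standard deviation for each timing file,
# assuming one floating-point value per line (as produced by the
# awk commands above).
%w[collect_inventory_for_targets.txt save_inventory.txt].each do |file|
  values = File.readlines(file).map(&:to_f)
  next if values.empty?
  mean   = values.inject(:+) / values.size
  stddev = Math.sqrt(values.inject(0.0) { |s, v| s + (v - mean)**2 } / values.size)
  printf("%-35s samples: %d  mean: %.2fs  stddev: %.2fs\n",
         file, values.size, mean, stddev)
end
A standard deviation that grows noticeably beyond the baseline suggests that extraction or load times are becoming erratic and warrant further investigation.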
5.5. Tuning Refresh
There is little CloudForms tuning that can be done to improve the data extraction time of a refresh. If the extraction times vary significantly throughout the day then some investigation into the performance of the EMS itself may be warranted.
If database load times are high, then CPU, memory and I/O load on the database appliance should be investigated and if necessary tuned. The top_output.log and vmstat_output.log files in /var/www/miq/vmdb/log on the database appliance can be used to correlate the times of high CPU and memory demand against the long database load times.
5.5.1. Configuration
The :ems_refresh section of the Configuration → Advanced settings is listed as follows:
:ems_refresh:
  ...
  :openshift:
    :refresh_interval: 15.minutes
    :inventory_object_refresh: true
    :inventory_collections:
      :saver_strategy: :batch
    :get_container_images: true
    :store_unused_images: true
Most of the values are not intended to be user-customisable, but the following two settings can be changed to false if required to improve performance.
5.5.1.1. Get Container Images
The :get_container_images value defines whether or not container images should be included in the inventory collection. If an OpenShift Container Platform installation has many thousands of container images the inventory refresh can take a long time, and performance can be improved by not collecting inventory data on the images.
5.5.1.2. Store Unused Images
The :store_unused_images value defines whether to save metadata - for example labels - on all container images (the default), or only for "used" images that are referenced by active pods.
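For example, to disable both settings on a large installation, the relevant values in the :openshift section would be edited as follows:
:ems_refresh:
  :openshift:
    :get_container_images: false
    :store_unused_images: false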
