Chapter 4. Database Sizing and Optimization

As discussed in Chapter 2, Architecture CloudForms 4.5 uses a PostgreSQL 9.5 database as its VMDB back-end, and the performance of this database is critical to the overall smooth-running of the CloudForms installation. This section discusses techniques to ensure that the database is installed and configured correctly, and optimally tuned.

4.1. Sizing the Database Appliance

The database server or appliance should be sized according to the anticipated scale of the CloudForms deployment. A minimum of 8 GBytes memory is recommended for most installations, but the required number of vCPUs varies with the number of worker processes accessing the database. An initial estimate can be made based on the anticipated idle load of the CFME appliances in the region.

Some earlier investigation into the database load generated by idle CFME appliances is published in Appendix A, Database Appliance CPU Count. To determine the number of CFME appliances required for a region, the total anticipated number of managed objects (VMs, hosts, clusters, datastores etc.) should first be established. Using the appliance to VM ratios suggested in Table 3.2, “Objects per CFME Appliance Guidelines”, the CFME appliance count can be estimated. For example a Red Hat Virtualization virtual infrastructure containing 3000 managed objects would need 10 CFME appliances in default configuration to handle a typical workload.

It can be seen from the table in Appendix A, Database Appliance CPU Count that a region containing 10 idle CFME appliances would need a database server with 4 vCPUs to maintain CPU utilisation at under 25%. Adding the provider(s) and enabling the various server roles on the appliances will increase the database load, and so CPU, memory and I/O utilisation must be monitored. A CPU utilisation of 75-80% should be considered a maximum for the database server, and an indication that any of the following should be increased:

  • Real memory
  • vCPUs
  • The value of shared_buffers (see below)

4.2. Sizing the Database Partition Before Installation

The disk used for the database partition should be presented to the database appliance from the fastest available block storage. Sparsely allocated, thin provisioned or file-backed storage such as NFS are not recommended for optimum performance.

A rough estimate of database size in GBytes can be made by approximating the number of managed VMs, hosts and storage devices in the region over a one and two year period.[11]

The following guidelines can be used:

After 1 year:

Database size (GBytes) =

(VM Count * 0.035) + (Host count * 0.0002) + (Storage Count * 0.001)

After 2 years:

Database size (GBytes) =

(VM Count * 0.055) + (Host count * 0.0002) + (Storage Count * 0.0015)

As an example, given an installation that is projected to manage 1500 VMs, with 20 hypervisors and 25 storage domains, the estimated database size would be:

(1500 * 0.035) + (20 * 0.0002) + (25 * 0.001) = 52.57 GBytes after 1 year

(1500 * 0.055) + (20 * 0.0002) + (25 * 0.0015) = 82.58 GBytes after 2 years

4.3. Installation

The first CFME appliance in a region can be configured as a database appliance with or without Rails. The database is created using appliance_console by selecting the following option:

 5) Configure Database

After creating a new encryption key and selecting to create an internal database, the following question is asked:

Should this appliance run as a standalone database server?

NOTE:
* The CFME application will not be running.
* This is required when using highly available database deployments.
* CAUTION: This is not reversible.

Selecting 'Y' to this option configures the appliance without Rails or the evmserverd application, and allows the server to be configured optimally as a pure PostgreSQL server. This configuration also allows for PostgreSQL high availability to be configured at any time in the future using the following appliance_console option:

6) Configure Database Replication
Note

Although configuring a CFME appliance as a dedicated database instance will result in optimum database performance, the absence of Rails and the evmserverd service and workers means that the server will not appear in the CloudForms WebUI as a CFME appliance in the region.

If it is known that PostgreSQL high availability will never be required (and at the expense of some loss of memory and CPU utilisation), the answer 'N' can be given to the question Should this appliance run as a standalone database server?. In this case the VMDB appliance will be configured to run Rails and the evmserverd service as normal, and will appear as a CloudForms appliance in the region.

It is recommended that this type of VMDB appliance be isolated in its own dedicated zone, and that any unnecessary server roles are disabled.

4.3.1. Configuring PostgreSQL

The PostgreSQL configuration file on a CFME appliance is /var/opt/rh/rh-postgresql95/lib/pgsql/data/postgresql.conf. Some of the values in this file have been defined based on the assumption of a small CloudForms installation. For larger installations - particularly when using a dedicated database instance - these values should be changed.

4.3.1.1. Shared Buffers

The most important tuning parameter to set is the value for shared_buffers. The default value from the configuration file is as follows:

shared_buffers = 128MB       # MIQ Value SHARED CONFIGURATION
#shared_buffers = 1GB        # MIQ Value DEDICATED CONFIGURATION

For a dedicated PostgreSQL server this should be set to 25% of the real memory on the database appliance, but not more than a maximum of 4GB. This allows many more of the dense index pages and actively accessed tables to reside in memory, significantly reducing physical I/O and improving overall performance. 

4.3.1.2. Max Connections

Each worker process on a CFME appliance in a region opens a connection to the database. There are typically between 20 and 30 worker sessions per appliance in default configuration, and so a region with 20 CFME appliances will open approximately 500 connections. The default value for max_connections in the configuration file is as follows:

max_connections = 1000                  # MIQ Value;
#max_connections = 100

Although this default value allows for 1000 connections, under certain extreme load conditions CFME appliance workers can be killed but their sessions not terminated. In this situation the number of connections can rise above the expected value.

Tip

The number of open connections to the database can be seen using the following psql command:

SELECT datname,application_name,client_addr FROM pg_stat_activity;

The number of outbound database connections from a CFME appliance can be seen using the following bash command:

netstat -tp | grep postgres

It may be necessary to increase the value for max_connections if the default number is being exceeded.

4.3.1.3. Log Directory

By default the block device used for the database partition is used for the PGDATA directories and files, and also the postgresql.log log file (this is the text log file, not the database write-ahead log). Moving the log file to a separate partition allows the PGDATA block device to be used exclusively for database I/O, which can improve performance. The default value for log_directory in the configuration file is as follows:

#log_directory = 'pg_log'      # directory where log files are written,
                               # can be absolute or relative to PGDATA

This value creates the log file as /var/opt/rh/rh-postgresql95/lib/pgsql/data/pg_log/postgresql.log. To use the default CFME log directory for the log file, change this line to be:

log_directory = '/var/www/miq/vmdb/log'

4.3.1.4. Huge Pages

For VMDB appliances configured as dedicated database instances, some performance gain can be achieved by creating sufficient kernel huge pages for PostgreSQL and the configured shared_buffers region. The following bash commands allocate 600 huge pages (1.2 GBytes):

sysctl -w vm.nr_hugepages=600
echo "vm.nr_hugepages=600" >> /etc/sysctl.d/rh-postgresql95.conf

The default setting for PostgreSQL 9.5 is to use huge pages if they are available, and so no further PostgreSQL configuration is necessary.

4.4. Maintaining Performance

Several of the database tables benefit greatly from regular vacuuming and frequent re-indexing, and database maintenance scripts can be added to cron to perform these functions.[12]

On a CFME 5.8 appliance these scripts can be installed using the following appliance_console option:

 7) Configure Database Maintenance

The scripts perform hourly reindexing of the following tables:

  • metrics_00 to metrics_23 (one per hour)
  • miq_queue
  • miq_workers

The scripts perform weekly or monthly vacuuming of the following tables:

  • vms
  • binary_blob_parts
  • binary_blobs
  • customization_specs
  • firewall_rules
  • hosts
  • storages
  • miq_schedules
  • event_logs
  • policy_events
  • snapshots
  • jobs
  • networks
  • miq_queue
  • miq_request_tasks
  • miq_workers
  • miq_servers
  • miq_searches
  • miq_scsi_luns
  • miq_scsi_targets
  • storage_files
  • taggings
  • vim_performance_states

4.5. Resizing the Database Directory After Installation

It is sometimes the case that a managed virtual infrastructure or cloud grows at a faster rate than anticipated. As a result the CloudForms database mount point may need expanding from its initial size to allow the database to grow further.

The database mount point /var/opt/rh/rh-postgresql95/lib/pgsql is a logical volume formatted as XFS. A new disk can be presented to the database appliance and added to LVM to allow the filesystem to grow.

Note

Some virtual or cloud infrastructures don’t support the 'hot' adding of a new disk to a virtual machine that is powered on. It may be necessary to stop the evmserverd service on all CFME appliances in the region, and shut down the VMDB appliance before adding the new disk.

The following steps illustrate the procedure to add an additional 10 GBytes of storage (a new disk /dev/vdd) to the database mount point:

# label the new disk
parted /dev/vdd mklabel msdos

# partition the disk
parted /dev/vdd mkpart primary 2048s 100%

# create an LVM physical volume
pvcreate /dev/vdd1
  Physical volume "/dev/vdd1" successfully created.

# add the new physical volume to the vg_pg volume group
vgextend vg_pg /dev/vdd1
  Volume group "vg_pg" successfully extended

# determine the number of free extents in the volume group
vgdisplay vg_pg
  --- Volume group ---
  VG Name               vg_pg
  System ID
  ...
  VG Size               19.99 GiB
  PE Size               4.00 MiB
  Total PE              5118
  Alloc PE / Size       2559 / 10.00 GiB
  Free  PE / Size       2559 / 10.00 GiB
  VG UUID               IjKZmo-retr-qJ9f-WCdg-gzrc-jbl3-i52mUn

# extend the logical volume by the number of free extents
lvextend -l +2559 /dev/vg_pg/lv_pg
  Size of logical volume vg_pg/lv_pg changed from 10.00 GiB ⏎
       (2559 extents) to 19.99 GiB (5118 extents).
  Logical volume vg_pg/lv_pg successfully resized.

# grow the filesystem to fill the logical volume
xfs_growfs /var/opt/rh/rh-postgresql95/lib/pgsql
meta-data=/dev/mapper/vg_pg-lv_pg isize=256   ...
         =                       sectsz=512   ...
         =                       crc=0        ...
data     =                       bsize=4096   ...
         =                       sunit=0      ...
naming   =version 2              bsize=4096   ...
log      =internal               bsize=4096   ...
         =                       sectsz=512   ...
realtime =none                   extsz=4096   ...
data blocks changed from 2620416 to 5240832


[11] These sizing estimates have been generated from real-world VMDB usage statistics gathered from earlier versions of ManageIQ/CloudForms, managing virtual infrastructures such as VMware. To date insufficient data has been gathered for comparable sizing estimates of CloudForms installations that primarily manage OpenShift Container Platforms
[12] See https://access.redhat.com/solutions/1419333 (Continuous Maintenance for CloudForms Management Engine VMDB to maintain Responsiveness)