Chapter 4. Testing VDO performance

You can perform a series of tests to measure VDO performance, obtain a performance profile of your system with VDO, and determine which applications perform well with VDO.

Prerequisites

  • One or more Linux physical block devices are available.
  • The target block device (for example, /dev/sdb) is larger than 512 GiB.
  • Flexible I/O Tester (fio) is installed.
  • VDO is installed.

4.1. Preparing an environment for VDO performance testing

Before testing VDO performance, you must consider the host system configuration, VDO configuration, and the workloads that will be used during testing. These choices affect the benchmarking of space efficiency, bandwidth, and latency.

To prevent one test from affecting the results of another, you must create a new VDO volume for each iteration of each test.

4.1.1. Considerations before testing VDO performance

The following conditions and configurations affect the VDO test results:

System configuration

  • Number and type of CPU cores available. You can list this information using the taskset utility (example commands follow this list).
  • Available memory and total installed memory
  • Configuration of storage devices
  • Active disk scheduler
  • Linux kernel version
  • Packages installed
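
You can record this information with commands similar to the following sketch; /dev/sdb is the example target device from the prerequisites, and the exact packages to capture depend on your setup:

  # lscpu                                  # number and type of CPU cores
  # taskset -cp $$                         # CPUs available to the current shell
  # free -h                                # available and total memory
  # lsblk /dev/sdb                         # storage device configuration
  # cat /sys/block/sdb/queue/scheduler     # active disk scheduler
  # uname -r                               # Linux kernel version
  # rpm -qa vdo kmod-kvdo fio              # versions of relevant installed packages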

VDO configuration

  • Partitioning scheme
  • File systems used on VDO volumes
  • Size of the physical storage assigned to a VDO volume
  • Size of the logical VDO volume created
  • Sparse or dense UDS indexing
  • Memory size of the UDS index
  • VDO thread configuration

Workloads

  • Types of tools used to generate test data
  • Number of concurrent clients
  • The quantity of duplicate 4 KiB blocks in the written data
  • Read and write patterns
  • The working set size

4.1.2. Special considerations for testing VDO read performance

You must consider these additional factors before testing VDO read performance:

  • If a 4 KiB block has never been written, VDO does not read from the storage and immediately responds with a zero block.
  • If a 4 KiB block has been written but contains all zeros, VDO does not read from the storage and immediately responds with a zero block.

This behavior results in very fast read performance when there is no data to read. This is why read tests must prefill the volume with actual data.
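
For example, reading a region of a newly created volume that has never been written completes almost immediately, because VDO returns zero blocks without touching the backing storage. A minimal sketch, assuming the vdo-test volume from Section 4.2:

# dd if=/dev/mapper/vdo-test of=/dev/null bs=4K count=256 iflag=direct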

4.1.3. Preparing the system for testing VDO performance

This procedure configures system settings to achieve optimal VDO performance during testing.

Important

Testing outside the bounds listed for a particular test can waste testing time by producing abnormal results.

For example, these VDO tests conduct random reads over a 100 GiB address range. To test a working set of 500 GiB, you must increase the amount of RAM allocated to the VDO block map cache accordingly.
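
For example, a larger block map cache can be requested when you create the test volume. This is a sketch only; the --blockMapCacheSize option of vdo create is used with an illustrative value that you must size to your working set:

# vdo create --name=vdo-test \
             --device=/dev/sdb \
             --vdoLogicalSize=1T \
             --blockMapCacheSize=4G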

Procedure

  1. Ensure that your CPU is running at its highest performance setting.
  2. If possible, disable CPU frequency scaling using the BIOS configuration or the Linux cpupower utility, as shown in the sketch after this procedure.
  3. If possible, enable dynamic processor frequency adjustment (Turbo Boost or Turbo Core) for the CPU. This feature introduces some variability in the test results, but improves overall performance.
  4. File systems might have unique impacts on performance. They often skew performance measurements, making it harder to isolate the impact of VDO on the results.

    If reasonable, measure performance on the raw block device. If this is not possible, format the device using the file system that VDO will use in the target implementation.
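
A minimal sketch of steps 1 and 2, assuming the cpupower utility from the kernel-tools package is installed:

    # cpupower frequency-info                        # inspect the current governor and frequency limits
    # cpupower frequency-set --governor performance  # request the highest performance setting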

4.2. Creating a VDO volume for performance testing

This procedure creates a VDO volume with a logical size of 1 TiB on a 512 GiB physical volume for testing VDO performance.

Procedure

  • Create a VDO volume:

    # vdo create --name=vdo-test \
                 --device=/dev/sdb \
                 --vdoLogicalSize=1T \
                 --writePolicy=policy \
                 --verbose
    • Replace /dev/sdb with the path to a block device.
    • To test the VDO async mode on top of asynchronous storage, create an asynchronous volume using the --writePolicy=async option.
    • To test the VDO sync mode on top of synchronous storage, create a synchronous volume using the --writePolicy=sync option.
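  • Optionally, verify that the new volume reports the expected logical and physical sizes before you start testing. The following is a brief sketch; vdo status and lsblk are standard commands from the vdo and util-linux packages:

    # vdo status --name=vdo-test | grep -i size
    # lsblk /dev/mapper/vdo-test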

4.3. Cleaning up the VDO performance testing volume

This procedure removes the VDO volume used for testing VDO performance from the system.

Prerequisites

  • A VDO test volume exists on the system.

Procedure

  • Remove the VDO test volume from the system:

    # vdo remove --name=vdo-test

Verification steps

  • Verify that the volume has been removed:

    # vdo list --all | grep vdo-test

    The command should not list the VDO test volume.
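
  • If you script the test runs, you can turn this check into a pass or fail condition, for example:

    # vdo list --all | grep -q vdo-test || echo "vdo-test removed"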

4.4. Testing the effects of I/O depth on VDO performance

These tests determine the I/O depth that produces the optimal throughput and the lowest latency for your VDO configuration. I/O depth represents the number of I/O requests that the fio tool submits at a time.

Because VDO uses a 4 KiB sector size, the tests perform four-corner testing with 4 KiB I/O operations at I/O depths of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, and 2048.

4.4.1. Testing the effect of I/O depth on sequential 100% reads in VDO

This test determines how sequential 100% read operations perform on a VDO volume at different I/O depth values.

Procedure

  1. Create a new VDO volume.

    For details, see Section 4.2, “Creating a VDO volume for performance testing”.

  2. Prefill any areas that the test might access by performing a write fio job on the test volume:

    # fio --rw=write \
          --bs=8M \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --thread \
          --direct=1 \
          --scramble_buffers=1
  3. Record the reported throughput and latency for sequential 100% reads:

    # for depth in 1 2 4 8 16 32 64 128 256 512 1024 2048; do
      fio --rw=read \
          --bs=4096 \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --numjobs=1 \
          --thread \
          --norandommap \
          --runtime=300 \
          --direct=1 \
          --iodepth=$depth \
          --scramble_buffers=1 \
          --offset=0 \
          --size=100g
      done
  4. Remove the VDO test volume.

    For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.

4.4.2. Testing the effect of I/O depth on sequential 100% writes in VDO

This test determines how sequential 100% write operations perform on a VDO volume at different I/O depth values.

Procedure

  1. Create a new VDO test volume.

    For details, see Section 4.2, “Creating a VDO volume for performance testing”.

  2. Prefill any areas that the test might access by performing a write fio job:

    # fio --rw=write \
          --bs=8M \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --thread \
          --direct=1 \
          --scramble_buffers=1
  3. Record the reported throughput and latency for sequential 100% writes:

    # for depth in 1 2 4 8 16 32 64 128 256 512 1024 2048; do
      fio --rw=write \
          --bs=4096 \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --numjobs=1 \
          --thread \
          --norandommap \
          --runtime=300 \
          --direct=1 \
          --iodepth=$depth \
          --scramble_buffers=1 \
          --offset=0 \
          --size=100g
      done
  4. Remove the VDO test volume.

    For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.

4.4.3. Testing the effect of I/O depth on random 100% reads in VDO

This test determines how random 100% read operations perform on a VDO volume at different I/O depth values.

Procedure

  1. Create a new VDO test volume.

    For details, see Section 4.2, “Creating a VDO volume for performance testing”.

  2. Prefill any areas that the test might access by performing a write fio job:

    # fio --rw=write \
          --bs=8M \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --thread \
          --direct=1 \
          --scramble_buffers=1
  3. Record the reported throughput and latency for random 100% reads:

    # for depth in 1 2 4 8 16 32 64 128 256 512 1024 2048; do
      fio --rw=randread \
          --bs=4096 \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --numjobs=1 \
          --thread \
          --norandommap \
          --runtime=300 \
          --direct=1 \
          --iodepth=$depth \
          --scramble_buffers=1 \
          --offset=0 \
          --size=100g
      done
  4. Remove the VDO test volume.

    For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.

4.4.4. Testing the effect of I/O depth on random 100% writes in VDO

This test determines how random 100% write operations perform on a VDO volume at different I/O depth values.

Important

You must recreate the VDO volume between each I/O depth test run.

Procedure

Perform the following series of steps separately for the I/O depth values of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, and 2048 (a wrapper loop example follows the procedure):

  1. Create a new VDO test volume.

    For details, see Section 4.2, “Creating a VDO volume for performance testing”.

  2. Prefill any areas that the test might access by performing a write fio job:

    # fio --rw=write \
          --bs=8M \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --thread \
          --direct=1 \
          --scramble_buffers=1
  3. Record the reported throughput and latency for random 100% writes:

    # fio --rw=randwrite \
          --bs=4096 \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --numjobs=1 \
          --thread \
          --norandommap \
          --runtime=300 \
          --direct=1 \
          --iodepth=depth-value \
          --scramble_buffers=1 \
          --offset=0 \
          --size=100g
  4. Remove the VDO test volume.

    For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.
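
Because the volume must be recreated between runs, you can wrap steps 1 to 4 in a single loop. The following is a minimal sketch that reuses the vdo-test name and the /dev/sdb device from Section 4.2 and stores each result in a separate log file; add a --writePolicy option to the create command if you are testing a specific mode:

# for depth in 1 2 4 8 16 32 64 128 256 512 1024 2048; do
    vdo create --name=vdo-test --device=/dev/sdb --vdoLogicalSize=1T
    fio --rw=write --bs=8M --name=vdo --filename=/dev/mapper/vdo-test \
        --ioengine=libaio --thread --direct=1 --scramble_buffers=1
    fio --rw=randwrite --bs=4096 --name=vdo --filename=/dev/mapper/vdo-test \
        --ioengine=libaio --numjobs=1 --thread --norandommap --runtime=300 \
        --direct=1 --iodepth=$depth --scramble_buffers=1 --offset=0 \
        --size=100g --output=randwrite-depth$depth.log
    vdo remove --name=vdo-test
  done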

4.4.5. Analysis of VDO performance at different I/O depths

The following example analyzes VDO throughput and latency recorded at different I/O depth values.

Watch the behavior across the range and the points of inflection where increased I/O depth provides diminishing throughput gains. Sequential access and random access probably peak at different values, and the peaks can differ across storage configurations.

Example 4.1. I/O depth analysis

Figure 4.1. VDO throughput analysis

Notice the "knee" in each performance curve:

  • Marker 1 identifies the peak sequential throughput at point X. This particular configuration does not benefit from sequential 4 KiB I/O depth larger than X.
  • Marker 2 identifies peak random 4 KiB throughput at point Z. This particular configuration does not benefit from random 4 KiB I/O depth larger than Z.

Beyond the I/O depth at points X and Z, there are diminishing bandwidth gains, and average request latency increases 1:1 for each additional I/O request.

The following image shows an example of the random write latency after the "knee" of the curve in the previous graph. You should test at these points for maximum throughput that incurs the least response time penalty.

Figure 4.2. VDO latency analysis

Optimal I/O depth

Point Z marks the optimal I/O depth. The test plan collects additional data with I/O depth equal to Z.
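
To produce graphs like the ones in this example, you can have fio emit machine-readable results and extract the throughput and mean latency for each run. A brief sketch, assuming fio JSON output and the jq utility; the JSON field names can differ between fio versions:

# fio --rw=randread --bs=4096 --name=vdo --filename=/dev/mapper/vdo-test \
      --ioengine=libaio --numjobs=1 --thread --norandommap --runtime=300 \
      --direct=1 --iodepth=128 --scramble_buffers=1 --offset=0 --size=100g \
      --output-format=json --output=randread-depth128.json
# jq '.jobs[0].read | {bw_kib: .bw, mean_lat_ns: .clat_ns.mean}' randread-depth128.json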

4.5. Testing the effects of I/O request size on VDO performance

Using these tests, you can identify the block size that produces the best performance of VDO at the optimal I/O depth.

The tests perform four-corner testing at a fixed I/O depth, with block sizes varied over the range of 4 KiB to 1 MiB.

4.5.1. Testing the effect of I/O request size on sequential writes in VDO

This test determines how sequential write operations perform on a VDO volume at different I/O request sizes.

Procedure

  1. Create a new VDO volume.

    For details, see Section 4.2, “Creating a VDO volume for performance testing”.

  2. Prefill any areas that the test might access by performing a write fio job on the test volume:

    # fio --rw=write \
          --bs=8M \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --thread \
          --direct=1 \
          --scramble_buffers=1
  3. Record the reported throughput and latency for the sequential write test:

    # for iosize in 4 8 16 32 64 128 256 512 1024; do
      fio --rw=write \
          --bs=${iosize}k \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --numjobs=1 \
          --thread \
          --norandommap \
          --runtime=300 \
          --direct=1 \
          --iodepth=optimal-depth \
          --scramble_buffers=1 \
          --offset=0 \
          --size=100g
      done
  4. Remove the VDO test volume.

    For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.

4.5.2. Testing the effect of I/O request size on random writes in VDO

This test determines how random write operations perform on a VDO volume at different I/O request sizes.

Important

You must recreate the VDO volume between each I/O request size test run.

Procedure

Perform the following series of steps separately for the I/O request sizes of 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, and 1024k:

  1. Create a new VDO volume.

    For details, see Section 4.2, “Creating a VDO volume for performance testing”.

  2. Prefill any areas that the test might access by performing a write fio job on the test volume:

    # fio --rw=write \
          --bs=8M \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --thread \
          --direct=1 \
          --scramble_buffers=1
  3. Record the reported throughput and latency for the random write test:

    # fio --rw=randwrite \
          --bs=request-size \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --numjobs=1 \
          --thread \
          --norandommap \
          --runtime=300 \
          --direct=1 \
          --iodepth=optimal-depth \
          --scramble_buffers=1 \
          --offset=0 \
          --size=100g
  4. Remove the VDO test volume.

    For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.

4.5.3. Testing the effect of I/O request size on sequential reads in VDO

This test determines how sequential read operations perform on a VDO volume at different I/O request sizes.

Procedure

  1. Create a new VDO volume.

    For details, see Section 4.2, “Creating a VDO volume for performance testing”.

  2. Prefill any areas that the test might access by performing a write fio job on the test volume:

    # fio --rw=write \
          --bs=8M \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --thread \
          --direct=1 \
          --scramble_buffers=1
  3. Record the reported throughput and latency for the sequential read test:

    # for iosize in 4 8 16 32 64 128 256 512 1024; do
      fio --rw=read \
          --bs=${iosize}k \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --numjobs=1 \
          --thread \
          --norandommap \
          --runtime=300 \
          --direct=1 \
          --iodepth=optimal-depth \
          --scramble_buffers=1 \
          --offset=0 \
          --size=100g
      done
  4. Remove the VDO test volume.

    For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.

4.5.4. Testing the effect of I/O request size on random reads in VDO

This test determines how random read operations perform on a VDO volume at different I/O request sizes.

Procedure

  1. Create a new VDO volume.

    For details, see Section 4.2, “Creating a VDO volume for performance testing”.

  2. Prefill any areas that the test might access by performing a write fio job on the test volume:

    # fio --rw=write \
          --bs=8M \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --thread \
          --direct=1 \
          --scramble_buffers=1
  3. Record the reported throughput and latency for the random read test:

    # for iosize in 4 8 16 32 64 128 256 512 1024; do
      fio --rw=randread \
          --bs=${iosize}k \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --numjobs=1 \
          --thread \
          --norandommap \
          --runtime=300 \
          --direct=1 \
          --iodepth=optimal-depth \
          --scramble_buffers=1 \
          --offset=0 \
          --size=100g
      done
  4. Remove the VDO test volume.

    For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.

4.5.5. Analysis of VDO performance at different I/O request sizes

The following example analyzes VDO throughput and latency recorded at different I/O request sizes.

Example 4.2. I/O request size analysis

Figure 4.3. Request size versus throughput analysis and key inflection points

Analyzing the example results:

  • Sequential writes reach a peak throughput at request size Y.

    This curve demonstrates how applications that are configurable or naturally dominated by certain request sizes might perceive performance. Larger request sizes often provide more throughput because 4 KiB I/O operations might benefit from merging.

  • Sequential reads reach a similar peak throughput at point Z.

    After these peaks, the overall latency before the I/O operation completes increases with no additional throughput. You should tune the device to not accept I/O operations larger than this size.

  • Random reads achieve peak throughput at point X.

    Certain devices might achieve near-sequential throughput for random access at large request sizes, while others suffer a larger penalty when access deviates from purely sequential.

  • Random writes achieve peak throughput at point Y.

    Random writes involve the most interaction with a deduplication device, and VDO achieves high performance especially when request sizes or I/O depths are large.

4.6. Testing the effects of mixed I/O loads on VDO performance

This test determines how your VDO configuration behaves with mixed read and write I/O loads, and analyzes the effects of mixed reads and writes at the optimal random queue depth and request sizes from 4 KiB to 1 MiB.

This procedure performs four-corner testing at a fixed I/O depth, with block sizes varied over the 4 KiB to 1 MiB range, and the read percentage set in 10% increments, beginning with 0%.

Procedure

  1. Create a new VDO volume.

    For details, see Section 4.2, “Creating a VDO volume for performance testing”.

  2. Prefill any areas that the test might access by performing a write fio job on the test volume:

    # fio --rw=write \
          --bs=8M \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --thread \
          --direct=1 \
          --scramble_buffers=1
  3. Record the reported throughput and latency for the read and write input stimulus:

    # for readmix in 0 10 20 30 40 50 60 70 80 90 100; do
        for iosize in 4 8 16 32 64 128 256 512 1024; do
          fio --rw=rw \
              --rwmixread=$readmix \
              --bs=${iosize}k \
              --name=vdo \
              --filename=/dev/mapper/vdo-test \
              --ioengine=libaio \
              --numjobs=1 \
              --thread \
              --norandommap \
              --runtime=300 \
              --direct=0 \
              --iodepth=optimal-depth \
              --scramble_buffers=1 \
              --offset=0 \
              --size=100g
        done
      done
  4. Remove the VDO test volume.

    For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.

  5. Graph the test results.

    Example 4.3. Mixed I/O loads analysis

    The following image shows an example of how VDO might respond to mixed I/O loads:

    Figure 4.4. Performance is consistent across varying read and write mixes

    Aggregate performance and aggregate latency are relatively consistent across the range of mixing reads and writes, trending from the lower maximum write throughput to the higher maximum read throughput.

    This behavior might vary with different storage, but the important observation is either that performance is consistent under varying loads, or that you can understand the performance to expect from applications that demonstrate specific read and write mixes.

    Note

    If your system does not show a similar response consistency, it might be a sign of a sub-optimal configuration. Contact your Red Hat Sales Engineer if this occurs.

4.7. Testing the effects of application environments on VDO performance

These tests determine how your VDO configuration behaves when deployed in a mixed, real application environment. If you know more details about the expected environment, test them as well.

Prerequisites

  • Consider limiting the permissible queue depth on your configuration (see the example after this list).
  • If possible, tune the application to issue requests with the block sizes that are the most beneficial to VDO performance.
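
For example, one way to cap the queue depth on the backing device is the block layer sysfs interface; a minimal sketch, assuming /dev/sdb is the backing device and 128 is the depth selected in Section 4.4:

  # cat /sys/block/sdb/queue/nr_requests         # current maximum number of queued requests
  # echo 128 > /sys/block/sdb/queue/nr_requests  # limit the queue depth for the test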

Procedure

  1. Create a new VDO volume.

    For details, see Section 4.2, “Creating a VDO volume for performance testing”.

  2. Prefill any areas that the test might access by performing a write fio job on the test volume:

    # fio --rw=write \
          --bs=8M \
          --name=vdo \
          --filename=/dev/mapper/vdo-test \
          --ioengine=libaio \
          --thread \
          --direct=1 \
          --scramble_buffers=1
  3. Record the reported throughput and latency for the read and write input stimulus:

    # for readmix in 20 50 80; do
        for iosize in 4 8 16 32 64 128 256 512 1024; do
          fio --rw=rw \
              --rwmixread=$readmix \
              --bsrange=4k-256k \
              --name=vdo \
              --filename=/dev/mapper/vdo-test \
              --ioengine=libaio \
              --numjobs=1 \
              --thread \
              --norandommap \
              --runtime=300 \
              --direct=0 \
              --iodepth=$iosize \
              --scramble_buffers=1 \
              --offset=0 \
              --size=100g
        done
      done
  4. Remove the VDO test volume.

    For details, see Section 4.3, “Cleaning up the VDO performance testing volume”.

  5. Graph the test results.

    Example 4.4. Application environment analysis

    The following image shows an example of how VDO might respond to mixed I/O loads:

    Figure 4.5. Mixed environment performance

4.8. Options used for testing VDO performance with fio

The VDO tests use the fio utility to synthetically generate data with repeatable characteristics. The following fio options are necessary to simulate real world workloads in the tests:

Table 4.1. fio options used in the tests

Each entry lists the argument, its description, and the value used in the tests.

--size

The quantity of data that fio sends to the target per job.

See also the --numjobs option.

Value used in the tests: 100 GiB

--bs

The block size of each read-and-write request produced by fio.

Red Hat recommends a 4 KiB block size to match the default 4 KiB block size of VDO.

Value used in the tests: 4k

--numjobs

The number of jobs that fio creates for the benchmark.

Each job sends the amount of data specified by the --size option. The first job sends data to the device at the offset specified by the --offset option. Subsequent jobs overwrite the same region of the disk unless you provide the --offset_increment option, which offsets each job from where the previous job began by that value.

To achieve peak performance on flash disks (SSD), Red Hat recommends at least two jobs. One job is typically enough to saturate rotational disk (HDD) throughput.

Value used in the tests: 1 for HDD, 2 for SSD

--thread

Instructs fio jobs to run in threads rather than to fork, which might provide better performance by limiting context switching.

Value used in the tests: none

--ioengine

The I/O engine that fio uses for the benchmark.

Red Hat testing uses the asynchronous unbuffered engine called libaio to test workloads where one or more processes make simultaneous random requests. The libaio engine enables a single thread to make multiple requests asynchronously before it retrieves any data. This limits the number of context switches compared with a synchronous engine, which would need many threads to issue the same requests.

Value used in the tests: libaio

--direct

Enables requests submitted to the device to bypass the kernel page cache.

You must use the --direct option with the libaio engine. Otherwise, the kernel uses the sync API for all I/O requests.

Value used in the tests: 1 (libaio)

--iodepth

The number of I/O buffers in flight at any time.

A high value usually increases performance, particularly for random reads or writes, because it ensures that the controller always has requests to batch. However, setting the value too high (typically greater than 1K) might cause undesirable latency.

Red Hat recommends a value between 128 and 512. The final value is a trade-off and depends on how your application tolerates latency.

Value used in the tests: 128 at minimum

--iodepth_batch_submit

The number of I/O requests to create when the I/O depth buffer pool begins to empty.

This option limits task switching from I/O operations to buffer creation during the test.

Value used in the tests: 16

--iodepth_batch_complete

The number of I/O operations to complete before submitting a batch.

This option limits task switching from I/O operations to buffer creation during the test.

Value used in the tests: 16

--gtod_reduce

Disables time-of-day calls to calculate latency.

Frequent time-of-day calls lower throughput, so enable this option unless you require latency measurement.

Value used in the tests: 1
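
For reference, the options in this table combine into a command line similar to the following sketch. The device name, run time, and data size are placeholders carried over from the earlier tests; drop --gtod_reduce=1 when you need latency figures:

# fio --name=vdo \
      --filename=/dev/mapper/vdo-test \
      --rw=randwrite \
      --bs=4k \
      --size=100g \
      --numjobs=2 \
      --thread \
      --ioengine=libaio \
      --direct=1 \
      --iodepth=128 \
      --iodepth_batch_submit=16 \
      --iodepth_batch_complete=16 \
      --norandommap \
      --runtime=300 \
      --scramble_buffers=1 \
      --gtod_reduce=1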