Chapter 7. Benchmarking Performance

The purpose of this section is to give Ceph administrators a basic understanding of Ceph’s native benchmarking tools. These tools will provide some insight into how the Ceph storage cluster is performing. This is not the definitive guide to Ceph performance benchmarking, nor is it a guide on how to tune Ceph accordingly.

7.1. Performance Baseline

The OSD (including the journal) disks and the network throughput should each have a performance baseline to compare against. You can identify potential tuning opportunities by comparing the baseline performance data with the data from Ceph’s native tools. Red Hat Enterprise Linux has many built-in tools, along with a plethora of open source community tools, available to help accomplish these tasks. For more details about some of the available tools, see this Knowledgebase article.

7.2. Storage Cluster

Ceph includes the rados bench command to do performance benchmarking on a RADOS storage cluster. The command will execute a write test and two types of read tests. The --no-cleanup option is important to use when testing both read and write performance. By default the rados bench command will delete the objects it has written to the storage pool. Leaving behind these objects allows the two read tests to measure sequential and random read performance.

Note

Before running these performance tests, drop all the file system caches by running the following:

# echo 3 | sudo tee /proc/sys/vm/drop_caches && sudo sync
  1. Create a new storage pool:

    # ceph osd pool create testbench 100 100
  2. Execute a write test for 10 seconds to the newly created storage pool:

    # rados bench -p testbench 10 write --no-cleanup

    Example Output

    Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
     Object prefix: benchmark_data_cephn1.home.network_10510
       sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
         0       0         0         0         0         0         -         0
         1      16        16         0         0         0         -         0
         2      16        16         0         0         0         -         0
         3      16        16         0         0         0         -         0
         4      16        17         1  0.998879         1   3.19824   3.19824
         5      16        18         2   1.59849         4   4.56163   3.87993
         6      16        18         2   1.33222         0         -   3.87993
         7      16        19         3   1.71239         2   6.90712     4.889
         8      16        25         9   4.49551        24   7.75362   6.71216
         9      16        25         9   3.99636         0         -   6.71216
        10      16        27        11   4.39632         4   9.65085   7.18999
        11      16        27        11   3.99685         0         -   7.18999
        12      16        27        11   3.66397         0         -   7.18999
        13      16        28        12   3.68975   1.33333   12.8124   7.65853
        14      16        28        12   3.42617         0         -   7.65853
        15      16        28        12   3.19785         0         -   7.65853
        16      11        28        17   4.24726   6.66667   12.5302   9.27548
        17      11        28        17   3.99751         0         -   9.27548
        18      11        28        17   3.77546         0         -   9.27548
        19      11        28        17   3.57683         0         -   9.27548
     Total time run:         19.505620
    Total writes made:      28
    Write size:             4194304
    Bandwidth (MB/sec):     5.742
    
    Stddev Bandwidth:       5.4617
    Max bandwidth (MB/sec): 24
    Min bandwidth (MB/sec): 0
    Average Latency:        10.4064
    Stddev Latency:         3.80038
    Max latency:            19.503
    Min latency:            3.19824

  3. Execute a sequential read test for 10 seconds to the storage pool:

    # rados bench -p testbench 10 seq

    Example Output

    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
      0       0         0         0         0         0         -         0
    Total time run:        0.804869
    Total reads made:      28
    Read size:             4194304
    Bandwidth (MB/sec):    139.153
    
    Average Latency:       0.420841
    Max latency:           0.706133
    Min latency:           0.0816332

  4. Execute a random read test for 10 seconds to the storage pool:

    # rados bench -p testbench 10 rand

    Example Output

    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
      0       0         0         0         0         0         -         0
      1      16        46        30   119.801       120  0.440184  0.388125
      2      16        81        65   129.408       140  0.577359  0.417461
      3      16       120       104   138.175       156  0.597435  0.409318
      4      15       157       142   141.485       152  0.683111  0.419964
      5      16       206       190   151.553       192  0.310578  0.408343
      6      16       253       237   157.608       188 0.0745175  0.387207
      7      16       287       271   154.412       136  0.792774   0.39043
      8      16       325       309   154.044       152  0.314254   0.39876
      9      16       362       346   153.245       148  0.355576  0.406032
     10      16       405       389   155.092       172   0.64734  0.398372
    Total time run:        10.302229
    Total reads made:      405
    Read size:             4194304
    Bandwidth (MB/sec):    157.248
    
    Average Latency:       0.405976
    Max latency:           1.00869
    Min latency:           0.0378431

  5. To increase the number of concurrent reads and writes, use the -t option, which the default is 16 threads. Also, the -b parameter can adjust the size of the object being written. The default object size is 4MB. A safe maximum object size is 16MB. Red Hat recommends running multiple copies of these benchmark tests to different pools. Doing this shows the changes in performance from multiple clients.

    Add the --run-name <label> option to control the names of the objects that get written during the benchmark test. Multiple rados bench commands may be ran simultaneously by changing the --run-name label for each running command instance. This prevents potential I/O errors that can occur when multiple clients are trying to access the same object and allows for different clients to access different objects. The --run-name option is also useful when trying to simulate a real world workload. For example:

    # rados bench -p testbench 10 write -t 4 --run-name client1

    Example Output

    Maintaining 4 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
     Object prefix: benchmark_data_node1_12631
       sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
         0       0         0         0         0         0         -         0
         1       4         4         0         0         0         -         0
         2       4         6         2   3.99099         4   1.94755   1.93361
         3       4         8         4   5.32498         8     2.978   2.44034
         4       4         8         4   3.99504         0         -   2.44034
         5       4        10         6   4.79504         4   2.92419    2.4629
         6       3        10         7   4.64471         4   3.02498    2.5432
         7       4        12         8   4.55287         4   3.12204   2.61555
         8       4        14        10    4.9821         8   2.55901   2.68396
         9       4        16        12   5.31621         8   2.68769   2.68081
        10       4        17        13   5.18488         4   2.11937   2.63763
        11       4        17        13   4.71431         0         -   2.63763
        12       4        18        14   4.65486         2    2.4836   2.62662
        13       4        18        14   4.29757         0         -   2.62662
    Total time run:         13.123548
    Total writes made:      18
    Write size:             4194304
    Bandwidth (MB/sec):     5.486
    
    Stddev Bandwidth:       3.0991
    Max bandwidth (MB/sec): 8
    Min bandwidth (MB/sec): 0
    Average Latency:        2.91578
    Stddev Latency:         0.956993
    Max latency:            5.72685
    Min latency:            1.91967

  6. Remove the data created by the rados bench command:

    # rados -p testbench cleanup

7.3. Block Device

Ceph includes the rbd bench-write command to test sequential writes to the block device measuring throughput and latency. The default byte size is 4096, the default number of I/O threads is 16, and the default total number of bytes to write is 1 GB. These defaults can be modified by the --io-size, --io-threads and --io-total options respectively. For more information on the rbd command, see the Block Device Commands section in the Ceph Block Device Guide for Red Hat Ceph Storage 3.

Creating a Ceph Block Device

  1. As root, load the rbd kernel module, if not already loaded:

    # modprobe rbd
  2. As root, create a 1 GB rbd image file in the testbench pool:

    # rbd create image01 --size 1024 --pool testbench
    Note

    When creating a block device image these features are enabled by default: layering, object-map, deep-flatten, journaling, exclusive-lock, and fast-diff.

    On Red Hat Enterprise Linux 7.2 and Ubuntu 16.04, users utilizing the kernel RBD client will not be able to map the block device image. You must first disable all these features, except, layering.

    Syntax

    # rbd feature disable <image_name> <feature_name>

    Example

    # rbd feature disable image1 journaling deep-flatten exclusive-lock fast-diff object-map

    Using the --image-feature layering option on the rbd create command will only enable layering on newly created block device images.

    This is a known issue, see the Red Hat Ceph Storage 3.3 Release Notes. for more details.

    All these features will work for users utilizing the user-space RBD client to access the block device images.

  3. As root, map the image file to a device file:

    # rbd map image01 --pool testbench --name client.admin
  4. As root, create an ext4 file system on the block device:

    # mkfs.ext4 /dev/rbd/testbench/image01
  5. As root, create a new directory:

    # mkdir /mnt/ceph-block-device
  6. As root, mount the block device under /mnt/ceph-block-device/:

    # mount /dev/rbd/testbench/image01 /mnt/ceph-block-device

Execute the write performance test against the block device

# rbd bench-write image01 --pool=testbench

Example

bench-write  io_size 4096 io_threads 16 bytes 1073741824 pattern seq
  SEC       OPS   OPS/SEC   BYTES/SEC
    2     11127   5479.59  22444382.79
    3     11692   3901.91  15982220.33
    4     12372   2953.34  12096895.42
    5     12580   2300.05  9421008.60
    6     13141   2101.80  8608975.15
    7     13195    356.07  1458459.94
    8     13820    390.35  1598876.60
    9     14124    325.46  1333066.62
    ..