Chapter 29. VDO Integration

29.1. Theoretical Overview of VDO

Virtual Data Optimizer (VDO) is a block virtualization technology that allows you to easily create compressed and deduplicated pools of block storage.
  • Deduplication is a technique for reducing the consumption of storage resources by eliminating multiple copies of duplicate blocks.
    Instead of writing the same data more than once, VDO detects each duplicate block and records it as a reference to the original block. VDO maintains a mapping from logical block addresses, which are used by the storage layer above VDO, to physical block addresses, which are used by the storage layer under VDO.
    After deduplication, multiple logical block addresses may be mapped to the same physical block address; these are called shared blocks. Block sharing is invisible to users of the storage, who read and write blocks as they would if VDO were not present. When a shared block is overwritten, a new physical block is allocated for storing the new block data to ensure that other logical block addresses that are mapped to the shared physical block are not modified.
  • Compression is a data-reduction technique that works well with file formats that do not necessarily exhibit block-level redundancy, such as log files and databases. See Section 29.4.8, “Using Compression” for more detail.
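The mapping behavior described above can be illustrated with a small in-memory model. This is a minimal sketch, not VDO's implementation: the class name `ToyDedupStore`, the 4096-byte block size, the use of SHA-256 as the content key, and the reference counting are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class ToyDedupStore:
    """Hypothetical model of a logical-to-physical block map with shared blocks."""

    def __init__(self):
        self.block_map = {}   # logical block address -> physical block address
        self.physical = {}    # physical block address -> block data
        self.refcount = {}    # physical block address -> number of logical references
        self.by_digest = {}   # content digest -> physical block address
        self.next_pba = 0

    def write(self, lba, data):
        digest = hashlib.sha256(data).digest()
        pba = self.by_digest.get(digest)
        if pba is not None:
            # Duplicate data: share the existing physical block.
            self.refcount[pba] += 1
        else:
            # New data: allocate a fresh physical block.
            pba = self.next_pba
            self.next_pba += 1
            self.physical[pba] = data
            self.refcount[pba] = 1
            self.by_digest[digest] = pba
        old = self.block_map.get(lba)
        self.block_map[lba] = pba
        if old is not None:
            self._release(old)

    def _release(self, pba):
        # Drop one reference; free the physical block when nothing maps to it.
        self.refcount[pba] -= 1
        if self.refcount[pba] == 0:
            data = self.physical.pop(pba)
            del self.refcount[pba]
            del self.by_digest[hashlib.sha256(data).digest()]

    def read(self, lba):
        return self.physical[self.block_map[lba]]
```

Writing the same data to two logical addresses leaves one physical block with two references. Overwriting one of those addresses allocates a new physical block, so the other address still reads the original data.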
The VDO solution consists of the following components:
kvdo
A kernel module that loads into the Linux Device Mapper layer to provide a deduplicated, compressed, and thinly provisioned block storage volume.
uds
A kernel module that communicates with the Universal Deduplication Service (UDS) index on the volume and analyzes data for duplicates.
Command line tools
For configuring and managing optimized storage.

29.1.1. The UDS Kernel Module (uds)

The UDS index provides the foundation of the VDO product. For each new piece of data, it quickly determines if that piece is identical to any previously stored piece of data. If the index finds a match, the storage system can then internally reference the existing item to avoid storing the same information more than once.
The UDS index runs inside the kernel as the uds kernel module.

29.1.2. The VDO Kernel Module (kvdo)

The kvdo Linux kernel module provides block-layer deduplication services within the Linux Device Mapper layer. In the Linux kernel, Device Mapper serves as a generic framework for managing pools of block storage, allowing the insertion of block-processing modules into the storage stack between the kernel's block interface and the actual storage device drivers.
The kvdo module is exposed as a block device that can be accessed directly for block storage or presented through one of the many available Linux file systems, such as XFS or ext4. When kvdo receives a request to read a (logical) block of data from a VDO volume, it maps the requested logical block to the underlying physical block and then reads and returns the requested data.
When kvdo receives a request to write a block of data to a VDO volume, it first checks whether it is a DISCARD or TRIM request or whether the data is uniformly zero. If either of these conditions holds, kvdo updates its block map and acknowledges the request. Otherwise, a physical block is allocated for use by the request.

Overview of VDO Write Policies

If the kvdo module is operating in synchronous mode:
  1. It temporarily writes the data in the request to the allocated block and then acknowledges the request.
  2. Once the acknowledgment is complete, an attempt is made to deduplicate the block by computing a MurmurHash-3 signature of the block data, which is sent to the VDO index.
  3. If the VDO index contains an entry for a block with the same signature, kvdo reads the indicated block and does a byte-by-byte comparison of the two blocks to verify that they are identical.
  4. If they are indeed identical, then kvdo updates its block map so that the logical block points to the corresponding physical block and releases the allocated physical block.
  5. If the VDO index did not contain an entry for the signature of the block being written, or the indicated block does not actually contain the same data, kvdo updates its block map to make the temporary physical block permanent.
If kvdo is operating in asynchronous mode:
  1. Instead of writing the data, it will immediately acknowledge the request.
  2. It will then attempt to deduplicate the block in the same manner as described above.
  3. If the block turns out to be a duplicate, kvdo will update its block map and release the allocated block. Otherwise, it will write the data in the request to the allocated block and update the block map to make the physical block permanent.
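The difference between the two policies is mainly one of ordering, which the following sketch tries to make visible. It is a simplified model under stated assumptions, not kvdo's code: `ToyVolume`, its `log` list, and the 4096-byte block size are hypothetical, BLAKE2b stands in for the MurmurHash-3 signature because it ships with the Python standard library, and dropping the map entry for a zero block is only one possible reading of "updates its block map". Overwrites and block reuse are not modeled.

```python
import hashlib

BLOCK_SIZE = 4096                # assumed block size, for illustration only
ZERO_BLOCK = bytes(BLOCK_SIZE)


def signature(data):
    # Stand-in for the MurmurHash-3 signature named in the text.
    return hashlib.blake2b(data, digest_size=16).digest()


class ToyVolume:
    """Hypothetical model of the synchronous and asynchronous write paths."""

    def __init__(self):
        self.block_map = {}  # logical block -> physical block
        self.storage = {}    # physical block -> data
        self.index = {}      # signature -> physical block (advisory only)
        self.next_free = 0
        self.log = []        # ordered record of the steps taken

    def write(self, lba, data, synchronous=True):
        if data == ZERO_BLOCK:
            # Zero blocks only update the block map (assumed: drop the entry).
            self.block_map.pop(lba, None)
            self.log.append("ack")
            return
        pba = self.next_free              # allocate a physical block
        self.next_free += 1
        if synchronous:
            self.storage[pba] = data      # temporary write before the ack
            self.log.append("write")
        self.log.append("ack")

        sig = signature(data)
        candidate = self.index.get(sig)
        # The index answer is a hint: confirm with a byte-by-byte comparison.
        if candidate is not None and self.storage.get(candidate) == data:
            self.storage.pop(pba, None)   # release the allocated block
            self.block_map[lba] = candidate
            self.log.append("dedupe")
            return
        if not synchronous:
            self.storage[pba] = data      # asynchronous mode writes only now
            self.log.append("write")
        self.block_map[lba] = pba         # make the physical block permanent
        self.index[sig] = pba
```

In this model a synchronous write of new data logs `write` then `ack`, while an asynchronous write logs `ack` then `write`. In both modes a duplicate ends with the logical block pointing at the existing physical block.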

29.1.3. VDO Volume

VDO uses a block device as a backing store, which can include an aggregation of physical storage consisting of one or more disks, partitions, or even flat files. When a VDO volume is created by a storage management tool, VDO reserves space from the volume for both a UDS index and the VDO volume, which work together to provide deduplicated block storage to users and applications. Figure 29.1, “VDO Disk Organization” illustrates how these pieces fit together.

Figure 29.1. VDO Disk Organization

Slabs

The physical storage of the VDO volume is divided into a number of slabs, each of which is a contiguous region of the physical space. All of the slabs for a given volume will be of the same size, which may be any power-of-2 multiple of 128 MB up to 32 GB.
The default slab size is 2 GB in order to facilitate evaluating VDO on smaller test systems. A single VDO volume may have up to 8192 slabs. Therefore, in the default configuration with 2 GB slabs, the maximum allowed physical storage is 16 TB. When using 32 GB slabs, the maximum allowed physical storage is 256 TB.
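The slab arithmetic can be checked with a few lines of Python. This sketch assumes a limit of 8192 slabs, the value consistent with the 16 TB and 256 TB maxima quoted above, and treats MB, GB, and TB as binary units; the function names are hypothetical.

```python
MIB = 2**20
GIB = 2**30
TIB = 2**40

MAX_SLABS = 8192  # assumed limit; matches the 16 TB and 256 TB maxima in the text


def valid_slab_sizes():
    """Power-of-2 multiples of 128 MB, up to and including 32 GB."""
    sizes = []
    size = 128 * MIB
    while size <= 32 * GIB:
        sizes.append(size)
        size *= 2
    return sizes


def max_physical_size(slab_size):
    """Largest physical size addressable with the given slab size."""
    if slab_size not in valid_slab_sizes():
        raise ValueError("slab size must be a power-of-2 multiple of 128 MB, at most 32 GB")
    return slab_size * MAX_SLABS
```

With these assumptions, `max_physical_size(2 * GIB)` equals `16 * TIB` and `max_physical_size(32 * GIB)` equals `256 * TIB`.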
For a recommendation on what slab size to choose depending on your physical volume size, see Table 29.1, “Recommended VDO Slab Sizes by Physical Volume Size”.
At least one entire slab will be reserved by VDO for metadata, and therefore cannot be used for storing user data.
The size of a slab can be controlled by providing the --vdoSlabSize=megabytes option to the vdo create command.

Table 29.1. Recommended VDO Slab Sizes by Physical Volume Size

Physical Volume Size | 10–99 GB | 100 GB – 1 TB | 2–10 TB | 11–50 TB | 51–100 TB | 101–256 TB
Slab Size            | 1 GB     | 2 GB          | 32 GB   | 32 GB    | 32 GB     | 32 GB

Physical Size and Available Physical Size

Both physical size and available physical size describe the amount of disk space on the block device that VDO can utilize:
  • Physical size is the same size as the underlying block device. VDO uses this storage for:
    • User data, which might be deduplicated and compressed
    • VDO metadata, such as the UDS index
  • Available physical size is the portion of the physical size that VDO is able to use for user data.
    It is equivalent to the physical size minus the size of the metadata, minus the remainder after dividing the volume into slabs by the given slab size.
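As a rough illustration of that definition, the calculation might look like the sketch below. The function name is hypothetical, the metadata size is supplied by the caller rather than derived, and the remainder is assumed to be taken after the metadata has been subtracted, which is one reading of the description.

```python
def available_physical_size(physical_size, metadata_size, slab_size):
    """Physical size minus metadata, minus the partial slab left over (all in bytes)."""
    if slab_size <= 0:
        raise ValueError("slab size must be positive")
    usable = physical_size - metadata_size
    if usable < 0:
        raise ValueError("metadata does not fit on the device")
    # Space that does not fill a whole slab cannot hold user data.
    remainder = usable % slab_size
    return usable - remainder
```

For example, with a slab size of 10 units, a device of 100 units, and 7 units of metadata, the result is 90: 93 units remain after metadata, and the trailing 3 units do not fill a slab.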
For examples of how much storage VDO metadata requires on block devices of different sizes, see Section 29.2.3, “Examples of VDO System Requirements by Physical Volume Size”.

Logical Size

If the --vdoLogicalSize option is not specified, the logical volume size defaults to the available physical volume size. Note that, in Figure 29.1, “VDO Disk Organization”, the VDO deduplicated storage target sits completely on top of the block device, meaning the physical size of the VDO volume is the same size as the underlying block device.
VDO currently supports any logical size up to 254 times the size of the physical volume, with an absolute maximum logical size of 4 PB.
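The two limits combine into a simple upper bound, sketched here. The function name is hypothetical and PB is treated as a binary unit, which is an assumption.

```python
TIB = 2**40
PIB = 2**50

MAX_LOGICAL_RATIO = 254        # logical size may be at most 254x the physical size
ABSOLUTE_MAX_LOGICAL = 4 * PIB  # absolute ceiling on logical size


def max_logical_size(physical_size):
    """Upper bound on the logical size for a given physical size, in bytes."""
    return min(MAX_LOGICAL_RATIO * physical_size, ABSOLUTE_MAX_LOGICAL)
```

For a 1 TB physical volume the ratio is the binding limit (254 TB). For a 100 TB physical volume the 4 PB ceiling applies instead.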

29.1.4. Command Line Tools

VDO includes the following command line tools for configuration and management:
vdo
Creates, configures, and controls VDO volumes
vdostats
Provides utilization and performance statistics