Chapter 29. VDO Integration
29.1. Theoretical Overview of VDO
- Deduplication is a technique for reducing the consumption of storage resources by eliminating multiple copies of duplicate blocks. Instead of writing the same data more than once, VDO detects each duplicate block and records it as a reference to the original block. VDO maintains a mapping from logical block addresses, which are used by the storage layer above VDO, to physical block addresses, which are used by the storage layer under VDO. After deduplication, multiple logical block addresses may be mapped to the same physical block address; these are called shared blocks. Block sharing is invisible to users of the storage, who read and write blocks as they would if VDO were not present. When a shared block is overwritten, a new physical block is allocated for storing the new block data, so that other logical block addresses mapped to the shared physical block are not modified.
- Compression is a data-reduction technique that works well with file formats that do not necessarily exhibit block-level redundancy, such as log files and databases. See Section 29.4.8, “Using Compression” for more detail.
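The logical-to-physical mapping and the handling of shared blocks can be illustrated with a small in-memory model. This is only a sketch and not how VDO is implemented: the class `ToyDedupStore`, its reference counts, and the content lookup table are hypothetical names and simplifications introduced here for illustration.

```python
class ToyDedupStore:
    """Toy model of a logical-to-physical block map with shared blocks."""

    def __init__(self):
        self.block_map = {}   # logical address -> physical address
        self.physical = {}    # physical address -> block data
        self.refcount = {}    # physical address -> number of logical references
        self.by_content = {}  # block data -> physical address (stand-in for an index)
        self._next_pba = 0

    def write(self, lba, data):
        self._release(lba)                  # drop the old mapping, if any
        pba = self.by_content.get(data)
        if pba is None:                     # new data: allocate a physical block
            pba = self._next_pba
            self._next_pba += 1
            self.physical[pba] = data
            self.by_content[data] = pba
            self.refcount[pba] = 0
        self.block_map[lba] = pba           # duplicate data only adds a reference
        self.refcount[pba] += 1

    def read(self, lba):
        return self.physical[self.block_map[lba]]

    def _release(self, lba):
        pba = self.block_map.pop(lba, None)
        if pba is None:
            return
        self.refcount[pba] -= 1
        if self.refcount[pba] == 0:         # last reference gone: free the block
            del self.by_content[self.physical.pop(pba)]
            del self.refcount[pba]


store = ToyDedupStore()
store.write(0, b"A" * 4096)
store.write(1, b"A" * 4096)   # duplicate: shares the physical block with address 0
store.write(1, b"B" * 4096)   # overwrite: address 1 moves to a new physical block
```

After the overwrite, logical address 0 still reads the original data, because the new data for address 1 went to a freshly allocated physical block rather than into the shared one.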
VDO consists of the following components:
- A kernel module (kvdo) that loads into the Linux Device Mapper layer to provide a deduplicated, compressed, and thinly provisioned block storage volume.
- A kernel module that communicates with the Universal Deduplication Service (UDS) index on the volume and analyzes data for duplicates.
- Command line tools for configuring and managing optimized storage.
29.1.1. The UDS Kernel Module
29.1.2. The VDO Kernel Module (kvdo)
The kvdo Linux kernel module provides block-layer deduplication services within the Linux Device Mapper layer. In the Linux kernel, Device Mapper serves as a generic framework for managing pools of block storage, allowing the insertion of block-processing modules into the storage stack between the kernel's block interface and the actual storage device drivers.
The kvdo module is exposed as a block device that can be accessed directly for block storage or presented through one of the many available Linux file systems, such as XFS or ext4. When kvdo receives a request to read a (logical) block of data from a VDO volume, it maps the requested logical block to the underlying physical block and then reads and returns the requested data.
When kvdo receives a request to write a block of data to a VDO volume, it first checks whether it is a TRIM request or whether the data is uniformly zero. If either of these conditions holds, kvdo updates its block map and acknowledges the request. Otherwise, a physical block is allocated for use by the request.
Overview of VDO Write Policies
If the kvdo module is operating in synchronous mode:
- It temporarily writes the data in the request to the allocated block and then acknowledges the request.
- Once the acknowledgment is complete, it attempts to deduplicate the block by computing a MurmurHash-3 signature of the block data, which is sent to the VDO index.
- If the VDO index contains an entry for a block with the same signature, kvdo reads the indicated block and does a byte-by-byte comparison of the two blocks to verify that they are identical.
- If they are indeed identical, kvdo updates its block map so that the logical block points to the corresponding physical block and releases the allocated physical block.
- If the VDO index did not contain an entry for the signature of the block being written, or the indicated block does not actually contain the same data, kvdo updates its block map to make the temporary physical block permanent.
If kvdo is operating in asynchronous mode:
- Instead of writing the data, it immediately acknowledges the request.
- It then attempts to deduplicate the block in the same manner as described above.
- If the block turns out to be a duplicate, kvdo updates its block map and releases the allocated block. Otherwise, it writes the data in the request to the allocated block and updates the block map to make the physical block permanent.
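The difference between the two write policies is mainly the order of events. The following toy model traces that order for both modes. It is a sketch under several assumptions, not kvdo's implementation: `ToyVolume` and its fields are hypothetical, SHA-256 from the Python standard library stands in for the MurmurHash-3 signature, a dictionary stands in for the index, recording a new signature in that dictionary on a miss is assumed, and freed space is never reclaimed.

```python
import hashlib

BLOCK_SIZE = 4096
ZERO_BLOCK = bytes(BLOCK_SIZE)


class ToyVolume:
    def __init__(self, synchronous=True):
        self.synchronous = synchronous
        self.block_map = {}   # logical address -> physical address
        self.physical = {}    # physical address -> data actually written
        self.index = {}       # signature -> physical address (stand-in for the index)
        self.next_pba = 0
        self.log = []         # order of events, for illustration only

    def _signature(self, data):
        # Assumption: SHA-256 replaces the MurmurHash-3 signature here.
        return hashlib.sha256(data).digest()

    def write(self, lba, data, is_trim=False):
        if is_trim or data == ZERO_BLOCK:
            self.block_map.pop(lba, None)     # only the block map changes
            self.log.append("ack")
            return
        pba = self.next_pba                   # allocate a physical block
        self.next_pba += 1
        if self.synchronous:
            self.physical[pba] = data         # temporary write before the ack
            self.log.append("write")
        self.log.append("ack")
        sig = self._signature(data)
        candidate = self.index.get(sig)
        # A matching signature is only a hint: compare the full block contents.
        if candidate is not None and self.physical.get(candidate) == data:
            self.block_map[lba] = candidate   # verified duplicate: share the block
            self.physical.pop(pba, None)      # release the allocated block
            self.log.append("dedupe")
        else:
            if not self.synchronous:
                self.physical[pba] = data     # asynchronous mode writes only now
                self.log.append("write")
            self.block_map[lba] = pba         # make the allocated block permanent
            self.index[sig] = pba             # assumed: remember the new signature
            self.log.append("commit")

    def read(self, lba):
        pba = self.block_map.get(lba)
        return ZERO_BLOCK if pba is None else self.physical[pba]


for mode in (True, False):
    vol = ToyVolume(synchronous=mode)
    vol.write(0, b"x" * BLOCK_SIZE)
    vol.write(1, b"x" * BLOCK_SIZE)           # duplicate of the first block
    print("sync" if mode else "async", vol.log)
```

In the synchronous trace every acknowledgment is preceded by a write, while in the asynchronous trace the acknowledgment comes first and a duplicate block is never written at all.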
29.1.3. VDO Volume
Figure 29.1. VDO Disk Organization
--vdoSlabSize=megabytes option to the
Table 29.1. Recommended VDO Slab Sizes by Physical Volume Size
| Physical Volume Size | Slab Size |
|---|---|
| 10–99 GB | 1 GB |
| 100 GB – 1 TB | 2 GB |
| 2–10 TB | 32 GB |
| 11–50 TB | 32 GB |
| 51–100 TB | 32 GB |
| 101–256 TB | 32 GB |
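The table can be read as a simple range lookup. The helper below is a hypothetical convenience function, not part of any VDO tooling. It assumes that 1 TB equals 1024 GB and that sizes falling between the listed ranges (for example, between 1 TB and 2 TB) take the recommendation of the next larger range; the table itself does not cover those sizes.

```python
TB = 1024  # all sizes are in GB; assumes 1 TB = 1024 GB

# (upper bound of the physical volume size in GB, recommended slab size in GB)
RECOMMENDED_SLAB_SIZES = [
    (99, 1),
    (1 * TB, 2),
    (256 * TB, 32),
]


def recommended_slab_size_gb(physical_gb):
    """Return the slab size in GB that the table suggests for a volume size."""
    if physical_gb < 10:
        raise ValueError("the table starts at 10 GB")
    for upper_bound, slab_gb in RECOMMENDED_SLAB_SIZES:
        if physical_gb <= upper_bound:
            return slab_gb
    raise ValueError("the table ends at 256 TB")


print(recommended_slab_size_gb(500))      # 500 GB volume
print(recommended_slab_size_gb(4 * TB))   # 4 TB volume
```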
Physical Size and Available Physical Size
- Physical size is the same size as the underlying block device. VDO uses this storage for:
  - User data, which might be deduplicated and compressed
  - VDO metadata, such as the UDS index
- Available physical size is the portion of the physical size that VDO is able to use for user data. It is equivalent to the physical size minus the size of the metadata, minus the remainder after dividing the volume into slabs by the given slab size.
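The arithmetic behind the available physical size can be sketched as follows. The function is hypothetical, and the metadata size in the example is an invented figure used only to show the calculation; it also assumes the slab remainder is taken from the space left after the metadata is subtracted, which the description above does not spell out.

```python
def available_physical_size(physical_size, metadata_size, slab_size):
    """Physical size minus metadata, rounded down to a whole number of slabs.

    All three arguments must use the same unit (for example, GB).
    """
    usable = physical_size - metadata_size
    remainder = usable % slab_size   # space too small to form another slab
    return usable - remainder


# Hypothetical example: 100 GB device, 3 GB of metadata, 2 GB slabs.
# 100 - 3 = 97 GB usable; 97 % 2 = 1 GB remainder; 96 GB available.
print(available_physical_size(100, 3, 2))
```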
If the --vdoLogicalSize option is not specified, the logical volume size defaults to the available physical volume size. Note that, in Figure 29.1, “VDO Disk Organization”, the VDO deduplicated storage target sits completely on top of the block device, meaning the physical size of the VDO volume is the same size as the underlying block device.