Deduplicating and compressing logical volumes on RHEL
Using VDO to increase LVM storage capacity
Making open source more inclusive
Red Hat is committed to replacing problematic language in our code, documentation, and web properties. We are beginning with these four terms: master, slave, blacklist, and whitelist. Because of the enormity of this endeavor, these changes will be implemented gradually over several upcoming releases. For more details, see our CTO Chris Wright’s message.
Chapter 1. Introduction to VDO on LVM
The Virtual Data Optimizer (VDO) feature provides inline block-level deduplication, compression, and thin provisioning for storage. You can manage VDO as a type of LVM logical volume (LV), similar to LVM thinly provisioned volumes.
VDO volumes on LVM (LVM-VDO) are composed of the following LVs:
- VDO pool LV
This is the backing physical device that stores, deduplicates, and compresses data for the VDO LV. The VDO pool LV sets the physical size of the VDO volume, which is the amount of data that VDO can store on the disk.
Currently, each VDO pool LV can hold only one VDO LV. As a result, VDO deduplicates and compresses each VDO LV separately. In other words, VDO cannot deduplicate or compress a piece of data that is shared between several VDO LVs.
- VDO LV
- This is the virtual, provisioned device on top of the VDO pool LV. The VDO LV sets the provisioned, logical size of the VDO volume, which is the amount of data that applications can write to the volume before deduplication and compression occurs.
Table 1.1. A comparison of components in VDO on LVM and LVM thin provisioning
| | Physical device | Provisioned device |
|---|---|---|
| VDO on LVM | VDO pool LV | VDO LV |
| LVM thin provisioning | thin pool | thin volume |
Because VDO is thinly provisioned, the file system and applications only see the logical space in use and are not aware of the actual physical space available. Use scripting to monitor the actual available space and generate an alert if use exceeds a threshold: for example, when the VDO pool LV is 80% full.
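A minimal monitoring sketch along these lines, assuming hypothetical volume group and pool LV names (vg-name, vdopool0); the real `lvs` query is shown in a comment, and a sample value stands in for its output so the parsing logic is self-contained:

```shell
# Minimal sketch of a threshold check for a VDO pool LV.
# On a real system, capture the usage with a command such as:
#   lvs --noheadings -o data_percent vg-name/vdopool0
# (vg-name and vdopool0 are placeholder names.)
sample_output="  81.42"          # stand-in for the lvs output
THRESHOLD=80
USED=${sample_output%%.*}        # keep the integer part
USED=$(echo "$USED" | tr -d ' ') # strip whitespace
if [ "$USED" -ge "$THRESHOLD" ]; then
    echo "WARNING: VDO pool is ${USED}% full"
fi
```

In practice, you would run such a check from cron or a systemd timer and route the warning to logger or your monitoring system.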
- For documentation on stand-alone VDO, see Deduplicating and compressing storage.
- For documentation on LVM thin provisioning, see Creating and managing thinly-provisioned logical volumes (thin volumes).
Chapter 2. VDO requirements
VDO has certain requirements on its placement and your system resources.
2.1. VDO memory requirements
Each VDO volume has two distinct memory requirements:
The VDO module
VDO requires 370 MB of RAM plus an additional 268 MB for each 1 TB of physical storage managed by the volume.
The Universal Deduplication Service (UDS) index
UDS requires a minimum of 250 MB of RAM, which is also the default amount that deduplication uses.
The memory required for the UDS index is determined by the index type and the required size of the deduplication window:
| Index type | Deduplication window | Note |
|---|---|---|
| Dense | 1 TB per 1 GB of RAM | A 1 GB dense index is generally sufficient for up to 4 TB of physical storage. |
| Sparse | 10 TB per 1 GB of RAM | A 1 GB sparse index is generally sufficient for up to 40 TB of physical storage. |
The UDS Sparse Indexing feature is the recommended mode for VDO. It relies on the temporal locality of data and attempts to retain only the most relevant index entries in memory. With the sparse index, UDS can maintain a deduplication window that is ten times larger than with dense, while using the same amount of memory.
Although the sparse index provides the greatest coverage, the dense index provides more deduplication advice. For most workloads, given the same amount of memory, the difference in deduplication rates between dense and sparse indexes is negligible.
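As a rough illustration of the figures above, the following sketch estimates the memory footprint for a volume with 10 TB of physical storage; the sizes are examples only, and exact values vary by configuration:

```shell
# Rough RAM estimate using the figures above (example numbers only).
PHYS_TB=10
MODULE_RAM_MB=$(( 370 + 268 * PHYS_TB ))          # VDO module: 370 MB + 268 MB/TB
DENSE_IDX_GB=$(awk "BEGIN{print ${PHYS_TB}/1}")   # dense: 1 TB window per GB of RAM
SPARSE_IDX_GB=$(awk "BEGIN{print ${PHYS_TB}/10}") # sparse: 10 TB window per GB of RAM
echo "VDO module: ${MODULE_RAM_MB} MB"
echo "dense index for a ${PHYS_TB} TB window: ${DENSE_IDX_GB} GB"
echo "sparse index for a ${PHYS_TB} TB window: ${SPARSE_IDX_GB} GB"
```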
- For concrete examples of UDS index memory requirements, see Section 2.4, “Examples of VDO requirements by physical size”.
2.2. VDO storage space requirements
You can configure a VDO volume to use up to 256 TB of physical storage. Only a certain part of the physical storage is usable to store data. This section provides the calculations to determine the usable size of a VDO-managed volume.
VDO requires storage for two types of VDO metadata and for the UDS index:
- The first type of VDO metadata uses approximately 1 MB for each 4 GB of physical storage plus an additional 1 MB per slab.
- The second type of VDO metadata consumes approximately 1.25 MB for each 1 GB of logical storage, rounded up to the nearest slab.
- The amount of storage required for the UDS index depends on the type of index and the amount of RAM allocated to the index. For each 1 GB of RAM, a dense UDS index uses 17 GB of storage, and a sparse UDS index uses 170 GB of storage.
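A worked sketch of these calculations for an example 10 TB physical / 10 TB logical volume, with 2 GB slabs and the default 250 MB dense UDS index; the figures are approximate, following the estimates above:

```shell
# Worked overhead estimate (approximate, per the figures above).
PHYS_GB=10240; LOGICAL_GB=10240; SLAB_GB=2; UDS_RAM_MB=250
SLABS=$(( PHYS_GB / SLAB_GB ))       # 5120 slabs
META1_MB=$(( PHYS_GB / 4 + SLABS ))  # ~1 MB per 4 GB of physical + 1 MB per slab
META2_MB=$(( LOGICAL_GB * 5 / 4 ))   # ~1.25 MB per 1 GB of logical
UDS_MB=$(( UDS_RAM_MB * 17 ))        # dense index: 17 GB of disk per 1 GB of RAM
TOTAL_MB=$(( META1_MB + META2_MB + UDS_MB ))
echo "metadata + index overhead: ${TOTAL_MB} MB"
```

For this example, the overhead comes to roughly 24 GB, or about 0.25% of the 10 TB physical size.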
- For concrete examples of VDO storage requirements, see Section 2.4, “Examples of VDO requirements by physical size”.
- For a description of slabs, see Section 3.4, “Slab size in VDO”.
2.3. Placement of VDO in the storage stack
You should place certain storage layers under VDO and others above VDO.
You can place thick-provisioned layers on top of VDO, but you cannot rely on the guarantees of thick provisioning in that case. Because the VDO layer is thin-provisioned, the effects of thin provisioning apply to all layers above it. If you do not monitor the VDO device, you might run out of physical space on thick-provisioned volumes above VDO.
Layers that you can place only under VDO:
- DM Multipath
- DM Crypt
- Software RAID (LVM or MD RAID)
Layers that you can place only above VDO:
- LVM cache
- LVM snapshots
- LVM thin provisioning

The following configurations are not supported:

- VDO on top of other VDO volumes
- VDO on top of LVM snapshots
- VDO on top of LVM cache
- VDO on top of a loopback device
- VDO on top of LVM thin provisioning
- Encrypted volumes on top of VDO
- Partitions on a VDO volume
- RAID, such as LVM RAID, MD RAID, or any other type, on top of a VDO volume
- For more information on stacking VDO with LVM layers, see the Stacking LVM volumes article.
2.4. Examples of VDO requirements by physical size
The following tables provide approximate system requirements of VDO based on the physical size of the underlying volume. Each table lists requirements appropriate to the intended deployment, such as primary storage or backup storage.
The exact numbers depend on your configuration of the VDO volume.
Primary storage deployment
In the primary storage case, the UDS index is between 0.01% and 25% of the physical size.
Table 2.1. Storage and memory requirements for primary storage
| Physical size | RAM usage | Disk usage | Index type |
|---|---|---|---|
Backup storage deployment
In the backup storage case, the UDS index covers the size of the backup set but is not bigger than the physical size. If you expect the backup set or the physical size to grow in the future, factor this into the index size.
Table 2.2. Storage and memory requirements for backup storage
| Physical size | RAM usage | Disk usage | Index type |
|---|---|---|---|
Chapter 3. Creating a deduplicated and compressed logical volume
You can create an LVM logical volume that uses the VDO feature to deduplicate and compress data.
3.1. LVM-VDO deployment scenarios
You can deploy VDO on LVM (LVM-VDO) in a variety of ways to provide deduplicated storage for:
- block access
- file access
- local storage
- remote storage
Because LVM-VDO exposes its deduplicated storage as a regular logical volume (LV), you can use it with standard file systems, iSCSI and FC target drivers, or as unified storage.
Deploying Ceph Storage on LVM-VDO is currently not supported.
You can deploy LVM-VDO on a KVM server configured with Direct Attached Storage.
You can create file systems on top of a VDO LV and expose them to NFS or CIFS users with the NFS server or Samba.
You can export the entirety of the VDO LV as an iSCSI target to remote iSCSI initiators.
Device Mapper (DM) mechanisms such as DM Crypt are compatible with VDO. Encrypting a VDO volume helps ensure data security, and any file systems above the VDO LV are still deduplicated.
Applying the encryption layer above the VDO LV results in little, if any, data deduplication. Encryption makes duplicate blocks different before VDO can deduplicate them.
Always place the encryption layer below the VDO LV.
3.2. The physical and logical size of an LVM-VDO volume
This section describes the physical size, available physical size, and logical size that VDO can utilize.
- Physical size
This is the same size as the physical extents allocated to the VDO pool LV. VDO uses this storage for:
- User data, which might be deduplicated and compressed
- VDO metadata, such as the UDS index
- Available physical size
This is the portion of the physical size that VDO is able to use for user data.
It is equivalent to the physical size minus the size of the metadata, minus the remainder left over after dividing the volume into slabs of the given slab size.
- Logical Size
This is the provisioned size that the VDO LV presents to applications. It is usually larger than the available physical size.
If you do not specify the --virtualsize option, VDO provisions the volume at a 1:1 ratio. For example, if you put a VDO LV on top of a 20 GB VDO pool LV, VDO reserves 2.5 GB for the UDS index, if the default index size is used. The remaining 17.5 GB is provided for the VDO metadata and user data. As a result, the available storage to consume is not more than 17.5 GB, and can be less due to the metadata that makes up the actual VDO volume.
VDO currently supports any logical size up to 254 times the size of the physical volume with an absolute maximum logical size of 4 PB.
- For more information on how much storage VDO metadata requires on different physical sizes, see Section 2.4, “Examples of VDO requirements by physical size”.
3.3. The recommended logical size for VDO logical volumes
When you set up a VDO logical volume (LV), you specify the amount of logical storage that the VDO LV presents. Red Hat recommends the following logical sizes for these use cases:
- When hosting active VMs or containers, Red Hat recommends provisioning storage at a 10:1 logical to physical ratio: that is, if you are utilizing 1 TB of physical storage, you would present it as 10 TB of logical storage.
- For object storage, such as the type provided by Ceph, Red Hat recommends using a 3:1 logical to physical ratio: that is, 1 TB of physical storage would present as 3 TB logical storage.
In either case, you can simply put a file system on top of the VDO LV and then use it directly or as part of a distributed cloud storage architecture.
3.4. Slab size in VDO
The physical storage of the VDO volume is divided into a number of slabs. Each slab is a contiguous region of the physical space. All of the slabs for a given volume have the same size, which can be any power of 2 multiple of 128 MB up to 32 GB.
The default slab size is 2 GB in order to facilitate evaluating VDO on smaller test systems. A single VDO volume can have up to 8192 slabs. Therefore, in the default configuration with 2 GB slabs, the maximum allowed physical storage is 16 TB. When using 32 GB slabs, the maximum allowed physical storage is 256 TB. VDO always reserves at least one entire slab for metadata, and therefore, the reserved slab cannot be used for storing user data.
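The slab-count limit stated above implies these maximum physical sizes directly:

```shell
# Maximum physical size = maximum slab count x slab size (from above).
MAX_SLABS=8192
for SLAB_GB in 2 32; do
    echo "${SLAB_GB} GB slabs -> max physical size $(( MAX_SLABS * SLAB_GB / 1024 )) TB"
done
```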
Slab size has no effect on the performance of the VDO volume.
Table 3.1. Recommended VDO slab sizes by physical volume size
| Physical volume size | Recommended slab size |
|---|---|
| 100 GB – 1 TB | |
You can control the slab size by providing the --vdoSlabSize=megabytes option to the vdo create command.
3.5. Installing VDO
This procedure installs software necessary to create, mount, and manage VDO volumes.
# yum install vdo kmod-kvdo
3.6. Creating an LVM-VDO volume
This procedure creates a VDO logical volume (LV) on a VDO pool LV.
- Install the VDO software. See Section 3.5, “Installing VDO”.
- An LVM volume group with free storage capacity exists on your system.
Pick a name for your VDO LV, such as vdo1. You must use a different name and device for each VDO LV on the system. In all the following steps, replace vdo-name with the name.
Create the VDO LV:
# lvcreate --type vdo \
      --name vdo-name \
      --size physical-size \
      --virtualsize logical-size \
      vg-name
- Replace vg-name with the name of an existing LVM volume group where you want to place the VDO LV.
- Replace physical-size with the amount of physical storage to allocate to the VDO pool LV.
- Replace logical-size with the amount of logical storage that the VDO LV will present.
If the physical size is larger than 16 TiB, add the --config 'allocation/vdo_slab_size_mb=32768' option to increase the slab size on the volume to 32 GiB.
If you use the default slab size of 2 GiB on a physical size larger than 16 TiB, the lvcreate command fails with the following error:
ERROR - vdoformat: formatVDO failed on '/dev/device': VDO Status: Exceeds maximum number of slabs supported
Example 3.1. Creating a VDO LV for container storage
For example, to create a VDO LV for container storage on a 1TB VDO pool LV, you can use:
# lvcreate --type vdo \
      --name vdo1 \
      --size 1T \
      --virtualsize 10T \
      vg-name
Important
If a failure occurs when creating the VDO volume, remove the volume to clean up.
Create a file system on the VDO LV:
For the XFS file system:
# mkfs.xfs -K /dev/vg-name/vdo-name
For the ext4 file system:
# mkfs.ext4 -E nodiscard /dev/vg-name/vdo-name
3.7. Mounting an LVM-VDO volume
This procedure mounts a file system on an LVM-VDO volume, either manually or persistently.
- An LVM-VDO volume exists on your system. For more information, see Section 3.6, “Creating an LVM-VDO volume”.
To mount the file system on the LVM-VDO volume manually, use:
# mount /dev/vg/vdo-name mount-point
To configure the file system to mount automatically at boot, add a line to the /etc/fstab file:
For the XFS file system:
/dev/vg/vdo-name mount-point xfs defaults,_netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service 0 0
For the ext4 file system:
/dev/vg/vdo-name mount-point ext4 defaults,_netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service 0 0