Appendix A. The Device Mapper

The Device Mapper is a kernel driver that provides a framework for volume management. It provides a generic way of creating mapped devices, which may be used as logical volumes. It does not specifically know about volume groups or metadata formats.
The Device Mapper provides the foundation for a number of higher-level technologies. In addition to LVM, Device-Mapper multipath and the dmraid command use the Device Mapper. The application interface to the Device Mapper is the ioctl system call. The user interface is the dmsetup command.
LVM logical volumes are activated using the Device Mapper. Each logical volume is translated into a mapped device. Each segment translates into a line in the mapping table that describes the device. The Device Mapper supports a variety of mapping targets, including linear mapping, striped mapping, and error mapping. So, for example, two disks may be concatenated into one logical volume with a pair of linear mappings, one for each disk. When LVM creates a volume, it creates an underlying device-mapper device that can be queried with the dmsetup command. For information about the format of devices in a mapping table, see Section A.1, “Device Table Mappings”. For information about using the dmsetup command to query a device, see Section A.2, “The dmsetup Command”.

A.1. Device Table Mappings

A mapped device is defined by a table that specifies how to map each range of logical sectors of the device using a supported Device Table mapping. The table for a mapped device is constructed from a list of lines of the form:
start length mapping [mapping_parameters...]
In the first line of a Device Mapper table, the start parameter must equal 0. The start + length parameters on one line must equal the start on the next line. Which mapping parameters are specified in a line of the mapping table depends on which mapping type is specified on the line.
Sizes in the Device Mapper are always specified in sectors (512 bytes).
When a device is specified as a mapping parameter in the Device Mapper, it can be referenced by the device name in the filesystem (for example, /dev/hda) or by the major and minor numbers in the format major:minor. The major:minor format is preferred because it avoids pathname lookups.
The following shows a sample mapping table for a device. In this table there are four linear targets:
0 35258368 linear 8:48 65920
35258368 35258368 linear 8:32 65920
70516736 17694720 linear 8:16 17694976
88211456 17694720 linear 8:16 256
The first 2 parameters of each line are the segment starting block and the length of the segment. The next keyword is the mapping target, which in all of the cases in this example is linear. The rest of the line consists of the parameters for a linear target.
The following subsections describe the format of the following mappings:
  • linear
  • striped
  • mirror
  • snapshot and snapshot-origin
  • error
  • zero
  • multipath
  • crypt

A.1.1. The linear Mapping Target

A linear mapping target maps a continuous range of blocks onto another block device. The format of a linear target is as follows:
start length linear device offset
start
starting block in virtual device
length
length of this segment
device
block device, referenced by the device name in the filesystem or by the major and minor numbers in the format major:minor
offset
starting offset of the mapping on the device
The following example shows a linear target with a starting block in the virtual device of 0, a segment length of 1638400, a major:minor number pair of 8:2, and a starting offset for the device of 41146992.
0 16384000 linear 8:2 41156992
The following example shows a linear target with the device parameter specified as the device /dev/hda.
0 20971520 linear /dev/hda 384

A.1.2. The striped Mapping Target

The striped mapping target supports striping across physical devices. It takes as arguments the number of stripes and the striping chunk size followed by a list of pairs of device name and sector. The format of a striped target is as follows:
start length striped #stripes chunk_size device1 offset1 ... deviceN offsetN
There is one set of device and offset parameters for each stripe.
start
starting block in virtual device
length
length of this segment
#stripes
number of stripes for the virtual device
chunk_size
number of sectors written to each stripe before switching to the next; must be power of 2 at least as big as the kernel page size
device
block device, referenced by the device name in the filesystem or by the major and minor numbers in the format major:minor.
offset
starting offset of the mapping on the device
The following example shows a striped target with three stripes and a chunk size of 128:
0 73728 striped 3 128 8:9 384 8:8 384 8:7 9789824
0
starting block in virtual device
73728
length of this segment
striped 3 128
stripe across three devices with chunk size of 128 blocks
8:9
major:minor numbers of first device
384
starting offset of the mapping on the first device
8:8
major:minor numbers of second device
384
starting offset of the mapping on the second device
8:7
major:minor numbers of third device
9789824
starting offset of the mapping on the third device
The following example shows a striped target for 2 stripes with 256 KiB chunks, with the device parameters specified by the device names in the file system rather than by the major and minor numbers.
0 65536 striped 2 512 /dev/hda 0 /dev/hdb 0

A.1.3. The mirror Mapping Target

The mirror mapping target supports the mapping of a mirrored logical device. The format of a mirrored target is as follows:
start length mirror log_type #logargs logarg1 ... logargN #devs device1 offset1 ... deviceN offsetN
start
starting block in virtual device
length
length of this segment
log_type
The possible log types and their arguments are as follows:
core
The mirror is local and the mirror log is kept in core memory. This log type takes 1 - 3 arguments:
regionsize [[no]sync] [block_on_error]
disk
The mirror is local and the mirror log is kept on disk. This log type takes 2 - 4 arguments:
logdevice regionsize [[no]sync] [block_on_error]
clustered_core
The mirror is clustered and the mirror log is kept in core memory. This log type takes 2 - 4 arguments:
regionsize UUID [[no]sync] [block_on_error]
clustered_disk
The mirror is clustered and the mirror log is kept on disk. This log type takes 3 - 5 arguments:
logdevice regionsize UUID [[no]sync] [block_on_error]
LVM maintains a small log which it uses to keep track of which regions are in sync with the mirror or mirrors. The regionsize argument specifies the size of these regions.
In a clustered environment, the UUID argument is a unique identifier associated with the mirror log device so that the log state can be maintained throughout the cluster.
The optional [no]sync argument can be used to specify the mirror as "in-sync" or "out-of-sync". The block_on_error argument is used to tell the mirror to respond to errors rather than ignoring them.
#log_args
number of log arguments that will be specified in the mapping
logargs
the log arguments for the mirror; the number of log arguments provided is specified by the #log-args parameter and the valid log arguments are determined by the log_type parameter.
#devs
the number of legs in the mirror; a device and an offset is specified for each leg
device
block device for each mirror leg, referenced by the device name in the filesystem or by the major and minor numbers in the format major:minor. A block device and offset is specified for each mirror leg, as indicated by the #devs parameter.
offset
starting offset of the mapping on the device. A block device and offset is specified for each mirror leg, as indicated by the #devs parameter.
The following example shows a mirror mapping target for a clustered mirror with a mirror log kept on disk.
0 52428800 mirror clustered_disk 4 253:2 1024 UUID block_on_error 3 253:3 0 253:4 0 253:5 0
0
starting block in virtual device
52428800
length of this segment
mirror clustered_disk
mirror target with a log type specifying that mirror is clustered and the mirror log is maintained on disk
4
4 mirror log arguments will follow
253:2
major:minor numbers of log device
1024
region size the mirror log uses to keep track of what is in sync
UUID
UUID of mirror log device to maintain log information throughout a cluster
block_on_error
mirror should respond to errors
3
number of legs in mirror
253:3 0 253:4 0 253:5 0
major:minor numbers and offset for devices constituting each leg of mirror

A.1.4. The snapshot and snapshot-origin Mapping Targets

When you create the first LVM snapshot of a volume, four Device Mapper devices are used:
  1. A device with a linear mapping containing the original mapping table of the source volume.
  2. A device with a linear mapping used as the copy-on-write (COW) device for the source volume; for each write, the original data is saved in the COW device of each snapshot to keep its visible content unchanged (until the COW device fills up).
  3. A device with a snapshot mapping combining #1 and #2, which is the visible snapshot volume.
  4. The "original" volume (which uses the device number used by the original source volume), whose table is replaced by a "snapshot-origin" mapping from device #1.
A fixed naming scheme is used to create these devices, For example, you might use the following commands to create an LVM volume named base and a snapshot volume named snap based on that volume.
# lvcreate -L 1G -n base volumeGroup
# lvcreate -L 100M --snapshot -n snap volumeGroup/base
This yields four devices, which you can view with the following commands:
# dmsetup table|grep volumeGroup
volumeGroup-base-real: 0 2097152 linear 8:19 384
volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
volumeGroup-base: 0 2097152 snapshot-origin 254:11

# ls -lL /dev/mapper/volumeGroup-*
brw-------  1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
brw-------  1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
brw-------  1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
brw-------  1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
The format for the snapshot-origin target is as follows:
start length snapshot-origin origin
start
starting block in virtual device
length
length of this segment
origin
base volume of snapshot
The snapshot-origin will normally have one or more snapshots based on it. Reads will be mapped directly to the backing device. For each write, the original data will be saved in the COW device of each snapshot to keep its visible content unchanged until the COW device fills up.
The format for the snapshot target is as follows:
start length snapshot origin COW-device P|N chunksize
start
starting block in virtual device
length
length of this segment
origin
base volume of snapshot
COW-device
Device on which changed chunks of data are stored
P|N
P (Persistent) or N (Not persistent); indicates whether snapshot will survive after reboot. For transient snapshots (N) less metadata must be saved on disk; they can be kept in memory by the kernel.
chunksize
Size in sectors of changed chunks of data that will be stored on the COW device
The following example shows a snapshot-origin target with an origin device of 254:11.
0 2097152 snapshot-origin 254:11
The following example shows a snapshot target with an origin device of 254:11 and a COW device of 254:12. This snapshot device is persistent across reboots and the chunk size for the data stored on the COW device is 16 sectors.
0 2097152 snapshot 254:11 254:12 P 16

A.1.5. The error Mapping Target

With an error mapping target, any I/O operation to the mapped sector fails.
An error mapping target can be used for testing. To test how a device behaves in failure, you can create a device mapping with a bad sector in the middle of a device, or you can swap out the leg of a mirror and replace the leg with an error target.
An error target can be used in place of a failing device, as a way of avoiding timeouts and retries on the actual device. It can serve as an intermediate target while you rearrange LVM metadata during failures.
The error mapping target takes no additional parameters besides the start and length parameters.
The following example shows an error target.
0 65536 error

A.1.6. The zero Mapping Target

The zero mapping target is a block device equivalent of /dev/zero. A read operation to this mapping returns blocks of zeros. Data written to this mapping is discarded, but the write succeeds. The zero mapping target takes no additional parameters besides the start and length parameters.
The following example shows a zero target for a 16Tb Device.
0 65536 zero

A.1.7. The multipath Mapping Target

The multipath mapping target supports the mapping of a multipathed device. The format for the multipath target is as follows:
start length  multipath  #features [feature1 ... featureN] #handlerargs [handlerarg1 ... handlerargN] #pathgroups pathgroup pathgroupargs1 ... pathgroupargsN
There is one set of pathgroupargs parameters for each path group.
start
starting block in virtual device
length
length of this segment
#features
The number of multipath features, followed by those features. If this parameter is zero, then there is no feature parameter and the next device mapping parameter is #handlerargs. Currently there is one supported feature that can be set with the features attribute in the multipath.conf file, queue_if_no_path. This indicates that this multipathed device is currently set to queue I/O operations if there is no path available.
In the following example, the no_path_retry attribute in the multipath.conf file has been set to queue I/O operations only until all paths have been marked as failed after a set number of attempts have been made to use the paths. In this case, the mapping appears as follows until all the path checkers have failed the specified number of checks.
0 71014400 multipath 1 queue_if_no_path 0 2 1 round-robin 0 2 1 66:128 \
1000 65:64 1000 round-robin 0 2 1 8:0 1000 67:192 1000
After all the path checkers have failed the specified number of checks, the mapping would appear as follows.
0 71014400 multipath 0 0 2 1 round-robin 0 2 1 66:128 1000 65:64 1000 \
round-robin 0 2 1 8:0 1000 67:192 1000
#handlerargs
The number of hardware handler arguments, followed by those arguments. A hardware handler specifies a module that will be used to perform hardware-specific actions when switching path groups or handling I/O errors. If this is set to 0, then the next parameter is #pathgroups.
#pathgroups
The number of path groups. A path group is the set of paths over which a multipathed device will load balance. There is one set of pathgroupargs parameters for each path group.
pathgroup
The next path group to try.
pathgroupsargs
Each path group consists of the following arguments:
pathselector #selectorargs #paths #pathargs device1 ioreqs1 ... deviceN ioreqsN 
There is one set of path arguments for each path in the path group.
pathselector
Specifies the algorithm in use to determine what path in this path group to use for the next I/O operation.
#selectorargs
The number of path selector arguments which follow this argument in the multipath mapping. Currently, the value of this argument is always 0.
#paths
The number of paths in this path group.
#pathargs
The number of path arguments specified for each path in this group. Currently this number is always 1, the ioreqs argument.
device
The block device number of the path, referenced by the major and minor numbers in the format major:minor
ioreqs
The number of I/O requests to route to this path before switching to the next path in the current group.
Figure A.1, “Multipath Mapping Target” shows the format of a multipath target with two path groups.
Multipath Mapping Target

Figure A.1. Multipath Mapping Target


The following example shows a pure failover target definition for the same multipath device. In this target there are four path groups, with only one open path per path group so that the multipathed device will use only one path at a time.
0 71014400 multipath 0 0 4 1 round-robin 0 1 1 66:112 1000 \
round-robin 0 1 1 67:176 1000 round-robin 0 1 1 68:240 1000 \
round-robin 0 1 1 65:48 1000
The following example shows a full spread (multibus) target definition for the same multipathed device. In this target there is only one path group, which includes all of the paths. In this setup, multipath spreads the load evenly out to all of the paths.
0 71014400 multipath 0 0 1 1 round-robin 0 4 1 66:112 1000 \
 67:176 1000 68:240 1000 65:48 1000
For further information about multipathing, see the Using Device Mapper Multipath document.

A.1.8. The crypt Mapping Target

The crypt target encrypts the data passing through the specified device. It uses the kernel Crypto API.
The format for the crypt target is as follows:
start length crypt cipher key IV-offset device offset
start
starting block in virtual device
length
length of this segment
cipher
Cipher consists of cipher[-chainmode]-ivmode[:iv options].
cipher
Ciphers available are listed in /proc/crypto (for example, aes).
chainmode
Always use cbc. Do not use ebc; it does not use an initial vector (IV).
ivmode[:iv options]
IV is an initial vector used to vary the encryption. The IV mode is plain or essiv:hash. An ivmode of -plain uses the sector number (plus IV offset) as the IV. An ivmode of -essiv is an enhancement avoiding a watermark weakness.
key
Encryption key, supplied in hex
IV-offset
Initial Vector (IV) offset
device
block device, referenced by the device name in the filesystem or by the major and minor numbers in the format major:minor
offset
starting offset of the mapping on the device
The following is an example of a crypt target.
0 2097152 crypt aes-plain 0123456789abcdef0123456789abcdef 0 /dev/hda 0