Chapter 14. Storage

DM rebase to version 4.2

Device Mapper (DM) has been upgraded to upstream version 4.2, which provides a number of bug fixes and enhancements over the previous version, including a significant DM crypt performance update and a DM core update to support the Multi-Queue Block I/O Queueing Mechanism (blk-mq).

Multiqueue I/O scheduling with blk-mq

Red Hat Enterprise Linux 7.2 includes a new multiple queue I/O scheduling mechanism for block devices known as blk-mq. It can improve performance by allowing certain device drivers to map I/O requests to multiple hardware or software queues. The improved performance comes from reducing lock contention present when multiple threads of execution perform I/O to a single device. Newer devices, such as Non-Volatile Memory Express (NVMe), are best positioned to take advantage of this feature due to their native support for multiple hardware submission and completion queues, and their low-latency performance characteristics. Performance gains, as always, will depend on the exact hardware and workload.
The blk-mq feature is currently implemented, and enabled by default, in the following drivers: virtio-blk, mtip32xx, nvme, and rbd.
The related feature, scsi-mq, allows Small Computer System Interface (SCSI) device drivers to use the blk-mq infrastructure. The scsi-mq feature is provided as a Technology Preview in Red Hat Enterprise Linux 7.2. To enable scsi-mq, specify scsi_mod.use_blk_mq=y on the kernel command line. The default value is n (disabled).
The device mapper (DM) multipath target, which uses request-based DM, can also be configured to use the blk-mq infrastructure if the dm_mod.use_blk_mq=y kernel option is specified. The default value is n (disabled).
It may be beneficial to set dm_mod.use_blk_mq=y if the underlying SCSI devices are also using blk-mq, as doing so reduces locking overhead at the DM layer.
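As a minimal sketch, the following Python function reports which of the two options described above appear on a kernel command line. The function name blk_mq_options is hypothetical, and the sample command line is invented for illustration; on a running system, the actual command line can be read from /proc/cmdline.

```python
def blk_mq_options(cmdline):
    """Return the use_blk_mq settings found in a kernel command line string."""
    wanted = ("scsi_mod.use_blk_mq", "dm_mod.use_blk_mq")
    # Both options default to n (disabled) when they are not specified.
    settings = {name: "n" for name in wanted}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name in wanted:
            settings[name] = value
    return settings


# Sample command line, made up for this example.
sample = "BOOT_IMAGE=/vmlinuz ro quiet scsi_mod.use_blk_mq=y"
print(blk_mq_options(sample))
# prints {'scsi_mod.use_blk_mq': 'y', 'dm_mod.use_blk_mq': 'n'}
```

The function only inspects the text it is given and returns the values as written; it does not confirm that the running kernel accepted them.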
To determine whether DM multipath is using blk-mq on a system, cat the file /sys/block/dm-X/dm/use_blk_mq, where dm-X is replaced by the DM multipath device of interest. This file is read-only and reflects what the global value in /sys/module/dm_mod/parameters/use_blk_mq was at the time the request-based DM multipath device was created.
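The check described above can be sketched in Python for all DM devices at once. The function name dm_use_blk_mq and its sys_block parameter are hypothetical and exist only so that the example can be pointed at a different directory; devices that do not expose the file are skipped.

```python
from pathlib import Path


def dm_use_blk_mq(sys_block="/sys/block"):
    """Map each dm-X device name to the contents of its dm/use_blk_mq file."""
    result = {}
    for dev in sorted(Path(sys_block).glob("dm-*")):
        attr = dev / "dm" / "use_blk_mq"
        # Skip devices that do not provide the attribute.
        if attr.is_file():
            result[dev.name] = attr.read_text().strip()
    return result


for name, value in dm_use_blk_mq().items():
    print(name, value)
```

Because the file is read-only and reflects the global value at device creation time, the reported values can differ from the current content of /sys/module/dm_mod/parameters/use_blk_mq.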

New delay_watch_checks and delay_wait_checks options in the multipath.conf file

Should a path be unreliable, as when the connection drops in and out frequently, multipathd will still continuously attempt to use that path. The timeout before multipathd realizes that the path is no longer accessible is 300 seconds, which can give the appearance that multipathd has stalled.
To fix this, two new configuration options have been added: delay_watch_checks and delay_wait_checks. Set delay_watch_checks to the number of cycles for which multipathd watches a path after it comes online. If the path fails within that number of cycles, multipathd does not use it. Instead, multipathd relies on the delay_wait_checks option, which specifies how many consecutive cycles the path must pass before it becomes valid again. This prevents unreliable paths from being used as soon as they come back online.
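The interaction of the two options can be illustrated with a simplified model. The following Python sketch is not the multipathd implementation; the function name simulate_path, the representation of checker results as booleans, and the exact counting of cycles are assumptions made for illustration.

```python
def simulate_path(checks, delay_watch_checks, delay_wait_checks):
    """Return, for each checker cycle, whether the path would be used.

    checks is a sequence of booleans: True when the path checker passes.
    """
    usable = []
    was_up = False
    watch_left = 0  # cycles remaining in the watch window
    wait_left = 0   # consecutive passing cycles still required
    for up in checks:
        if not up:
            # Failing while watched (or while waiting) imposes the wait.
            if watch_left > 0 or wait_left > 0:
                wait_left = delay_wait_checks
            watch_left = 0
            was_up = False
            usable.append(False)
            continue
        if wait_left > 0:
            wait_left -= 1
            if wait_left == 0:
                # Wait satisfied: use the path and watch it again.
                watch_left = delay_watch_checks
        elif not was_up:
            # Path just came online: start watching it.
            watch_left = delay_watch_checks
        elif watch_left > 0:
            watch_left -= 1
        was_up = True
        usable.append(wait_left == 0)
    return usable


flapping = [True, False, True, True, True]
print(simulate_path(flapping, 0, 0))  # prints [True, False, True, True, True]
print(simulate_path(flapping, 3, 2))  # prints [True, False, False, True, True]
```

In the second call, the path fails while it is still being watched, so in this model it is held back until it has passed two consecutive cycles.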

New config_dir option in the multipath.conf file

Users were unable to split their configuration between /etc/multipath.conf and other configuration files. This prevented users from setting up one main configuration file for all their machines and keeping machine-specific configuration information in separate configuration files for each machine.
To address this, a new config_dir option was added to the multipath.conf file. Set the config_dir option to either an empty string or a fully qualified directory path name. When it is set to anything other than an empty string, multipath reads all .conf files in that directory in alphabetical order and applies their configuration exactly as if it had been added to /etc/multipath.conf. If the option is not set, config_dir defaults to /etc/multipath/conf.d.
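To preview the order in which files from the configuration directory would be applied, a short Python sketch such as the following can be used. The function name config_files is hypothetical, and plain string ordering of file names is an assumption; multipath itself may order names according to the system locale.

```python
from pathlib import Path


def config_files(config_dir="/etc/multipath/conf.d"):
    """List the .conf files in config_dir, sorted by file name."""
    # An empty string means that no additional directory is read.
    if not config_dir:
        return []
    directory = Path(config_dir)
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir()
             if p.is_file() and p.name.endswith(".conf")]
    return sorted(files, key=lambda p: p.name)


for path in config_files():
    print(path)
```

Files that do not end in .conf are ignored, which matches the description above.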

New dmstats command to display and manage I/O statistics for regions of devices that use the device-mapper driver

The dmstats command provides userspace support for device-mapper I/O statistics. This allows a user to create, manage and report I/O counters, metrics and latency histogram data for user-defined arbitrary regions of device-mapper devices. Statistics fields are now available in dmsetup reports and the dmstats command adds new specialized reporting modes designed for use with statistics information. For information on the dmstats command, see the dmstats(8) man page.

LVM Cache

LVM cache has been fully supported since Red Hat Enterprise Linux 7.1. This feature allows users to create logical volumes (LVs) with a small fast device performing as a cache to larger slower devices. Refer to the lvmcache(7) manual page for information on creating cache logical volumes.
Note the following restrictions on the use of cache LVs:
* The cache LV must be a top-level device. It cannot be used as a thin-pool LV, an image of a RAID LV, or any other sub-LV type.
* The cache LV sub-LVs (the origin LV, metadata LV, and data LV) can only be of linear, stripe, or RAID type.
* The properties of the cache LV cannot be changed after creation. To change cache properties, remove the cache as described in lvmcache(7) and recreate it with the desired properties.

New LVM/DM cache policy

A new smq dm-cache policy has been written that reduces memory consumption and improves performance for most use cases. It is now the default cache policy for new LVM cache logical volumes. Users who prefer to use the legacy mq cache policy can still do so by supplying the --cachepolicy argument when creating the cache logical volume.

LVM systemID

LVM volume groups can now be assigned an owner. The volume group owner is the system ID of a host. Only the host with the given system ID can use the VG. This can benefit volume groups that exist on shared devices, visible to multiple hosts, which are otherwise not protected from concurrent use from multiple hosts. LVM volume groups on shared devices with an assigned system ID are owned by one host and protected from other hosts.

New lvmpolld daemon

The lvmpolld daemon provides a polling method for long-running LVM commands. When enabled, control of long-running LVM commands is transferred from the original LVM command to the lvmpolld daemon. This allows the operation to continue independent of the original LVM command. The lvmpolld daemon is enabled by default.
Before the introduction of the lvmpolld daemon, any background polling process originating in an lvm2 command initiated inside a cgroup of a systemd service could get killed if the main process (the main service) exited in the cgroup. This could lead to premature termination of the lvm2 polling process. Additionally, lvmpolld helps to prevent spawning lvm2 polling processes querying for progress on the same task multiple times because it tracks the progress for all polling tasks in progress.
For further information on the lvmpolld daemon, see the lvm.conf configuration file.

Enhancements to LVM selection criteria

The Red Hat Enterprise Linux 7.2 release supports several enhancements to LVM selection criteria. Previously, it was possible to use selection criteria only for reporting commands; LVM now supports selection criteria for several LVM processing commands as well. Additionally, there are several changes in this release to provide better support for time reporting fields and selection.
For information on the implementation of these new features, see the LVM Selection Criteria appendix in the Logical Volume Administration manual.

The default maximum number of SCSI LUNs is increased

The default value for the max_report_luns parameter has been increased from 511 to 16393. This parameter specifies the maximum number of logical units that may be configured when the system scans the SCSI interconnect using the Report LUNs mechanism.