What is the suggested I/O scheduler to improve disk performance when using Red Hat Enterprise Linux with virtualization?
Red Hat Insights can detect this issue
Environment
- Red Hat Enterprise Linux (RHEL) 4, 5, 6, 7, 8, and 9
- Virtualization, e.g. KVM, Xen, VMware or Microsoft Hyper-V
- Virtualization guest or virtualization host
- Virtual disk
Issue
- What is the recommended I/O scheduler for Red Hat Enterprise Linux as a virtualization host?
Resolution
There is no single "best" I/O scheduler recommendation that applies to all situations or to any generic environment. Any change to the I/O scheduler should be made in conjunction with testing to determine which scheduler provides the most benefit for the specific I/O workload of the application suite in use.
The following are common starting recommendations for an I/O scheduler in a RHEL-based virtual guest, based on kernel version and disk type.
- RHEL 8, 9: `mq-deadline` is the default I/O scheduler unless otherwise changed [FN.1]
  - Virtual disks: keep the current I/O scheduler setting (`mq-deadline`)
  - Physical disks: keep the current I/O scheduler setting (`mq-deadline`, or `none` for NVMe)
- RHEL 7.5+: `deadline` is the default I/O scheduler unless otherwise changed
  - Virtual disks: keep the current I/O scheduler setting (`deadline`)
  - Physical disks: keep the current I/O scheduler setting (`deadline`)
- RHEL 4, 5, 6, 7.0-7.4: `cfq` is the default I/O scheduler unless otherwise changed
  - Virtual disks: change to the `noop` scheduler [FN.2] (a persistence sketch follows this list)
  - Physical disks: keep the current I/O scheduler setting
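For the RHEL 4, 5, 6, and 7.0-7.4 cases above, the runtime sysfs change (shown in the next section) does not survive a reboot. The following is a minimal sketch of making `noop` persistent; the device name `vda` and the use of `grubby` are illustrative assumptions (on RHEL 6 the same `elevator=noop` argument can instead be appended to the kernel line in `/boot/grub/grub.conf`), and `elevator=` is honored only by the legacy (non-blk-mq) block layer, so it does not apply to RHEL 8 and later.

    # Runtime change for a single virtual disk (immediate, lost at reboot);
    # 'vda' is an example virtio disk name
    echo noop > /sys/block/vda/queue/scheduler

    # Persistent change for all installed kernels (RHEL 7 grubby syntax shown);
    # note that 'elevator=' sets the default for all legacy block devices in the guest
    grubby --update-kernel=ALL --args="elevator=noop"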
Configuring the I/O scheduler online on Red Hat Enterprise Linux
- See "Can I change the I/O scheduler for a particular disk without the system rebooting?"
- Determine which schedulers are available:

      # cat /sys/block/sda/queue/scheduler
      [mq-deadline] kyber bfq none

- Change the scheduler for a device and verify using one of the schedulers listed above (a loop for reviewing all devices at once is shown after the references below):

      # echo 'none' > /sys/block/sda/queue/scheduler
      # cat /sys/block/sda/queue/scheduler
      mq-deadline kyber bfq [none]
Additional documentation references:
- RHEL 9: Chapter 11. Setting the disk scheduler
- RHEL 8: Chapter 12. Setting the disk scheduler in Monitoring and Managing System Status and Performance
- Chapter 19. Setting the disk scheduler in Managing Storage Devices
- RHEL 7: Section 6.2.1. Configuring the I/O Scheduler for Red Hat Enterprise Linux 7
- RHEL 6: Section 6.4. Configuring the I/O Scheduler
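Before and after applying any of the changes above, it can be useful to review the active scheduler on every block device at once. The following is a minimal read-only shell sketch using the same sysfs paths shown earlier:

    # Print the available schedulers for each block device;
    # the currently active scheduler is shown in brackets
    for f in /sys/block/*/queue/scheduler; do
        printf '%s: %s\n' "${f%/queue/scheduler}" "$(cat "$f")"
    done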
Configuring the I/O scheduler on Red Hat Enterprise Linux 8 and 9
- In RHEL 8 and newer, new I/O schedulers are available: `mq-deadline`, `none`, `kyber`, and `bfq`. Note that the `noop` scheduler is now called `none`.
  - More information on the `noop`/`none` schedulers can be found in "How to use the Noop or None IO Schedulers".
- In RHEL 8 and newer, the default scheduler is `mq-deadline`.
- The I/O scheduler can be set using tuned or udev rules. The udev method is often preferred due to its robust configuration options.
Configuring the I/O scheduler via tuned and udev
- Another way to change the default I/O scheduler is to use tuned.
  - More information on creating a custom tuned profile can be found in Chapter 2. Customizing TuneD profiles and in "How do I create my own tuned profile on RHEL7?"
- The default scheduler in Red Hat Enterprise Linux 4, 5, and 6 is `cfq`. The available tuned profiles use the `deadline` elevator. See "How do I create my own tuned profile on RHEL6?" for creating a custom I/O scheduler setting via tuned. A sample profile is sketched below.
- See also "How to set a permanent I/O scheduler on one or more specific devices using udev"; an example rule is sketched below.
Footnotes
- FN.1 | With the advent of `mq-deadline` being the default scheduler, there is no longer a compelling reason to change to the `none` I/O scheduler for virtual disks.
  - An exception would be virtual disks backed by high-speed NVMe or NVDIMM technology within the hypervisor, where the guest performs a very large number of I/Os per second at small I/O sizes (4 KB). In this corner case, switching to the `none` scheduler can provide a slight overall improvement in IOPS and therefore throughput, due to the slightly longer execution code path of `mq-deadline` versus `none`.
  - Testing with NVMe-backed virtual disks between `none` and `mq-deadline` using a simple single-I/O-depth test showed about a 2% difference at block I/O sizes of 32 KB or larger, and less than a 1% difference at a 512 KB I/O size. In normal circumstances, with more realistic (more complex) I/O loads, this difference can be within statistical noise, since it depends on what other operations the hypervisor is busy with at any given time and whether the NVMe device is used by more than one virtual disk and/or more than one guest.
  - The `mq-deadline` scheduler is also the scheduler used by the tuned profile `virtual-host`, and it is the default scheduler in RHEL 8 and 9 (changed from `deadline` to `mq-deadline` in 7.6+) for all but direct-attached SATA rotating-media disks.
- FN.2 | On RHEL 7.5 and earlier, the default `cfq` scheduler is a reasonable choice even for virtual disks within virtual guests, but it does have drawbacks. The main one is that it is tuned to maximize I/O to a single rotating-media physical disk. Moreover, most hypervisors perform their own I/O scheduling for the physical resources behind the virtual disks, and multiple virtual disks can share the same physical storage resource while being presented to one or more guests. Under these circumstances, switching to the `noop` I/O scheduler for virtual disks is recommended. Doing so reduces the code path time associated with `cfq` and other schedulers (for example, it removes the slice idle time in `cfq`). Using `noop` reduces the time the I/O spends in the scheduler layer of the Linux I/O stack and allows the hypervisor to better schedule the I/O against the physical resources it is managing.
  - If the hypervisor is known to do its own I/O scheduling -- which is normally the case -- then guests often gain enough benefit from the `noop` I/O scheduler over the `cfq` and `bfq` schedulers to make the recommended switch to `noop` as a first step.
  - If there are heavy I/O loads from a guest, then sometimes switching the guest to `deadline` can provide a small performance edge over the `noop` scheduler -- but this is highly dependent on the I/O load itself. For example, a database I/O load of synchronous reads and asynchronous writes can benefit from `deadline` by biasing dispatch of reads (which block processes) ahead of writes (which are non-blocking I/O).
- FN.3 | The storage technology in use can affect which I/O scheduler produces the best results for a given configuration and I/O workload.
  - If physical disks, iSCSI devices, or SR-IOV pass-through devices are provisioned to guests, then the `none` (noop) scheduler should not be used. Using `none` does not allow Linux to optimize I/O requests in terms of type or order for the underlying physical device, and only the guest itself can perform I/O scheduling in such configurations, so choose `mq-deadline` (or `deadline`, depending on kernel version).
  - If virtual disks are presented to guests, then for most I/O workloads the `mq-deadline` scheduler is likely statistically close enough to `none`. Given that `mq-deadline` is the default in later kernel versions, there is no compelling reason to change to `none`.
  - If the hypervisor is known to do its own I/O scheduling -- which is normally the case -- then guests often benefit greatly from switching to the `noop` I/O scheduler from the `cfq` or `bfq` schedulers. There is a much less measurable change in performance when switching to `none` from the `mq-deadline` or `deadline` schedulers.
    - Switching to `noop` from `cfq` allows the hypervisor to optimize the I/O requests and prioritize them based on its view of all of the I/O from one or multiple guests. The hypervisor receives I/O from the guest in the order submitted within the guest. Within the Linux virtual guest, the `noop` scheduler can still combine small sequential requests into larger requests before submitting the I/O to the hypervisor.
    - Switching to `none` from `mq-deadline` results in a slightly shorter code execution path, but the Linux virtual guest loses the ability to prioritize dispatching blocking I/O over non-blocking I/O.
- FN.4 | While there is a significant difference between the default `cfq` and `noop` in RHEL 4, 5, and 6, there is less performance difference in a virtual disk environment between the default `mq-deadline` and `none` in RHEL 8 and 9. However, if minimizing I/O latency within the guest is more important than maximizing I/O throughput for the guest's I/O workloads, it may be beneficial to switch to `none` in RHEL 8 and 9 environments. Just be aware that nominal measured differences between the two schedulers are typically in the range of +/- 1-3% for virtual disks. But every I/O workload is different, so be sure to perform proper testing within your environment to determine how the scheduler change impacts your specific workload.
Root Cause
Testing
NOTE: All scheduler tuning should be tested under normal operating conditions, as synthetic benchmarks typically do not accurately compare performance of systems using shared resources in virtual environments.
In this document, we refer to testing and comparing multiple schedulers. Some hints:
- Recommendations and defaults are only a place to start; validate any change under normal operating conditions rather than relying solely on synthetic benchmarks (see the NOTE above).
- Outside of some specific corner cases, the typical change in performance when comparing different schedulers is nominally in the +/- 5% range. It is very unusual, even in corner cases such as all-sequential reads for video streaming, to see more than a 10-20% improvement in I/O performance from a scheduler change alone. Expecting a 5-10x improvement from finding the "right" scheduler is therefore unrealistic.
- First be clear about the goal or goals to optimize for. Do you want to push as many I/Os as possible to storage? Do you want to optimize an application to provide service in a certain way, for example "this Apache web server should be able to serve as many static files (fetched from storage) as possible"?
- With the goal clear, decide on the best tool to measure (an illustrative fio sketch follows this list). Applications can then be started and measured. Without changing any other conditions, several schedulers can be tried out and the measurements compared.
- Special attention should be paid to the mutual influence of the components. A RHEL host might run 10 KVM guests, each running various applications. Benchmarking should consider this whole system.
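If a synthetic baseline is still wanted alongside application-level measurements, a tool such as fio can provide a repeatable comparison point between two scheduler settings. The command below is only an illustrative sketch: the device `/dev/vdb`, block size, queue depth, and run time are placeholder assumptions, a read-only test against an otherwise idle spare virtual disk is assumed, and the results are no substitute for measuring the real workload.

    # Example read-only fio run against a spare virtual disk; repeat the same
    # run with each scheduler under test and compare the reported IOPS/latency
    fio --name=sched-compare --filename=/dev/vdb --readonly --direct=1 \
        --rw=randread --bs=4k --ioengine=libaio --iodepth=8 \
        --runtime=60 --time_based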
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.