What is the suggested I/O scheduler to improve disk performance when using Red Hat Enterprise Linux with virtualization?
Environment
- Red Hat Enterprise Linux (RHEL) 4, 5, 6, 7, 8, and 9
- Virtualization, e.g. KVM, Xen, VMware or Microsoft Hyper-V
- Virtualization guest or virtualization host
- Virtual disk
Issue
- What is the recommended I/O scheduler for Red Hat Enterprise Linux as a virtualization host?
Resolution
There is no single "best" I/O scheduler recommendation that applies to all situations or to any generic environment. Any change to the I/O scheduler should be made in conjunction with testing to determine which scheduler provides the most benefit for the specific I/O workload of the application suite in use.
The following are common starting recommendations for an I/O scheduler on a RHEL-based virtual guest, based on kernel version and disk type. A quick way to check each disk's current setting is shown after the list.
- RHEL 8, 9: mq-deadline is the default I/O scheduler unless otherwise changed [FN.1]
  - Virtual disks: keep the current I/O scheduler setting (mq-deadline)
  - Physical disks: keep the current I/O scheduler setting (mq-deadline, or none for NVMe)
- RHEL 7.5+: deadline is the default I/O scheduler unless otherwise changed
  - Virtual disks: keep the current I/O scheduler setting (deadline)
  - Physical disks: keep the current I/O scheduler setting (deadline)
- RHEL 4, 5, 6, and 7.0-7.4: cfq is the default I/O scheduler unless otherwise changed
  - Virtual disks: change to the noop scheduler [FN.2]
  - Physical disks: keep the current I/O scheduler setting
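A quick way to see which scheduler each block device is currently using is to query sysfs for all devices at once. The bracketed entry is the active scheduler; the device names and output below are illustrative and will differ per system:

  # grep . /sys/block/*/queue/scheduler
  /sys/block/sda/queue/scheduler:[mq-deadline] kyber bfq none
  /sys/block/vda/queue/scheduler:[none] mq-deadline kyber bfq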
Configuring the I/O scheduler online on Red Hat Enterprise Linux
- See "Can I change the I/O scheduler for a particular disk without the system rebooting?"
- Determine which schedulers are available:

  # cat /sys/block/sda/queue/scheduler
  [mq-deadline] kyber bfq none

- Change the scheduler for a device and verify using the command above:

  # echo 'none' > /sys/block/sda/queue/scheduler
  # cat /sys/block/sda/queue/scheduler
  mq-deadline kyber bfq [none]
Additional documentation references:
- RHEL 9: Chapter 11. Setting the disk scheduler
- RHEL 8: Chapter 12. Setting the disk scheduler in Monitoring and Managing System Status and Performance
- Chapter 19. Setting the disk scheduler in Managing Storage Devices
- RHEL 7: Section 6.2.1. Configuring the I/O Scheduler for Red Hat Enterprise Linux 7
- RHEL 6: Section 6.4. Configuring the I/O Scheduler
Configuring the I/O scheduler on Red Hat Enterprise Linux 8 and 9
- In RHEL 8 and newer, new I/O schedulers are available: mq-deadline, none, kyber, and bfq. Note that the noop scheduler is now called none.
  - More information on the noop/none schedulers can be found in "How to use the Noop or None IO Schedulers"
- In RHEL 8 and newer, the default scheduler is mq-deadline.
- The I/O scheduler can be set using tuned or udev rules. The udev method is often preferred due to its robust configuration options.
Configuring the I/O scheduler via tuned and udev
- Another way to change the default I/O scheduler is to use tuned.
- More information on creating a custom tuned profile can be found in Chapter 2. Customizing TuneD profiles
- "How do I create my own tuned profile on RHEL7 ?"
- The default scheduler in Red Hat Enterprise Linux 4, 5, and 6 is cfq. The available tuned profiles use the deadline elevator. See "How do I create my own tuned profile on RHEL6?" for creating a custom I/O scheduler setting via tuned; example sketches of both the tuned and udev approaches follow this list.
- See also "How to set a permanent I/O scheduler on one or more specific devices using udev"
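As an illustration only, the two approaches might look like the following sketches. The profile name (my-elevator), the vd* device match, and the chosen elevator are assumptions to adapt to your environment and RHEL release.

A custom tuned profile that inherits from virtual-guest and sets the elevator for virtio disks via the tuned disk plugin:

  # /etc/tuned/my-elevator/tuned.conf  (hypothetical profile name)
  [main]
  include=virtual-guest

  [disk]
  # apply only to virtio disks; adjust the device glob as needed
  devices=vd*
  elevator=none

Activate it with "tuned-adm profile my-elevator".

A udev rule achieving a similar effect (the rules file name is arbitrary):

  # /etc/udev/rules.d/60-block-scheduler.rules  (example file name)
  ACTION=="add|change", KERNEL=="vd[a-z]", ATTR{queue/scheduler}="none"

Reload and apply with "udevadm control --reload-rules" followed by "udevadm trigger --type=devices --action=change", or reboot.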
Footnotes
- FN.1 | With mq-deadline now the default scheduler, there is no longer a compelling reason to change to the none I/O scheduler for virtual disks.
  - An exception would be virtual disks backed by high-speed NVMe or NVDIMM technology within the hypervisor where the guest performs a very large number of I/Os per second with small I/O sizes (4 KB). In this corner case, switching to the none scheduler can provide a slight overall improvement in IOPS and therefore throughput, due to the slightly longer execution code path of mq-deadline versus none.
  - Testing NVMe-backed virtual disks with a simple single-I/O-depth test showed approximately a 2% difference between none and mq-deadline at block I/O sizes of 32 KB or larger, and less than a 1% difference at a 512 KB I/O size. In normal circumstances, with more realistic (more complex) I/O loads, this difference can be within statistical noise, since it depends on what other operations the hypervisor is busy with at any given time and whether the NVMe device is used by more than one virtual disk and/or more than one guest.
  - The mq-deadline scheduler is also the scheduler used by the tuned profile virtual-host, and it is the default scheduler in RHEL 8 and 9 (changed to mq-deadline from deadline in 7.6+) for all but direct-attached SATA rotating media disks.
- FN.2 | On RHEL 7.5 and earlier: the default cfq scheduler is a reasonable choice even for virtual disks within virtual guests, but it has drawbacks. The main one is that it is tuned to maximize I/O to a single rotating-media physical disk. Moreover, most hypervisors also perform their own I/O scheduling for the physical resources behind the virtual disks, and multiple virtual disks can share the same physical storage resource while being presented to one or more guests. Under these circumstances, switching to the noop I/O scheduler for virtual disks is recommended. Doing so reduces the code path time associated with cfq and other schedulers (for example, it removes the slice idle time in cfq). Using noop reduces the time I/O spends in the scheduler layer of the Linux I/O stack and allows the hypervisor to better schedule the I/O against the physical resources it is managing.
  - If the hypervisor is known to do its own I/O scheduling -- which is normally the case -- then guests often gain enough benefit from the noop I/O scheduler over the cfq and bfq schedulers to make the recommended switch to noop as a first step.
  - If a guest generates heavy I/O loads, then switching the guest to deadline can sometimes provide a small performance edge over the noop scheduler, but this is highly dependent on the I/O load itself. For example, a database I/O load of synchronous reads and asynchronous writes can benefit from deadline, which biases dispatch toward reads (which block processes) over writes (which are non-blocking I/O).
- FN.3 | The storage technology in use can affect which I/O scheduler produces the best results for a given configuration and I/O workload.
  - If physical disks, iSCSI, or SR-IOV pass-through devices are provisioned to guests, then the none (noop) scheduler should not be used. Using none does not allow the Linux guest to optimize I/O requests in terms of type or order for the underlying physical device, and only the guest itself should perform I/O scheduling in such configurations, so choose mq-deadline (or deadline, depending on kernel version).
  - If virtual disks are presented to guests, then for most I/O workloads the mq-deadline scheduler is likely statistically close enough to none. Given that mq-deadline is the default in later kernel versions, there is no compelling reason to change to none.
  - If the hypervisor is known to do its own I/O scheduling -- which is normally the case -- then guests often benefit greatly from switching from the cfq or bfq schedulers to the noop I/O scheduler. There is a much less measurable change in performance when switching from the mq-deadline or deadline schedulers to none.
    - Switching from cfq to noop allows the hypervisor to optimize the I/O requests and prioritize based on its view of all the I/O from one or more guests. The hypervisor receives I/O from the guest in the order it was submitted within the guest. Within the Linux virtual guest, the noop scheduler can still combine small sequential requests into larger requests before submitting the I/O to the hypervisor.
    - Switching from mq-deadline to none results in a slightly shorter code execution path for none compared to mq-deadline, but in making the switch, the Linux virtual guest loses the ability to prioritize dispatching blocking I/O over non-blocking I/O.
- FN.4 | While there is a significant difference between the default cfq and noop in RHEL 4, 5, and 6, there is less performance difference in a virtual disk environment between the default mq-deadline and none in RHEL 8 and 9. However, if minimizing I/O latency within the guest is more important than maximizing I/O throughput for the guest's workloads, then it may be beneficial to switch to none in RHEL 8 and 9 environments. Just be aware that nominal measured differences between the two schedulers for virtual disks are typically in the range of +/- 1-3%. Every I/O workload is different, so be sure to perform proper testing within your environment to determine how the scheduler change affects your specific workload.
Root Cause
Testing
NOTE: All scheduler tuning should be tested under normal operating conditions, as synthetic benchmarks typically do not accurately compare performance of systems using shared resources in virtual environments.
In this document, we refer to testing and comparing multiple schedulers. Some hints:
- Recommendations and defaults are only a place to start.
- Outside of some specific corner cases, the typical change in performance when comparing different schedulers is nominally in the +/- 5% range. It is very unusual, even in corner cases such as all-sequential reads for video streaming, to see more than a 10-20% improvement in I/O performance from a scheduler change alone. Expecting a 5-10x improvement simply by finding the right scheduler is unrealistic.
- One should first be clear about the goal or goals to optimize for. Do I want as many I/Os as possible to storage? Do I want to optimize an application to provide service in a certain way, for example "this Apache web server should be able to serve as many static files (fetched from storage) as possible"?
- With the goal clear, one can decide on the best tool to measure it. The application can then be started and measured. Keeping conditions otherwise unchanged, several schedulers can be tried out and the measurements compared; a simple fio sketch follows this list.
- Special attention should be paid to the mutual influence of components. A RHEL host might run 10 KVM guests, each running various applications. Benchmarking should consider this whole system.
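As one possible starting point for such a comparison, a simple fio run (if the fio package is installed) can be repeated against the same device or file while only the scheduler is changed between runs. The file path, job parameters, and runtime below are assumptions to adjust for your own workload:

  # 4k random reads at queue depth 1 against a 1 GiB test file;
  # run once per scheduler and compare the reported IOPS and latency
  fio --name=sched-test --filename=/var/tmp/fio.testfile --size=1G \
      --rw=randread --bs=4k --iodepth=1 --direct=1 \
      --runtime=60 --time_based --group_reporting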