What is the suggested I/O scheduler to improve disk performance when using Red Hat Enterprise Linux with virtualization?

Environment

  • Red Hat Enterprise Linux (RHEL) 4, 5, 6, 7, 8, and 9
  • Virtualization, e.g. KVM, Xen, VMware or Microsoft Hyper-V
  • Virtualization guest or virtualization host
  • Virtual disk

Issue

  • What is the recommended I/O scheduler for Red Hat Enterprise Linux as a virtualization host?

Resolution

There is no single "best" I/O scheduler recommendation that applies to all situations or to any generic environment. Any change to the I/O scheduler should be made in conjunction with testing to determine which scheduler provides the most benefit for the specific I/O workload of the application suite in use.

The following are common starting recommendations for an I/O scheduler for a RHEL-based virtual guest, depending on kernel version and disk type.

Configuring the I/O scheduler online on Red Hat Enterprise Linux
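
For example, on a running system the scheduler in use for a given disk can be inspected and changed through sysfs. This is a minimal sketch: the device name vda (a virtio disk) and the sample output are examples, and a change made this way does not persist across reboots.

    # List the available schedulers for the vda disk; the active scheduler is shown in brackets.
    cat /sys/block/vda/queue/scheduler
    # Example output: [mq-deadline] kyber bfq none

    # Change the active scheduler at runtime (as root). This change is lost at reboot.
    echo none > /sys/block/vda/queue/scheduler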

Configuring the I/O scheduler on Red Hat Enterprise Linux 8 and 9

  • In RHEL 8 and newer, new multi-queue I/O schedulers are available: mq-deadline, none, kyber, and bfq. Note that the equivalent of the old noop scheduler is called none.
  • The I/O scheduler can be set using tuned or udev rules. The udev method is often preferred due to its robust configuration options.

Configuring the I/O scheduler via tuned and udev
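
As a sketch of the udev approach (the rule file name and the vd[a-z] device match are examples; adjust them to the disks present in your environment), a single rule can set the scheduler whenever a matching disk appears:

    # /etc/udev/rules.d/60-io-scheduler.rules  (example file name)
    # Set the I/O scheduler for virtio virtual disks (vda, vdb, ...).
    ACTION=="add|change", KERNEL=="vd[a-z]", ATTR{queue/scheduler}="mq-deadline"

The rule can be reloaded and applied without a reboot:

    udevadm control --reload-rules
    udevadm trigger --type=devices --action=change

For tuned, one common pattern is a small custom profile that includes an existing profile and uses the disk plugin's elevator option; the profile name my-virtual-guest below is an example, not a shipped profile:

    # /etc/tuned/my-virtual-guest/tuned.conf  (example custom profile)
    [main]
    include=virtual-guest

    [disk]
    # Set the I/O scheduler for the disks this profile manages.
    elevator=mq-deadline

Activate the custom profile with "tuned-adm profile my-virtual-guest".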

Footnotes

  • FN.1 | Now that mq-deadline is the default scheduler, there is no longer a compelling reason to change to the none I/O scheduler for virtual disks.

    • An exception would be virtual disks that are backed by high-speed NVMe or NVDIMM technology within the hypervisor, where the guest is performing a very large number of I/Os per second with small I/O sizes (4 KB). In this corner case, switching to the none scheduler can provide a slight overall improvement in IOPS and therefore throughput, due to the slightly longer execution code path of mq-deadline versus none.
      • Testing with NVMe-backed virtual disks, comparing none and mq-deadline with a simple single-I/O-depth test, showed a ~2% difference at block I/O sizes of 32 KB or larger and less than a 1% difference at a 512 KB I/O size. In normal circumstances, with more realistic (more complex) I/O loads, the difference between the two schedulers can be within statistical noise, as it depends on what other operations the hypervisor is busy with at any given time and whether the NVMe device is used by more than one virtual disk and/or more than one guest.
    • The mq-deadline scheduler is also the scheduler used by the tuned virtual-host profile and is the default scheduler in RHEL 8 and 9 (changed from deadline to mq-deadline in 7.6+) for all but direct-attached SATA rotating-media disks.
  • FN.2 | On RHEL 7.5 and earlier, the default cfq scheduler is a reasonable choice even for virtual disks within virtual guests, but it does have drawbacks. The main one is that it is tuned to maximize I/O to a single rotating-media physical disk. Moreover, most hypervisors also perform their own I/O scheduling for the physical resources behind the virtual disks, and multiple virtual disks can use the same physical storage resource while being presented to one or more guests. Under these circumstances, switching to the noop I/O scheduler for virtual disks is recommended. Doing so reduces the code path time associated with cfq and other schedulers (e.g. it removes the slice idle time in cfq). Using noop reduces the time the I/O spends in the scheduler layer of the Linux I/O stack and allows the hypervisor to better schedule the I/O against the physical resources it is managing.

    • If the hypervisor is known to do its own I/O scheduling -- which is normally the case -- then guests often benefit enough from the noop I/O scheduler, compared with the cfq and bfq schedulers, to make switching to noop the recommended first step.
    • If there are heavy I/O loads from a guest, then switching the guest to deadline can sometimes provide a small performance edge over the noop scheduler -- but this is highly dependent on the I/O load itself. For example, a database I/O load of synchronous reads and asynchronous writes can benefit from deadline, which biases dispatch toward reads (which block processes) over writes (which are non-blocking I/O).
  • FN.3 | The storage technology in use can affect which I/O scheduler produces the best results for a given configuration and I/O workload (a per-device udev rule sketch follows these footnotes).

    • If physical disks, iSCSI LUNs, or SR-IOV pass-through devices are provisioned to guests, then the none (noop) scheduler should not be used. Using none means the Linux guest does not optimize I/O requests in terms of type or order for the underlying physical device, and only the guest itself should perform I/O scheduling in such configurations, so choose mq-deadline (or deadline, depending on kernel version).
    • If virtual disks are presented to guests, then for most I/O workloads the mq-deadline scheduler is likely statistically close enough to none. Given that mq-deadline is the default in later kernel versions, there is no compelling reason to change to none.
    • If the hypervisor is known to do its own I/O scheduling -- which is normally the case -- then guests often benefit greatly from switching from the cfq or bfq schedulers to the noop I/O scheduler. There is a much less measurable change in performance when switching from the mq-deadline or deadline schedulers to none.
      • Switching from cfq to noop allows the hypervisor to optimize the I/O requests and prioritize them based on its view of all of the I/O from one or multiple guests. The hypervisor receives I/O from the guest in the order it was submitted within the guest. Within the Linux virtual guest, the noop scheduler can still combine small sequential requests into larger requests before submitting the I/O to the hypervisor.
      • Switching from mq-deadline to none results in a slightly shorter code execution path. However, in making the switch the Linux virtual guest loses the ability to prioritize dispatching blocking I/O over non-blocking I/O.
  • FN.4 | While there is a significant difference between the default cfq and noop in RHEL 4, 5, and 6, there is less performance difference in a virtual disk environment between the default mq-deadline and none in RHEL 8 and 9. However, if minimizing I/O latency within the guest is more important than maximizing I/O throughput for the guest's I/O workloads, then it may be beneficial to switch to none in RHEL 8 and 9 environments. Just be aware that the nominal measured differences between the two schedulers for virtual disks are typically in the range of +/- 1-3%. But every I/O workload is different, so be sure to perform proper testing within your environment to determine how the scheduler change impacts your specific workload.
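
As an illustration of FN.1 and FN.3, different device types within the same guest can be given different schedulers with separate udev rules. This is only a sketch: the file name is an example, and it assumes virtio virtual disks appear as vd* while passed-through SCSI, iSCSI, or SR-IOV devices appear as sd*, which should be verified on the actual system before use.

    # /etc/udev/rules.d/61-io-scheduler-by-type.rules  (example file name)

    # Virtio virtual disks backed by very fast NVMe/NVDIMM storage on the hypervisor
    # (the FN.1 corner case): none may give a small IOPS benefit.
    ACTION=="add|change", KERNEL=="vd[a-z]", ATTR{queue/scheduler}="none"

    # Passed-through physical, iSCSI, or SR-IOV storage (FN.3): the guest is the only
    # place this I/O is scheduled, so keep mq-deadline rather than none.
    ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"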

Root Cause

Testing

NOTE: All scheduler tuning should be tested under normal operating conditions, as synthetic benchmarks typically do not accurately compare performance of systems using shared resources in virtual environments.

In this document, we refer to testing and comparing multiple schedulers. Some hints:

  • Recommendations and defaults are only a place to start.
    • Outside of some specific corner cases, the typical change in performance when comparing different schedulers is nominally in the +/- 5% range. It is very unusual, even in corner cases like all-sequential reads for video streaming, to see more than a 10-20% improvement in I/O performance from a scheduler change alone. Expecting a 5-10x improvement by finding the right scheduler is therefore not realistic.
  • First be clear about the goal or goals to optimize for. Do I want as many I/Os as possible to storage? Do I want to optimize an application to provide service in a certain way, for example "this Apache web server should be able to hand out as many static files (fetched from storage) as possible"?
  • With the goal clear, one can decide on the best tool for measurement. The applications can then be started and measured. Keeping the conditions otherwise unchanged, several schedulers can be tried out and the measurements compared (see the sketch after this list).
  • Special attention should be paid to the mutual influence of the components. A single RHEL host might run 10 KVM guests, and each guest various applications. Benchmarking should consider this whole system.
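
As one possible starting point for such a comparison (a sketch, not a definitive benchmark), the same fio job can be run against the same disk while only the scheduler is changed between runs. The device /dev/vdb and the job parameters below are examples and assume a dedicated, otherwise idle test disk; this particular job only reads from the device.

    # Run an identical random-read fio job once per scheduler, changing only the scheduler.
    for sched in mq-deadline none; do
        echo "$sched" > /sys/block/vdb/queue/scheduler
        fio --name=sched-test-"$sched" --filename=/dev/vdb \
            --direct=1 --ioengine=libaio \
            --rw=randread --bs=4k --iodepth=1 \
            --runtime=60 --time_based \
            --output=fio-"$sched".txt
    done

Compare the IOPS and latency figures in the two output files; as noted above, differences of only a few percent can be within the noise caused by other activity on the hypervisor.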

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
