6.4.1. Completely Fair Queuing (CFQ)
ionice command, or programmatically assigned via the ioprio_set system call. By default, processes are placed in the best-effort scheduling class. The real-time and best-effort scheduling classes are each subdivided into eight I/O priorities, with priority 0 the highest and 7 the lowest; the default priority within the best-effort class is 4. Processes in the real-time scheduling class are scheduled much more aggressively than processes in either the best-effort or idle class, so any scheduled real-time I/O is always performed before best-effort or idle I/O. This means that real-time I/O can starve both the best-effort and idle classes. Processes in the idle scheduling class are serviced only when no other I/O is pending in the system. It is therefore important to assign a process to the idle class only if its I/O is not required for making forward progress.
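As a rough illustration of the classes and priorities described above, the following Python sketch builds the packed priority value that ioprio_set takes and the equivalent ionice command line. The helper names (io_priority_value, ionice_argv) are hypothetical. The class numbers (1 = real-time, 2 = best-effort, 3 = idle) and the 13-bit class shift come from the kernel's ioprio interface rather than from the text above.

```python
# Illustrative helpers; the function names are hypothetical, not a library API.
IOPRIO_CLASS_SHIFT = 13  # the kernel packs the class above the priority level
IO_CLASSES = {"real-time": 1, "best-effort": 2, "idle": 3}


def io_priority_value(io_class="best-effort", level=4):
    """Pack a scheduling class and priority level the way ioprio_set expects."""
    if io_class not in IO_CLASSES:
        raise ValueError(f"unknown scheduling class: {io_class}")
    if io_class == "idle":
        level = 0  # only real-time and best-effort have priority levels
    elif not 0 <= level <= 7:
        raise ValueError("priority must be between 0 (highest) and 7 (lowest)")
    return (IO_CLASSES[io_class] << IOPRIO_CLASS_SHIFT) | level


def ionice_argv(pid, io_class="best-effort", level=4):
    """Build the ionice command line that assigns the same class and priority."""
    io_priority_value(io_class, level)  # reuse the validation above
    argv = ["ionice", "-c", str(IO_CLASSES[io_class])]
    if io_class != "idle":
        argv += ["-n", str(level)]
    return argv + ["-p", str(pid)]
```

For example, ionice_argv(1234, "idle") returns the argument list for `ionice -c 3 -p 1234`, where 1234 is a placeholder PID. The list can be passed to subprocess.run to apply the setting.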
slice_idle = 0
quantum = 64
group_idle = 1
When group_idle is set to 1, there is still the potential for I/O stalls (whereby the back-end storage is not busy due to idling). However, these stalls will be less frequent than idling on every queue in the system.
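The settings listed above could be applied by writing each value to the matching file in the device's iosched directory in sysfs. The sketch below assumes the usual /sys/block/<device>/queue/iosched layout and root privileges. The helper name apply_cfq_tunables is hypothetical, and the sysfs_root parameter exists only so the function can be tried against a scratch directory.

```python
import os

# Values taken from the listing above.
LISTED_TUNABLES = {"slice_idle": 0, "quantum": 64, "group_idle": 1}


def apply_cfq_tunables(device, tunables, sysfs_root="/sys/block"):
    """Write each tunable to <sysfs_root>/<device>/queue/iosched/<name>."""
    iosched_dir = os.path.join(sysfs_root, device, "queue", "iosched")
    written = {}
    for name, value in tunables.items():
        path = os.path.join(iosched_dir, name)
        if not os.path.exists(path):
            # The file is absent when CFQ is not the active scheduler.
            raise FileNotFoundError(f"{path} not found; is CFQ active on {device}?")
        with open(path, "w") as f:
            f.write(str(value))
        written[name] = path
    return written
```

A call such as apply_cfq_tunables("sda", LISTED_TUNABLES) would update the device named sda, which is a placeholder. Values written this way do not persist across reboots.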
- Backward seeks are typically bad for performance, as they can incur greater delays in repositioning the heads than forward seeks do. However, CFQ will still perform them, if they are small enough. This tunable controls the maximum distance in KB the I/O scheduler will allow backward seeks. The default is
- Because of the inefficiency of backward seeks, a penalty is associated with each one. The penalty is a multiplier; for example, consider a disk head position at 1024KB. Assume there are two requests in the queue, one at 1008KB and another at 1040KB. The two requests are equidistant (16KB) from the current head position. However, after applying the back seek penalty (default: 2), the backward request at 1008KB is treated as if it were 32KB away, so the request at the later position on disk is effectively twice as close. Thus, the head will move forward.
- This tunable controls how long an async (buffered write) request can go unserviced. After the expiration time (in milliseconds), a single starved async request will be moved to the dispatch list. The default is
- This is the same as the fifo_expire_async tunable, but for synchronous (read and O_DIRECT write) requests. The default is
- When set, CFQ will idle on the last process issuing I/O in a cgroup. This should be set to
1 when using proportional weight I/O cgroups and setting
0 (typically done on fast storage).
- If group isolation is enabled (set to
1), it provides a stronger isolation between groups at the expense of throughput. Generally speaking, if group isolation is disabled, fairness is provided for sequential workloads only. Enabling group isolation provides fairness for both sequential and random workloads. The default value is
0 (disabled). Refer to
Documentation/cgroups/blkio-controller.txt for further information.
- When low latency is enabled (set to
1), CFQ attempts to provide a maximum wait time of 300 ms for each process issuing I/O on a device. This favors fairness over throughput. Disabling low latency (setting it to
0) ignores target latency, allowing each process in the system to get a full time slice. Low latency is enabled by default.
- The quantum controls the number of I/Os that CFQ will send to the storage at a time, essentially limiting the device queue depth. By default, this is set to
8. The storage may support much deeper queue depths, but increasing
quantum will also have a negative impact on latency, especially in the presence of large sequential write workloads.
- This tunable controls the time slice allotted to each process issuing asynchronous (buffered write) I/O. By default it is set to
- This specifies how long CFQ should idle while waiting for further requests. The default value in Red Hat Enterprise Linux 6.1 and earlier is
8 ms. In Red Hat Enterprise Linux 6.2 and later, the default value is
0. The zero value improves the throughput of external RAID storage by removing all idling at the queue and service tree level. However, a zero value can degrade throughput on internal non-RAID storage, because it increases the overall number of seeks. For non-RAID storage, we recommend a
slice_idle value that is greater than 0.
- This tunable dictates the time slice allotted to a process issuing synchronous (read or direct write) I/O. The default is
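The back seek penalty example in the list above can be worked through in a few lines of Python. This is a simplified model of the comparison described in the text, not CFQ's actual request-selection code. The function names are hypothetical, and treating requests beyond the backward seek limit as ineligible is an assumption made for the sketch.

```python
def effective_distance_kb(head_kb, request_kb, back_seek_max_kb, penalty=2):
    """Return the cost of seeking to request_kb, or None if it is too far back."""
    if request_kb >= head_kb:
        return request_kb - head_kb      # forward seek: plain distance
    backward = head_kb - request_kb
    if backward > back_seek_max_kb:
        return None                      # assumption: beyond the limit, not eligible
    return backward * penalty            # backward seek: distance times penalty


def choose_next_request(head_kb, requests_kb, back_seek_max_kb, penalty=2):
    """Pick the request with the smallest effective distance from the head."""
    candidates = []
    for r in requests_kb:
        cost = effective_distance_kb(head_kb, r, back_seek_max_kb, penalty)
        if cost is not None:
            candidates.append((cost, r))
    return min(candidates)[1] if candidates else None
```

With the head at 1024KB and requests at 1008KB and 1040KB, choose_next_request(1024, [1008, 1040], back_seek_max_kb=64) selects 1040: both requests are 16KB away, but the backward one is charged 32KB. The value 64 is an illustrative limit, not the tunable's default.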