- In some cases, the overhead of I/O events contributing to the entropy pool for /dev/random is measurable. In such cases, it may be desirable to set add_random to 0.
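This behavior is controlled per device through sysfs; a minimal sketch, assuming the device is sda and you have root privileges (the device name is an example):

```shell
# Disable I/O-event contributions to the entropy pool for sda.
echo 0 > /sys/block/sda/queue/add_random
# Verify the setting:
cat /sys/block/sda/queue/add_random
```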
- By default, the maximum request size sent to disk is 512 KB. This tunable, max_sectors_kb, can be used to either raise or lower that value. The minimum value is limited by the logical block size; the maximum value is limited by max_hw_sectors_kb. Some SSDs perform worse when I/O sizes exceed the internal erase block size. In such cases, it is recommended to tune max_sectors_kb down to the erase block size. You can test for this using an I/O generator such as iozone or aio-stress, varying the record size.
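As a sketch, the relevant sysfs files can be inspected and adjusted like this (the device name sda and the 512 KB figure are examples, not recommendations; writing requires root):

```shell
# Upper bound imposed by the hardware/driver (read-only):
cat /sys/block/sda/queue/max_hw_sectors_kb

# Current maximum request size:
cat /sys/block/sda/queue/max_sectors_kb

# Lower it to a hypothetical 512 KB erase block size:
echo 512 > /sys/block/sda/queue/max_sectors_kb
```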
- This tunable, nomerges, is primarily a debugging aid. Most workloads benefit from request merging (even on faster storage such as SSDs). In some cases, however, it is desirable to disable merging, such as when you want to see how many IOPS a storage back-end can process without disabling read-ahead or performing random I/O.
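Merging can be toggled through the nomerges sysfs file; in current kernels, 0 enables all merging, 1 disables only the more expensive merge attempts, and 2 disables merging entirely (device name is an example; requires root):

```shell
# Disable request merging entirely for the duration of a benchmark:
echo 2 > /sys/block/sda/queue/nomerges
# ... run the IOPS test ...
# Restore normal merging afterwards:
echo 0 > /sys/block/sda/queue/nomerges
```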
- Each request queue has a limit on the total number of request descriptors that can be allocated for each of read and write I/Os. By default, the number is 128, meaning 128 reads and 128 writes can be queued at a time before putting a process to sleep. The process put to sleep is the next one to try to allocate a request, not necessarily the process that has allocated all of the available requests. If you have a latency-sensitive application, then you should consider lowering the value of nr_requests in your request queue and limiting the command queue depth on the storage to a low number (even as low as 1), so that writeback I/O cannot allocate all of the available request descriptors and fill up the device queue with write I/O. Once nr_requests have been allocated, all other processes attempting to perform I/O are put to sleep to wait for requests to become available. This makes things fairer, as requests are then distributed in a round-robin fashion (instead of letting one process consume them all in rapid succession). Note that this is only a problem when using the deadline or noop schedulers, as the default CFQ configuration protects against this situation.
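For a latency-sensitive setup, both knobs live under sysfs; a sketch assuming a SCSI-like device named sda (the paths, device name, and values are illustrative, and writing requires root):

```shell
# Shrink the pool of request descriptors per direction:
echo 32 > /sys/block/sda/queue/nr_requests

# Limit the command queue depth on the storage device itself:
echo 1 > /sys/block/sda/device/queue_depth
```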
- In some circumstances, the underlying storage will report an optimal I/O size. This is most common in hardware and software RAID, where the optimal I/O size is the stripe size. If this value is reported, applications should issue I/O aligned to and in multiples of the optimal I/O size whenever possible.
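The reported value can be read from sysfs, and aligning a transfer size up to a multiple of it is simple integer arithmetic; a sketch (align_up is a hypothetical helper, and the device name is an example):

```shell
# Optimal I/O size in bytes; 0 means the device reports none.
# cat /sys/block/sda/queue/optimal_io_size

# Round a transfer size up to the next multiple of the optimal I/O size.
align_up() {
  local size=$1 opt=$2
  echo $(( (size + opt - 1) / opt * opt ))
}

align_up 100000 65536   # with a 64 KB stripe, prints 131072
```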
- The operating system can detect when an application is reading data sequentially from a file or from disk. In such cases, it performs an intelligent read-ahead algorithm, whereby more data than the user requested is read from disk. Thus, when the user next attempts to read a block of data, it will already be in the operating system's page cache. The potential downside is that the operating system can read more data from disk than necessary, which occupies space in the page cache until it is evicted under high memory pressure; having multiple processes doing false read-ahead increases memory pressure in this circumstance. For device mapper devices, it is often a good idea to increase the value of read_ahead_kb to a large number, such as 8192, because a device mapper device is often made up of multiple underlying devices. Setting this value to the default (128 KB) multiplied by the number of devices you are mapping is a good starting point for tuning.
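The starting-point rule above is easy to script; a sketch assuming a device-mapper volume named dm-0 built from four underlying devices (both names and the device count are examples):

```shell
# 4 underlying devices: start from 4 x 128 KB of read-ahead.
ndev=4
echo $(( 128 * ndev ))   # prints 512

# Apply it to the device-mapper volume (requires root):
# echo $(( 128 * ndev )) > /sys/block/dm-0/queue/read_ahead_kb
```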
- Traditional hard disks are rotational (made up of spinning platters); SSDs are not. Most SSDs advertise this properly. If, however, you come across a device that does not advertise the flag correctly, it may be necessary to set rotational to 0 manually. When rotational is disabled, the I/O elevator does not use logic meant to reduce seeks, since there is little penalty for seek operations on non-rotational media.
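A quick check and manual override through sysfs (device name is an example; writing requires root):

```shell
# How the device advertises itself (1 = rotational, 0 = non-rotational):
cat /sys/block/sda/queue/rotational

# Force the non-rotational flag for an SSD that misreports:
echo 0 > /sys/block/sda/queue/rotational
```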
- I/O completions can be processed on a different CPU from the one that issued the I/O. Setting rq_affinity to 1 causes the kernel to deliver completions to the CPU on which the I/O was issued, which can improve CPU data caching effectiveness.
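The setting is again a per-device sysfs write (device name is an example; requires root):

```shell
# Deliver completions to the group of the CPU that issued the I/O:
echo 1 > /sys/block/sda/queue/rq_affinity

# On kernels that support it, 2 forces completion on the exact issuing CPU:
echo 2 > /sys/block/sda/queue/rq_affinity
```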