2.4.4. Monitoring Storage

Monitoring storage normally takes place at two different levels:
  • Monitoring for sufficient disk space
  • Monitoring for storage-related performance problems
The two levels are monitored separately because it is possible to have dire problems in one area and none whatsoever in the other. For example, a disk drive can run out of disk space without ever causing any kind of performance-related problem. Likewise, a disk drive can have 99% free space and still be pushed past its limits in terms of performance.
However, it is more likely that the average system experiences varying degrees of resource shortages in both areas. Because of this, it is also likely that, to some extent, problems in one area impact the other. Most often this type of interaction takes the form of poorer and poorer I/O performance as a disk drive nears 0% free space. In cases of extreme I/O loads, however, it might be possible to slow I/O throughput to such a level that applications no longer run properly.
In any case, the following statistics are useful for monitoring storage:
Free Space
Free space is probably the one resource all system administrators watch closely; it would be a rare administrator who never checks on free space (or has no automated way of doing so).
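As an illustration of the automated free space check mentioned above, the following is a minimal sketch in Python. The function names and the 10% threshold are assumptions chosen for the example, not values recommended by this guide; only shutil.disk_usage() is a standard library call.

```python
import shutil


def free_space_percent(path="/"):
    """Return the percentage of the file system holding `path` that is free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if usage.total == 0:
        # Some pseudo file systems report a size of zero; treat them as full.
        return 0.0
    return 100.0 * usage.free / usage.total


def is_low_on_space(path="/", threshold_percent=10.0):
    """Return True when free space drops below the (assumed) threshold."""
    return free_space_percent(path) < threshold_percent


if __name__ == "__main__":
    percent = free_space_percent("/")
    print(f"/ has {percent:.1f}% free space")
    if is_low_on_space("/"):
        print("warning: free space is below the threshold")
```

A script along these lines could be run periodically (for example, from cron) so that the check does not depend on someone remembering to look.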
File System-Related Statistics
These statistics (such as the number of files and directories, average file size, etc.) provide more detail than a single free space percentage. As such, they make it possible for system administrators to configure the system to give the best performance, because the I/O load imposed by a file system full of many small files is not the same as that imposed by a file system filled with a single massive file.
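The kind of file system statistics described above can be gathered by walking a directory tree. The sketch below is one hedged way to do it in Python; the function name file_stats and the keys of the returned dictionary are hypothetical, and walking a large file system this way generates I/O load of its own.

```python
import os


def file_stats(root):
    """Count files and directories under `root` and compute the average file size."""
    file_count = 0
    dir_count = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dir_count += len(dirnames)
        for name in filenames:
            try:
                # lstat() does not follow symbolic links, so a link is
                # counted by its own size rather than its target's size.
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # the file vanished or is unreadable; skip it
            file_count += 1
            total_bytes += info.st_size
    average = total_bytes / file_count if file_count else 0.0
    return {
        "files": file_count,
        "directories": dir_count,
        "average_size": average,
    }


if __name__ == "__main__":
    print(file_stats("."))
```

A very small average file size combined with a very large file count suggests a workload dominated by many small I/O operations, which is the situation the paragraph above contrasts with a single massive file.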
Transfers per Second
This statistic counts the I/O operations a device completes each second. Comparing it against what the device can sustain is a good way of determining whether that device's bandwidth limitations are being reached.
Reads/Writes per Second
A slightly more detailed breakdown of transfers per second, these statistics allow the system administrator to more fully understand the nature of the I/O loads a storage device is experiencing. This can be critical, as some storage technologies have widely different performance characteristics for read versus write operations.
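To make the relationship between transfers per second and reads/writes per second concrete, the sketch below samples the Linux kernel's per-device counters twice and turns the difference into rates. It assumes a Linux system that provides /proc/diskstats, where the fourth and eighth whitespace-separated fields of each line hold the number of completed reads and writes. The function names are hypothetical, and the sum of the two rates only approximates the transfers-per-second figure that tools such as iostat report.

```python
import time


def parse_diskstats(text):
    """Map each device name to its (reads completed, writes completed) counters."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # not a complete statistics line
        # fields: major, minor, name, reads completed, reads merged,
        # sectors read, ms reading, writes completed, ...
        counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters


def io_rates(before, after, interval_seconds):
    """Turn two counter snapshots into per-second rates for each device."""
    rates = {}
    for device, (reads_after, writes_after) in after.items():
        if device not in before:
            continue  # device appeared between the two samples
        reads_before, writes_before = before[device]
        reads = (reads_after - reads_before) / interval_seconds
        writes = (writes_after - writes_before) / interval_seconds
        rates[device] = {
            "reads_per_s": reads,
            "writes_per_s": writes,
            "transfers_per_s": reads + writes,
        }
    return rates


def sample(interval_seconds=1.0, path="/proc/diskstats"):
    """Read the counters twice, `interval_seconds` apart, and return the rates."""
    with open(path) as stats_file:
        before = parse_diskstats(stats_file.read())
    time.sleep(interval_seconds)
    with open(path) as stats_file:
        after = parse_diskstats(stats_file.read())
    return io_rates(before, after, interval_seconds)


if __name__ == "__main__":
    for device, rate in sorted(sample().items()):
        print(device, rate)
```

Keeping the parsing and the rate calculation separate from the sampling makes it possible to check the arithmetic against saved snapshots without touching a live system.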