3.3. Red Hat Enterprise Linux-Specific Information

Monitoring bandwidth and CPU utilization under Red Hat Enterprise Linux entails using the tools discussed in Chapter 2, Resource Monitoring; therefore, if you have not yet read that chapter, you should do so before continuing.

3.3.1. Monitoring Bandwidth on Red Hat Enterprise Linux

As stated in Section 2.4.2, “Monitoring Bandwidth”, it is difficult to directly monitor bandwidth utilization. However, by examining device-level statistics, it is possible to roughly gauge whether insufficient bandwidth is an issue on your system.
By using vmstat, it is possible to determine if overall device activity is excessive by examining the bi and bo fields; in addition, taking note of the si and so fields gives you a bit more insight into how much disk activity is due to swap-related I/O:
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
     1  0      0 248088 158636 480804    0    0     2     6  120   120 10  3 87  0
In this example, the bi field shows two blocks/second read from block devices (primarily disk drives), while the bo field shows six blocks/second written to block devices. We can determine that none of this activity was due to swapping, as the si and so fields both show a swap-related I/O rate of zero kilobytes/second.
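As a minimal sketch of automating this check, the si and so columns (the 7th and 8th fields of a vmstat data line in the layout shown above) can be extracted with awk; the sample line below is the captured output from the example, not live data:

```shell
# Check a captured vmstat data line for swap-related I/O.
# Fields: r b swpd free buff cache si so bi bo in cs us sy id wa,
# so si is $7 and so is $8.
vmstat_line='1  0      0 248088 158636 480804    0    0     2     6  120  120 10  3 87  0'
echo "$vmstat_line" |
  awk '{ if ($7 + $8 > 0) print "swap I/O detected"; else print "no swap I/O" }'
```

Running this against the sample line reports that no swap I/O is taking place, matching the analysis above.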
By using iostat, it is possible to gain a bit more insight into disk-related activity:
    Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)   07/21/2003

    avg-cpu:  %user   %nice    %sys   %idle
               5.34    4.60    2.83   87.24

    Device:       tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    dev8-0       1.10         6.21        25.08     961342    3881610
    dev8-1       0.00         0.00         0.00         16          0
This output shows us that the device with major number 8 (which is /dev/sda, the first SCSI disk) averaged slightly more than one I/O operation per second (the tps field). Most of the I/O activity for this device was writes (the Blk_wrtn field), with slightly more than 25 blocks written each second (the Blk_wrtn/s field).
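The read/write mix can also be computed directly from a captured iostat device line; the following sketch assumes the field order shown above (device, tps, Blk_read/s, Blk_wrtn/s, Blk_read, Blk_wrtn):

```shell
# Compute what fraction of a device's block throughput is writes,
# from a captured iostat line ($3 is Blk_read/s, $4 is Blk_wrtn/s).
iostat_line='dev8-0  1.10  6.21  25.08  961342  3881610'
echo "$iostat_line" |
  awk '{ printf "%s: %.0f%% of block throughput is writes\n", $1, 100 * $4 / ($3 + $4) }'
```

For the sample line this reports roughly 80% writes, consistent with the observation that most of this device's activity is write traffic.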
If more detail is required, use iostat's -x option:
    Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)   07/21/2003

    avg-cpu:  %user   %nice    %sys   %idle
               5.37    4.54    2.81   87.27

    Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz
    /dev/sda    13.57   2.86  0.36  0.77   32.20   29.05   16.10   14.53    54.52
    /dev/sda1    0.17   0.00  0.00  0.00    0.34    0.00    0.17    0.00   133.40
    /dev/sda2    0.00   0.00  0.00  0.00    0.00    0.00    0.00    0.00    11.56
    /dev/sda3    0.31   2.11  0.29  0.62    4.74   21.80    2.37   10.90    29.42
    /dev/sda4    0.09   0.75  0.04  0.15    1.06    7.24    0.53    3.62    43.01
Aside from the longer lines containing more fields, the first thing to keep in mind is that this iostat output now displays statistics at the per-partition level. By using df to associate mount points with device names, it is possible to use this report to determine if, for example, the partition containing /home/ is experiencing an excessive workload.
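Making that association is a one-liner; the following sketch uses /home purely as an illustration, and any mount point of interest can be substituted:

```shell
# Map a mount point back to its underlying device, so per-partition
# iostat statistics can be tied to a filesystem. /home is only an
# example mount point here.
mountpoint=/home
df "$mountpoint" | awk 'NR == 2 { print $1 }'
```

The second line of df's output carries the device name in its first column, which can then be matched against the Device: column of the iostat report.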
Actually, each line output from iostat -x is longer and contains more information than this; here is the remainder of each line (with the device column added for easier reading):
    Device:    avgqu-sz   await  svctm  %util
    /dev/sda       0.24   20.86   3.80   0.43
    /dev/sda1      0.00  141.18 122.73   0.03
    /dev/sda2      0.00    6.00   6.00   0.00
    /dev/sda3      0.12   12.84   2.68   0.24
    /dev/sda4      0.11   57.47   8.94   0.17
In this example, it is interesting to note that /dev/sda2 is the system swap partition; it is obvious from the many fields reading 0.00 for this partition that swapping is not a problem on this system.
Another interesting point to note is /dev/sda1. The statistics for this partition are unusual; the overall activity seems low, but why are the average I/O request size (the avgrq-sz field), average wait time (the await field), and the average service time (the svctm field) so much larger than on the other partitions? The answer is that this partition contains the /boot/ directory, which is where the kernel and initial ramdisk are stored. When the system boots, the read I/Os issued during the boot process (notice that only the rsec/s and rkB/s fields are non-zero; no writing is done here on a regular basis) involve large numbers of blocks, resulting in the relatively long wait and service times iostat displays.
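Partitions with unusual wait times like this can be picked out mechanically. This sketch assumes rows in the avgqu-sz/await/svctm/%util layout shown above (device in field 1, await in field 3); the 100 ms threshold is an arbitrary illustration, not a recommended limit:

```shell
# Flag partitions whose average wait time (await, field 3) exceeds
# an illustrative 100 ms threshold, from captured "iostat -x" rows.
printf '%s\n' \
  '/dev/sda1 0.00 141.18 122.73 0.03' \
  '/dev/sda2 0.00 6.00 6.00 0.00' |
  awk '$3 > 100 { print $1, "has a high await of", $3, "ms" }'
```

Against the sample rows, only /dev/sda1 is flagged, mirroring the discussion above.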
It is possible to use sar for a longer-term overview of I/O statistics; for example, sar -b displays a general I/O report:
    Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)   07/21/2003

    12:00:00 AM       tps      rtps      wtps   bread/s   bwrtn/s
    12:10:00 AM      0.51      0.01      0.50      0.25     14.32
    12:20:01 AM      0.48      0.00      0.48      0.00     13.32
    …
    06:00:02 PM      1.24      0.00      1.24      0.01     36.23
    Average:         1.11      0.31      0.80     68.14     34.79
Here, like iostat's initial display, the statistics are grouped for all block devices.
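Because sar reports end with an Average: line, the day's overall transfer rate can be pulled out of a captured report for long-term trending; the single line below stands in for a full report:

```shell
# Extract the daily average transfer rate (tps, field 2) from the
# Average: line of a captured "sar -b" report.
sar_report='Average: 1.11 0.31 0.80 68.14 34.79'
echo "$sar_report" | awk '/^Average:/ { print "average tps:", $2 }'
```

Collecting this one number per day gives a simple long-term I/O trend without storing the full reports.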
Another I/O-related report is produced using sar -d:
    Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com)   07/21/2003

    12:00:00 AM       DEV       tps    sect/s
    12:10:00 AM    dev8-0      0.51     14.57
    12:10:00 AM    dev8-1      0.00      0.00
    12:20:01 AM    dev8-0      0.48     13.32
    12:20:01 AM    dev8-1      0.00      0.00
    …
    06:00:02 PM    dev8-0      1.24     36.25
    06:00:02 PM    dev8-1      0.00      0.00
    Average:       dev8-0      1.11    102.93
    Average:       dev8-1      0.00      0.00
This report provides per-device information, but with little detail.
While there are no explicit statistics showing bandwidth utilization for a given bus or datapath, we can at least determine what the devices are doing and use their activity to indirectly determine the bus loading.
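One way to make that indirect determination concrete is to sum the per-device sector rates from a captured "sar -d" Average section; the total is a rough proxy for the load on the shared bus or datapath, assuming the field layout shown above (sect/s in field 4):

```shell
# Roughly estimate total bus load by summing per-device sector
# rates (sect/s, field 4) from a captured "sar -d" Average section.
printf '%s\n' \
  'Average: dev8-0 1.11 102.93' \
  'Average: dev8-1 0.00 0.00' |
  awk '{ total += $4 } END { printf "total sect/s: %.2f\n", total }'
```

This is only an approximation, since it ignores protocol overhead and non-disk traffic on the bus, but it is enough to spot gross changes in aggregate device activity over time.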