How can I monitor/record IO wait time?


Trying to get a closer look at some systems that crunch a lot of data in the early-morning hours, looking for bottlenecks. What can I use to monitor CPU I/O wait time? I could sit up between 4 and 6 AM watching "top", but I'd rather not.

 

Looking for a utility like MRTG/PRTG/Orion, etc. that can give me this information.

 

What do you guys use? 

Responses

Might want to look at vmstat:

http://linux.die.net/man/8/vmstat

 

Also iostat:

http://linux.die.net/man/1/iostat

 

You could write a quick script to chop up the output of vmstat and write it to a file, then drop the script in cron and let the file fill up. I'm not sure about the graphing utility. We use Nimsoft, which can read the contents of a file and graph them, so we would just point it at that file. I bet Nagios can do the same (not 100% on that), and I bet many/most other monitoring suites could too.
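A rough sketch of what that "quick script" might look like, for anyone who wants a starting point. The function names, the log path, and the cron schedule are all made up for illustration. The "wa" column is looked up by its header name, on the assumption that the column order can differ between vmstat versions.

```shell
#!/bin/sh
# Pull the "wa" (I/O wait) column out of vmstat output read on stdin.
# The column is located by header name rather than by fixed position.
vmstat_wa() {
    awk '
        # Until the header naming the columns is seen, look for "wa".
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
        # Data rows: print the wa field; repeated headers are skipped
        # because their field is not a number.
        col > 0 && $col ~ /^[0-9]+$/ { print $col }
    '
}

# Append one timestamped sample to a log file.
# The default path is a placeholder, not something from this thread.
log_iowait() {
    logfile=${1:-/var/log/iowait.log}
    # The first vmstat report is the average since boot, so take two
    # reports 5 seconds apart and keep only the last one.
    wa=$(vmstat 5 2 | vmstat_wa | tail -n 1)
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$wa" >> "$logfile"
}

# Hypothetical /etc/cron.d entry, sampling every minute from 04:00 to 05:59,
# assuming the script is saved somewhere that ends by calling log_iowait:
# * 4-5 * * * root /usr/local/bin/log_iowait.sh
```

The resulting file has one "date time value" line per sample, which is the kind of flat file a monitoring suite can read and graph.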

 

If you are just interested in the monitoring software, we use Nimsoft and really like it. It's pretty complex and without a good idea of Nimsoft beforehand it's nearly impossible to set any alarming/graphing up, but it's very versatile and powerful. So, generally, you need a dedicated Nimsoft team. Nagios could most likely do the same thing though.

yum -y install sysstat

You can modify the /etc/cron.d/sysstat cronjob to add the "-d" flag to sa1 to gather disk statistics over time (if you are looking for disk access latency/performance.)

Then, use sar -f /var/log/sa/saXX, where XX is the day of the month.
You can use all of the regular sar options, to see things like CPU usage (including %iowait) and individual disk access latency/performance.
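To save looking up the file name by hand, something along these lines could work. It is only a sketch: sa_file and morning_cpu are hypothetical helper names, and the /var/log/sa directory is the one mentioned above (adjust SA_DIR if your data files live elsewhere). The -s and -e options limit sar to a start and end time, which covers the 4 to 6 AM window from the original question.

```shell
#!/bin/sh
# Print the path of the sysstat data file for a given day of the month.
# With no argument, today's day of the month is used.
sa_file() {
    day=${1:-$(date +%d)}
    # Drop a leading zero so printf does not treat "08" as octal.
    day=${day#0}
    printf '%s/sa%02d\n' "${SA_DIR:-/var/log/sa}" "$day"
}

# Show CPU statistics (including %iowait) between 04:00 and 06:00
# for the given day of the month.
morning_cpu() {
    sar -u -s 04:00:00 -e 06:00:00 -f "$(sa_file "$1")"
}

# Example: morning_cpu 1    # reads /var/log/sa/sa01
```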

...why the RHEL sadc configs don't collect diskstats by default. Other UNIX implementations of sadc collect the disk-related data by default.

[root@m0000001 sa]# sar -f sa01
Linux 2.6.18-194.26.1.el5.centos.plus (m0000001) 08/01/2011

12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM       all      1.63      0.00      0.86      0.03      0.00     97.48
12:20:01 AM       all      1.70      0.00      0.87      1.09      0.00     96.34
12:30:01 AM       all      1.65      0.00      0.90      0.02      0.00     97.42
12:40:01 AM       all      2.37      0.00      1.24      0.03      0.00     96.35
12:50:01 AM       all      1.57      0.00      0.84      0.03      0.00     97.56
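If scanning output like this by eye gets tedious, a small filter can pick out only the intervals where %iowait is high. This is a sketch, not anything shipped with sysstat: high_iowait is a made-up name, and the default threshold of 5 percent is arbitrary. It assumes sar CPU output in the layout shown above, with or without the AM/PM field.

```shell
#!/bin/sh
# Read "sar -u" output on stdin and print the timestamp and %iowait of
# every interval at or above a threshold ($1, in percent, default 5).
high_iowait() {
    awk -v limit="${1:-5}" '
        # Header row: find the %iowait column by name, then move on.
        { for (i = 1; i <= NF; i++) if ($i == "%iowait") { col = i; next } }
        # Data rows: numeric value in that column, skipping the
        # "Average:" summary line at the end of the report.
        col && NF >= col && $1 != "Average:" && $col ~ /^[0-9.]+$/ && $col + 0 >= limit + 0 {
            ts = ($2 == "AM" || $2 == "PM") ? $1 " " $2 : $1
            print ts, $col
        }
    '
}

# Example: sar -u -f /var/log/sa/sa01 | high_iowait 1
```

Run against the six rows above with a threshold of 1, only the 12:20:01 AM interval (1.09) would come back.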

 

 
root@setl202:~$ cat /etc/cron.d/sysstat
# run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib64/sa/sa1 -d 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A
 
 
I added the "-d" but the output of sar looks the same... I get what you posted above, which is what I had before adding the "-d". (I'm on RHEL 5.5.) Does that seem correct to you?

Looks good. You will want to add the -d option to sar to see disk access statistics collected by sadc/sa1. Now that you've modified the cron job, try:

sar -d -f /var/log/sa/saXX

Where is the -d flag documented?

It is in the man page for sadc(8) in RHEL 5.

Please note that the parameters may have changed for sadc in a later release. It looks like they are using "-S DISK" now in RHEL 6.

I use this, but only because I wrote it:

http://sourceforge.net/projects/pumpedup/

 

It's written in Perl and runs on Linux, Solaris, and Windows.

 

It allows down-to-the-second graphing of performance stats, 24x7, including disk read and write wait in ms.

 

Pumped Up Graphs

 

Hope that helps.

 

Axel

It's sort of a cross between MRTG and Sarge.