How can I monitor/record IO wait time?
Trying to get a closer look at some systems that crunch a lot of data in the AM hours, looking for bottlenecks. What can I use to monitor CPU I/O wait time? I could sit up between 4 and 6 AM watching "top", but I'd rather not.
Looking for a utility like MRTG/PRTG/Orion, etc. that can give me this information.
What do you guys use?
Responses
Might want to look at vmstat:
http://linux.die.net/man/8/vmstat
Also iostat:
http://linux.die.net/man/1/iostat
You could write a quick script to chop up the output of vmstat and write it to a file, then drop it in cron and let the file fill up. Not sure about a graphing utility. We use Nimsoft, which can read the contents of a file and graph them. I bet Nagios can do the same (not 100% on that), along with most other monitoring suites.
If you are just interested in the monitoring software, we use Nimsoft and really like it. It's very versatile and powerful, but it's pretty complex: without a good grasp of Nimsoft beforehand it's nearly impossible to set up any alarms or graphs, so you generally need a dedicated Nimsoft team. Nagios could most likely do the same thing, though.
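For what it's worth, here is a rough sketch of the kind of script described above. It assumes vmstat labels the I/O wait column "wa" (it does on the Linux versions I've seen, but check yours). The function names, log path, and script path are made up for illustration.

```shell
#!/bin/sh
# Sketch only: log vmstat's I/O wait percentage with a timestamp.

# Read vmstat output on stdin and print the "wa" value of the last sample.
extract_iowait() {
    awk '
        # Header lines: look for the "wa" column by name, since the
        # number and order of columns differ between vmstat versions.
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
        # Data lines: remember the most recent value in that column.
        col > 0  { last = $col }
        END      { if (col > 0) print last }
    '
}

# Take one 1-second sample and append "date time wa" to a log file.
# $1: log file (the default path is a hypothetical choice).
log_iowait() {
    logfile=${1:-/var/log/iowait.log}
    # "vmstat 1 2" prints two samples; the first is the average since
    # boot, so only the second (last) one reflects the current interval.
    wa=$(vmstat 1 2 | extract_iowait)
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$wa" >> "$logfile"
}
```

To cover the 4 to 6 AM window, an /etc/cron.d entry along these lines should work, assuming the script was saved as /usr/local/bin/iowait-log.sh (hypothetical path): * 4-5 * * * root . /usr/local/bin/iowait-log.sh && log_iowait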
yum -y install sysstat
You can modify the /etc/cron.d/sysstat cron job to add the "-d" flag to sa1 so that it gathers disk statistics over time (useful if you are looking at disk access latency/performance).
Then, use sar -f /var/log/sa/saXX, where XX is the day of the month.
You can use all of the regular sar options to see things like CPU usage (including %iowait) and individual disk access latency/performance.
[root@m0000001 sa]# sar -f sa01
Linux 2.6.18-194.26.1.el5.centos.plus (m0000001)  08/01/2011

12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM       all      1.63      0.00      0.86      0.03      0.00     97.48
12:20:01 AM       all      1.70      0.00      0.87      1.09      0.00     96.34
12:30:01 AM       all      1.65      0.00      0.90      0.02      0.00     97.42
12:40:01 AM       all      2.37      0.00      1.24      0.03      0.00     96.35
12:50:01 AM       all      1.57      0.00      0.84      0.03      0.00     97.56
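If you only care about the 4 to 6 AM window, sar's -s and -e options limit the report to a start and end time, and a small awk filter can pick out the busy samples. This is just a sketch: the function name and the default threshold of 5% are my own picks, not anything sar defines.

```shell
#!/bin/sh
# Sketch only: print "sar -u" samples whose %iowait meets a threshold.
# stdin: sar CPU report; $1: threshold in percent (default 5, arbitrary).
high_iowait() {
    awk -v limit="${1:-5}" '
        # Header row: find the %iowait column by name, then skip the row.
        { for (i = 1; i <= NF; i++) if ($i == "%iowait") { col = i; next } }
        # The Average: line has a different field layout, so skip it.
        /^Average:/ { next }
        # Data rows: print those at or above the threshold.
        col > 0 && NF >= col && $col + 0 >= limit + 0 { print }
    '
}

# Example (times are in 24-hour format):
#   sar -u -f /var/log/sa/sa01 -s 04:00:00 -e 06:00:00 | high_iowait 5
```

Run against the sample output above with a threshold of 1, this would print only the 12:20:01 AM line.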
I use this:
http://sourceforge.net/projects/pumpedup/
but only because I wrote it.
It's written in Perl and runs on Linux, Solaris, and Windows.
It allows down-to-the-second graphing of performance stats, 24x7, including disk read and write wait in ms.
Hope that helps.
Axel