Per-process I/O data


We are running RHEL 6.10 (kernel 2.6.32-754.2.1.el6.x86_64) and I am trying to get per-process I/O utilization data.
First I tried pidstat, but it produces some meaningless results. This is part of the output from 'pidstat -d -p ALL 5 60000':

22:41:04          PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
22:41:04        18285      0.00 3602879701896396.00      0.00  xxxx
22:41:04        18302      0.00      0.00      0.00  xxxx
22:41:04        18303      0.00 3602879701893008.50      0.00  xxxx
22:41:04        18541      0.00      0.00      0.00  sh
22:41:04        18542      0.00   3066.40      0.00  xxxx
22:41:04        18577      0.00 3602879701893313.50      0.00  sh
22:41:04        18578      0.00      0.00      0.00  xxxx
22:41:04        18628      0.00      0.00      0.00  java
22:41:04        18651      0.00   3084.80      0.00  java
22:41:04        20710      0.00 3602879701893451.00      0.00  xxxx

Is there a bug in pidstat?
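The absurd kB_wr/s values all sit just below 2^64 / (1024 * 5) = 3602879701896396.8. That is what you would get if an unsigned 64-bit byte counter difference went slightly negative between two samples and wrapped around, and was then divided by 1024 (bytes to kB) and by the 5-second interval. This points at counter arithmetic rather than real I/O, but it is an assumption: it has not been confirmed against the sysstat source shipped with RHEL 6, and it does not explain why the underlying counter would decrease. The Python sketch below only reproduces the arithmetic; `wrapped_rate_kb` is a hypothetical helper, not pidstat code.

```python
# Reproduces the suspected arithmetic. ASSUMPTION: pidstat subtracts two
# unsigned 64-bit samples of a byte counter, then divides by 1024 (bytes -> kB)
# and by the sampling interval. Not verified against the RHEL 6 sysstat source.
INTERVAL_S = 5   # interval used in 'pidstat -d -p ALL 5 60000'
U64 = 2 ** 64    # an unsigned 64-bit C integer wraps modulo 2**64


def wrapped_rate_kb(prev_bytes, curr_bytes, interval_s=INTERVAL_S):
    """Hypothetical helper: kB/s as unsigned 64-bit subtraction would give it."""
    delta = (curr_bytes - prev_bytes) % U64  # a negative delta wraps to just under 2**64
    return delta / 1024.0 / interval_s


if __name__ == "__main__":
    # Normal case: 15,699,968 bytes written in 5 s.
    print("%.2f" % wrapped_rate_kb(0, 15699968))  # prints 3066.40
    # The counter drops by one 4 KiB page between samples.
    print("%.2f" % wrapped_rate_kb(4096, 0))      # prints 3602879701896396.00
```

Under this assumption, a drop of a single 4 KiB page between two samples reproduces the first row's 3602879701896396.00 exactly, and the other huge rows would correspond to drops of roughly 14 to 17 MB.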

Then I tried /proc/PID/io, but for the processes I am interested in, the write_bytes and cancelled_write_bytes fields have nearly identical values (I added commas to some of the numbers to make them easier to read):

rchar: 72758590191001
wchar: 928,196,066,002
syscr: 26979658101
syscw: 361381018
read_bytes: 175023194112
write_bytes: 933,716,340,736
cancelled_write_bytes: 934,937,964,544

Does this mean that the process writes to small files and immediately deletes them, so that next to nothing actually gets written to disk?
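If pidstat's rates cannot be trusted, the same counters can be sampled directly. The Python sketch below reads /proc/<pid>/io twice, 5 seconds apart, and prints per-second deltas for read_bytes, write_bytes and cancelled_write_bytes, plus write_bytes minus cancelled_write_bytes. Because Python integers are signed, a counter that goes down shows up as a negative rate instead of wrapping. It assumes a kernel with per-task I/O accounting and permission to read the target's io file; the function names and the 5-second default are illustrative choices, not part of any existing tool.

```python
# Minimal sketch: sample /proc/<pid>/io twice and print per-second deltas.
# ASSUMPTIONS: Linux with per-task I/O accounting, and permission to read the
# target's io file (usually same user or root). Function names are hypothetical.
import sys
import time


def parse_proc_io(text):
    """Turn the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Also accepts hand-added thousands separators like "928,196,066,002".
            fields[key.strip()] = int(value.strip().replace(",", ""))
    return fields


def read_proc_io(pid):
    with open("/proc/%d/io" % pid) as f:
        return parse_proc_io(f.read())


def io_rates(before, after, interval_s):
    """Bytes per second. Python ints are signed, so a falling counter shows as
    a negative rate instead of wrapping around."""
    rates = {}
    for key in ("read_bytes", "write_bytes", "cancelled_write_bytes"):
        rates[key] = (after[key] - before[key]) / float(interval_s)
    # Can be negative: a task may cancel writeout charged to another task (proc(5)).
    rates["net_write_bytes"] = rates["write_bytes"] - rates["cancelled_write_bytes"]
    return rates


def main(argv, interval_s=5):
    if len(argv) != 2:
        print("usage: %s PID" % argv[0])
        return
    pid = int(argv[1])
    before = read_proc_io(pid)
    time.sleep(interval_s)
    after = read_proc_io(pid)
    for key, value in sorted(io_rates(before, after, interval_s).items()):
        print("%-22s %16.1f B/s" % (key, value))


if __name__ == "__main__":
    main(sys.argv)
```

Run it as `python proc_io_rates.py <PID>` (the script name is hypothetical).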

Responses

Hi Ipg,

Somehow I did not see your post last year :)

Yes, I believe you are correct. If a process writes some blocks to a file and then deletes the file before the dirty pages have been written back to disk, those bytes are counted in cancelled_write_bytes. They are still counted in write_bytes as well.

You can check this in the proc(5) man page ("man proc"):

cancelled_write_bytes:
     The big inaccuracy here is truncate. If a process writes 1MB to a file and then deletes the file, it will in fact
     perform no writeout. But it will have been accounted as having caused 1MB of write. In other words: this field
     represents the number of bytes which this process caused to not happen, by truncating pagecache. A task
     can cause "negative" I/O too. If this task truncates some dirty pagecache, some I/O which another task has
     been accounted for (in its write_bytes) will not be happening.
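Applying this to the totals in your question: write_bytes minus cancelled_write_bytes is slightly negative, meaning the process cancelled a little more writeout than it was charged for. By the man page's description above, one way this can happen is that the process truncated dirty page cache that had been charged to another task's write_bytes. A quick check of the arithmetic, using the figures from your post:

```python
# Totals copied from the /proc/PID/io output in the question (commas removed).
write_bytes = 933716340736
cancelled_write_bytes = 934937964544

# Bytes charged as written minus bytes whose writeout was cancelled by truncation.
net = write_bytes - cancelled_write_bytes
print("net writeout: %d bytes (%.2f GiB)" % (net, net / 2.0 ** 30))
# prints: net writeout: -1221623808 bytes (-1.14 GiB)
```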

Regards,

Dusan Baljevic (amateur radio VK2COT)