Introduction to storage performance analysis with PCP

This article is an introduction and tutorial on basic storage performance analysis tasks using Performance Co-Pilot (PCP).

INTRODUCTION

Performance Co-Pilot is a system-level performance monitoring and management suite, particularly suited for use in enterprise environments. PCP has shipped with, and been fully supported in, RHEL since RHEL 6.6, and is packaged in many other distributions (e.g. Fedora since F15, Debian, Ubuntu, etc.). It is a stable, actively maintained and developed open source project with a supportive community; see the PCP Home Page, the project page on GitHub and the git repo.
PCP Architecture

  • Metrics coverage - PCP captures just about everything exported by the Linux /proc and /sys pseudo filesystems, including per-process data. PCP is easily extensible and has around 70 plugins available, e.g. Linux platform, Oracle, mmv (for Java/JVM), libvirt, Docker containers and many others, with new agents being added in every release.
  • Namespace and metadata - PCP has a uniform hierarchical metrics namespace with metadata for all metrics - data type (integer/string/block/event), semantics (counter/instant/etc), scaling units (time/count/space dimensions), help text and so forth. Generic monitoring tools can use this metadata to do appropriate scaling, formatting and rate conversion of metric data, which mostly eliminates the need for custom monitoring applications; see, for example, pmrep(1) and pmchart(1), and the pminfo example following this list. For users familiar with legacy tools such as sysstat, top and so forth, PCP includes front-end tools with similar usage, see How does Performance Co-Pilot (PCP) compare with sysstat and the Side-by-side comparison of PCP tools with legacy tools.
  • Distributed Operation and Archive capture - PCP can export live data over secure IPC channels and also has logger services to capture archives for offsite or retrospective analysis. In essence, PCP has complete separation of performance data capture (live or archive) from its replay (live local, live remote, or archive replay). All PCP monitoring tools can do both live monitoring and archive replay, and share common command line options, see PCPIntro(1). This allows users to query live data (local or remote hosts), as well as traverse historical data for statistical reduction over whatever time period and sampling interval is required. PCP archives are also automatically included in sos-reports.
  • APIs - There are well documented, stable plugin and data-ingest libraries for developing new agents, and libraries for new monitoring tools. All APIs have C/C++, Python and Perl bindings; a Go binding is available too (search for PCP speed on GitHub). There is also JSON support and a web API for devops-style javascript applications, with support for Grafana, Vector and native web applications.
  • Tools - PCP boasts a large variety of tools, with many console tools compatible with legacy sysstat tools (e.g. iostat, pidstat, vmstat, mpstat, uptime, free, atop, numastat, etc), as well as many generic tools for data probing and scripting. There are also GUI tools for advanced interactive analysis and report generation.
  • Security - PCP can be configured for localhost only (unix domain only, with no inet domain sockets), or fully remote. When used for remote access, the PCP service daemons and libraries also support certificate-based authentication with SSL-encrypted IPC channels. A dedicated proxy daemon is also available for access through firewalls.
  • Documentation - PCP has extensive documentation, including two books and numerous white papers, HOWTOs, KCS solutions and articles (such as this one), etc. See references listed below.
  • Support - PCP is fully supported in RHEL. You can open a case to get assistance from Red Hat support, or you can engage the community directly.
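
As a quick illustration of the metadata mentioned above, pminfo(1) can display the descriptor and help text for any metric, e.g. (-d prints the data type, semantics and units, -t the one-line help text):

# pminfo -dt disk.all.total_bytes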

STORAGE ANALYSIS TUTORIAL

We install and set up PCP with the logging services enabled on a small server (4 cores/4 disks/8G RAM). We also set up a fake workload and then capture some performance logs over several days. We then demonstrate basic storage performance analysis tasks using various PCP monitoring tools in archive log replay mode.
If you just want to learn about PCP monitoring tools without setting up a server to grab perf stats, then the PCP archives used in this tutorial are available for download:

# wget http://people.redhat.com/~mgoodwin/pcp/archives/goody.tgz
# tar xf goody.tgz

After unpacking the archive tarball, you can skip directly to Setup - PCP monitoring / analysis environment below.

Setup - Server Environment

Install and configure PCP according to the standard KCS solution for this task - this is applicable to any RHEL server from RHEL 5 through RHEL 7, Fedora, Debian, etc. PCP is officially included from RHEL 6.6 onwards; prior to that, EPEL (and bintray.com) packages are available.
- How do I install Performance Co-Pilot (PCP) on my RHEL server to capture performance logs

Enable the standard pmcd and pmlogger services and tweak the default logging interval, reducing it from 30 seconds down to 5 seconds:
- How can I change the default logging interval used by Performance Co-Pilot (PCP)?
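
The KCS solution above has the details; in essence it amounts to passing -t 5s to the primary pmlogger and restarting the pmlogger service. As a sketch, the relevant line in /etc/pcp/pmlogger/control might end up looking like this (exact defaults vary between releases; on RHEL6 use 'service pmlogger restart' instead of systemctl):

LOCALHOSTNAME  y  n  PCP_LOG_DIR/pmlogger/LOCALHOSTNAME  -r -T24h10m -c config.default -t 5s

# systemctl restart pmlogger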

Enable logging of per-process metrics with a 30 second interval for later replay with pcp-atop(1) and other PCP monitoring tools. This step isn't always needed for I/O performance analysis, but sometimes it's useful to check which processes are active, doing I/O and using other system resources:
- How can I customize the Performance Co-Pilot logging configuration
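
One way to enable these is with pmlogconf(1), which walks interactively through the groups of metrics in the pmlogger configuration; a minimal sketch (the path shown is the RHEL7 default, prompts elided):

# pmlogconf /var/lib/pcp/config/pmlogger/config.default
  (answer 'y' when prompted for the per-process proc metric groups)
# systemctl restart pmlogger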

The resulting configuration captures roughly 0.5 GB/day and saves it in PCP archive logs below the standard PCP logging directory: /var/log/pcp. Recent versions of sos(1) include this directory (if present) in a standard sosreport. If an older version of sos(1) is in use, customers can manually create a gzipped tar file and upload it to the case:
- How do I gather performance data logs to upload to my support case using Performance Co-Pilot (PCP)
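
For example, to package the logs by hand (the directory name matches the logged host - goody in this tutorial):

# cd /var/log/pcp/pmlogger
# tar czf /tmp/goody-pcp-archives.tar.gz goody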

Setup - Fake Workload

Deploy a "fake" workload via a crontab entry to simulate backups running at 8pm every day for approximately two hours. This is just so we can demonstrate the monitoring tools by providing something to look for, i.e. a heavy read-only load on the storage, occurring daily. We use a fake workload because this KCS article is public, and hence we can't use any real customer data. A typical customer case might complain about "High I/O wait and poor server performance every evening, causing poor application performance in other timezones" .. sound familiar?

# cat /etc/cron.d/fakebackups 
# Fake workload - every day at 8pm, pretend to run backups of /home
0  20  *  *  *  root  dd if=/dev/mapper/vg-home bs=1M of=/dev/null >/dev/null 2>&1
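
The run time of this workload is simply the volume size divided by the storage's sequential read rate - blockdev(8) reports the size in bytes, and at the roughly 60 MB/s this server sustains (see the pmiostat output later), a volume of around 450 GB takes about two hours to read:

# blockdev --getsize64 /dev/mapper/vg-home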

Setup - PCP monitoring / analysis environment

Install the pcp-system-tools package (at least) on your workstation. Also install pcp-doc - you'll need the man pages too. These packages are NOT needed on the server - you only need the monitoring tools on the system you're intending to use for the analysis. Since PCP is portable and has both forward/backward version compatibility with complete separation of data capture from replay, the performance logs can be captured on the server of interest and then analysed later on any system with PCP monitoring tools installed. This applies to live monitoring and also to retrospective analysis from previously captured logs. Note that pcp-monitor is a meta-RPM package - it just has dependencies on the various RPM packages containing the PCP client tools, e.g. pcp-system-tools and others.

# yum -y install pcp-system-tools pcp-doc

Wait for a while - then examine the PCP archives

After the PCP pmlogger service has been running for a few hours, days or whatever - the resulting logs can be uploaded to the case or copied to your workstation:
- How do I gather performance data logs to upload to my support case using Performance Co-Pilot (PCP)
Note that PCP archives compress very well, so use tar with the 'z' option for compression. Once unpacked, the logs for our fake customer server will be in a subdirectory, one for each host being logged - in the simple case, only one host is logged. In our case, the host is called goody and the logs are in /var/log/pcp/pmlogger/goody. After two days of logging, that directory contains the following files:

# ls -hl /var/log/pcp/pmlogger/goody
total 1.4G
-rw-r--r-- 1 pcp pcp 472M Jul 19 12:48 20160718.12.38.0
-rw-r--r-- 1 pcp pcp  86K Jul 19 12:48 20160718.12.38.index
-rw-r--r-- 1 pcp pcp 3.2M Jul 19 12:47 20160718.12.38.meta
-rw-r--r-- 1 pcp pcp 474M Jul 20 13:04 20160719.12.55.0
-rw-r--r-- 1 pcp pcp  86K Jul 20 13:05 20160719.12.55.index
-rw-r--r-- 1 pcp pcp 9.2M Jul 20 13:01 20160719.12.55.meta
-rw-r--r-- 1 pcp pcp 418M Jul 21 10:18 20160720.13.25.0
-rw-r--r-- 1 pcp pcp  74K Jul 21 10:18 20160720.13.25.index
-rw-r--r-- 1 pcp pcp 4.2M Jul 21 10:15 20160720.13.25.meta
-rw-r--r-- 1 pcp pcp  216 Jul 20 13:25 Latest
-rw-r--r-- 1 pcp pcp  11K Jul 20 13:25 pmlogger.log
-rw-r--r-- 1 pcp pcp  11K Jul 20 13:05 pmlogger.log.prior

PCP logs are rolled over daily by the pmlogger_daily(1) cron script. Each archive basename is the date and time the archive was created, in the format YYYYMMDD.HH.MM, and an archive consists of at least three files: a temporal index (.index), metadata (.meta), and one or more data volumes (.0, .1, etc). In the above listing, we can see there are daily logs for July 18, 19 and 20. Note that when a PCP archive name is supplied as the argument to a monitoring tool, only the archive basename should be supplied.

We can examine the archive label to determine the time bounds using the pmdumplog -L command, e.g.

# cd /var/log/pcp/pmlogger/goody
# pmdumplog -L 20160718.12.38
Log Label (Log Format Version 2)
Performance metrics from host goody
  commencing Mon Jul 18 12:38:49.106 2016
  ending     Tue Jul 19 12:48:44.134 2016
Archive timezone: AEST-10
PID for pmlogger: 12976

Other files in the pmlogger/goody directory include:
- pmlogger.log - the log file for the pmlogger service; it details which metrics are being logged and at what logging interval. Different groups of metrics can be logged at different rates - at our fake customer site we're logging most data at 5 second and per-process data at 30 second intervals. The pmlogger.log file also provides an indication of the approximate logging data volume per day, which is useful when configuring the logging interval and for /var filesystem capacity planning.

Timezones and the -z flag (important)

Note in the pmdumplog -L listing above that the timezone of the server is listed in the log label header. When a monitoring tool replays a PCP archive, the default reporting timezone is the local timezone of the host the monitoring tool is being run on. This is because many PCP monitoring tools can replay multiple archives, which may have been captured on hosts in different timezones around the world. In practice, when replaying an archive from a customer site, we are usually more interested in the customer's timezone, which is often different from our own. To use the timezone of the server rather than your local timezone, use the -z flag with whichever monitoring tool you're using - that way your analysis will make the most sense when reported to the customer, and the timestamps will correlate correctly with syslog and other logs from the server. You can also use the -Z zone flag to specify an explicit reporting timezone, e.g. -Z EDT, -Z PST, -Z UTC, etc.
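
For example, replaying one of the daily archives with pmrep (these flags are common to all the PCP monitoring tools):

# pmrep -p -a 20160718.12.38 -s 1 kernel.all.load        # timestamps in your local timezone
# pmrep -z -p -a 20160718.12.38 -s 1 kernel.all.load     # timestamps in AEST-10, the archive's timezone
# pmrep -Z UTC -p -a 20160718.12.38 -s 1 kernel.all.load # timestamps in UTC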

Merging multiple archives into one with pmlogextract

Often you will want to use a monitoring tool to examine several days or even weeks of performance data - as a summary and overview before zooming in to smaller time periods of interest. This is easily achieved using the pmlogextract tool to merge the archives into one larger archive, which you can then pass to the monitoring tool with the -a flag:
- How can I merge multiple PCP archives into one for a system level performance analysis covering multiple days or weeks
For example:

# pwd
/var/log/pcp/pmlogger/goody
# pmlogextract *.0 ../myarchive
# pmdumplog -L ../myarchive
Log Label (Log Format Version 2)
Performance metrics from host goody
  commencing Mon Jul 18 12:38:49.106 2016
  ending     Thu Jul 21 08:13:12.404 2016
Archive timezone: AEST-10
PID for pmlogger: 22379

As you can see, myarchive spans several days; this will allow us to examine the data over several days before zooming in on times of interest - a very useful feature. The pmlogextract tool can also extract only certain metrics if needed, see the man page. Also note that recent versions of PCP (pcp-3.11.2 and later) allow the -a flag to be specified more than once, and the tool will transparently merge the archives "on the fly".
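
For example, with pcp-3.11.2 or later the daily archives can be replayed together without creating myarchive at all:

# pmrep -z -p -a 20160718.12.38 -a 20160719.12.55 -a 20160720.13.25 -t 10m -s 5 kernel.all.load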

Using pmchart for a performance overview spanning days

We can now use myarchive to report a performance summary with any PCP monitoring tool. For example, let's examine CPU usage over the three days to see if there's any excessive I/O wait time (idle time with pending I/O):

# pmchart -z -a myarchive -t 10m -O-0 -s 400 -v 400 -geometry 800x450 -c CPU -c Loadavg

So that's a 10 minute sampling interval (-t 10m) with 400 intervals (-s 400 -v 400), thus spanning (10 x 400 / 60) = 66 hours, or just under 3 days. We've used the timezone of the server (-z), an offset at the end of the archive (-O-0, that's -O negative zero, meaning zero seconds before the end of the archive), an 800x450 window size, and pre-loaded two pmchart "views" (-c CPU -c Loadavg):
[Figure: CPU and Loadavg summary over 3 days]
Clearly there are daily load average spikes at 8pm each evening along with elevated I/O wait time (aqua in the CPU chart at the top) - in this simplistic example, you could upload that image to the case and ask the customer what applications run at 8pm every evening ... "oh, yeah the backups! case dismissed" :)

Using pmrep for text-based output:

# pmrep -z -a myarchive -t 10m -p -S'@19:45' -s 20 kernel.all.load kernel.all.cpu.{user,sys,wait.total} disk.all.total_bytes
          k.a.load  k.a.load  k.a.load  k.a.c.user  k.a.c.sys  k.a.c.w.total  d.a.total_bytes
          1 minute  5 minute  15 minut                                                       
                                              util       util           util          Kbyte/s
19:45:00     0.010     0.020     0.050         N/A        N/A            N/A              N/A
19:55:00     0.010     0.020     0.050       0.012      0.004          0.009           12.507
20:05:00     1.100     0.690     0.330       0.016      0.055          0.292        44104.872
20:15:00     1.010     0.990     0.670       0.012      0.088          0.754        79864.410
20:25:00     1.050     1.060     0.860       0.012      0.074          0.908        66039.610
20:35:00     1.060     1.070     0.960       0.015      0.074          0.913        63837.190
20:45:00     1.310     1.160     1.020       0.012      0.070          0.792        62741.337
20:55:00     1.000     1.030     1.030       0.012      0.070          0.792        61975.538
21:05:00     1.000     1.060     1.060       0.016      0.071          0.882        60047.950
21:15:00     1.000     1.020     1.050       0.013      0.065          0.907        56705.562
21:25:00     1.000     1.020     1.050       0.012      0.060          0.907        52059.462
21:35:00     1.030     1.040     1.050       0.016      0.057          0.921        46970.013
21:45:00     1.020     1.110     1.090       0.012      0.047          0.960        41684.985
21:55:00     0.130     0.680     0.920       0.013      0.035          0.769        29428.372
22:05:00     0.000     0.130     0.510       0.016      0.007          0.013           25.687
22:15:00     0.000     0.020     0.270       0.012      0.004          0.010           13.060
22:25:00     0.000     0.010     0.150       0.014      0.006          0.012           44.713
22:35:00     0.000     0.010     0.080       0.017      0.008          0.010           17.532
22:45:00     0.030     0.030     0.050       0.012      0.004          0.010           13.690
22:55:00     0.000     0.010     0.050       0.012      0.005          0.009           14.037

In the above command, we've used pmrep to show timestamps (-p) with an update interval of 10 minutes (-t 10m), for 20 samples (-s 20), starting at 19:45 (-S@19:45), and we've asked for the load averages (kernel.all.load), the user, sys and wait CPU measurements (kernel.all.cpu.{user,sys,wait.total}) and the total disk throughput (disk.all.total_bytes). So that's reporting the first I/O spike in the pmchart graph, on Jul 18th at around 8pm, which lasted about two hours.

Using pmiostat for storage performance analysis

pmiostat is a CLI tool that can replay PCP archive logs. Its report is very similar to that of its namesake iostat(1) from sysstat, but more powerful in many ways.
Here is a similar example to the pmrep usage above - we're using pmiostat on the same archive, but we've chosen the second I/O spike, on July 19th, and only asked for a report on the sda disk (using -R sda$ as a regex pattern):

# pmiostat -z -a myarchive -x t -t 10m -S'@Jul 19 19:55:00' -s 10 -P0 -R 'sda$'
# Timestamp              Device       rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Tue Jul 19 20:05:00 2016 sda               0      0     0    2      0      5      2.7      0.0     7       8       7     1
Tue Jul 19 20:15:00 2016 sda               0      0     0    1      0      3      2.0      0.0     4      10       4     1
Tue Jul 19 20:25:00 2016 sda               0      0     0    1      0      3      2.0      0.0     4       0       4     1
Tue Jul 19 20:35:00 2016 sda               1      0   419    2  53299      4    126.8      1.6     4       4      26    81
Tue Jul 19 20:45:00 2016 sda               0      0    32    1   4108      3    121.5      0.1     4       4       6     8
Tue Jul 19 20:55:00 2016 sda               0      0     0    2      0      4      2.4      0.0     6      14       6     1
Tue Jul 19 21:05:00 2016 sda               0      0     0    2      0      5      2.5      0.0     5      11       5     1
Tue Jul 19 21:15:00 2016 sda               0      0     0    1      0      3      2.0      0.0     4       0       4     1
Tue Jul 19 21:25:00 2016 sda               0      0     0    2      0      4      2.4      0.0     6      15       6     1
Tue Jul 19 21:35:00 2016 sda               0      0     0    1      0      3      2.2      0.0     5       0       5     1

Check what device-mapper volumes we have:

# pminfo -a myarchive -f disk.dm.read

disk.dm.read
    inst [0 or "vg-swap"] value 638067
    inst [1 or "vg-root"] value 1699398
    inst [3 or "vg-libvirt"] value 283719
    inst [5 or "vg-home"] value 10175251
    inst [7 or "docker-253:1-4720837-pool"] value 2441254
    inst [273 or "docker-253:1-4720837-bd20b3551af42d2a0a6a3ed96cccea0fd59938b70e57a967937d642354e04859"] value 1218
    inst [274 or "docker-253:1-4720837-b4aca4676507814fcbd56f81f3b68175e0d675d67bbbe371d16d1e5b7d95594e"] value 1491

Ok, let's check the logical volume stats, excluding the docker volumes:

# pmiostat -z -a myarchive -x t -x dm -t 10m -S'@Jul 19 19:55:00' -s 5 -P0 -R'vg-*'
# Timestamp              Device       rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Tue Jul 19 20:05:00 2016 vg-home           0      0   347    0  44025      0    126.8      1.0     3       3       0    50
Tue Jul 19 20:05:00 2016 vg-libvirt        0      0     0    2      0      5      2.6      0.0     7       8       7     1
Tue Jul 19 20:05:00 2016 vg-root           0      0     0    2      5     12      9.7      0.1    70      33      74     2
Tue Jul 19 20:05:00 2016 vg-swap           0      0     0    0      0      0      4.1      0.0    25      25       0     0
Tue Jul 19 20:15:00 2016 vg-home           0      0   631    0  79965      0    126.7      1.9     3       3       0   100
Tue Jul 19 20:15:00 2016 vg-libvirt        0      0     0    1      0      3      2.0      0.0     4      10       4     1
Tue Jul 19 20:15:00 2016 vg-root           0      0     0    1      4     10     10.6      0.1    66      41      73     3
Tue Jul 19 20:15:00 2016 vg-swap           0      0     0    0      0      0      4.0      0.0    68      68       0     0
Tue Jul 19 20:25:00 2016 vg-home           0      0   522    0  65960      0    126.4      2.0     4       4       0   100
Tue Jul 19 20:25:00 2016 vg-libvirt        0      0     0    1      0      3      2.0      0.0     4       0       4     1
Tue Jul 19 20:25:00 2016 vg-root           0      0     1    1     28     12     14.7      0.2    56      30      77     5
Tue Jul 19 20:25:00 2016 vg-swap           0      0     0    0      0      0      4.0      0.0    59      59       0     0
Tue Jul 19 20:35:00 2016 vg-home           0      0   501    0  63784      0    127.3      2.0     4       4       0   100
Tue Jul 19 20:35:00 2016 vg-libvirt        0      0     0    2      0      4      2.5      0.0    28      37      28     3
Tue Jul 19 20:35:00 2016 vg-root           0      0     1    2     12     12     11.3      0.1    55      34      62     2
Tue Jul 19 20:35:00 2016 vg-swap           0      0     0    0      0      0      4.0      0.0    33      33       0     0
Tue Jul 19 20:45:00 2016 vg-home           0      0   485    0  61965      0    127.8      2.0     4       4       0   100
Tue Jul 19 20:45:00 2016 vg-libvirt        0      0     0    1      0      3      2.2      0.0     7      12       7     1
Tue Jul 19 20:45:00 2016 vg-root           0      0     0    1      4     10     11.6      0.0    15      17      15     1
Tue Jul 19 20:45:00 2016 vg-swap           0      0     0    0      0      0      4.0      0.0    13      13       0     0

Looks like the vg-home LVM logical volume has all the traffic:

# pmiostat -z -a myarchive -x t -x dm -t 10m -S'@Jul 19 19:45:00' -s 15 -P0 -R'vg-home'
# Timestamp              Device       rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Tue Jul 19 19:55:00 2016 vg-home           0      0     0    0      0      0      0.0      0.0     0       0       0     0
Tue Jul 19 20:05:00 2016 vg-home           0      0   347    0  44025      0    126.8      1.0     3       3       0    50
Tue Jul 19 20:15:00 2016 vg-home           0      0   631    0  79965      0    126.7      1.9     3       3       0   100
Tue Jul 19 20:25:00 2016 vg-home           0      0   522    0  65960      0    126.4      2.0     4       4       0   100
Tue Jul 19 20:35:00 2016 vg-home           0      0   501    0  63784      0    127.3      2.0     4       4       0   100
Tue Jul 19 20:45:00 2016 vg-home           0      0   485    0  61965      0    127.8      2.0     4       4       0   100
Tue Jul 19 20:55:00 2016 vg-home           0      0   479    0  61362      0    128.0      1.9     4       4       0   100
Tue Jul 19 21:05:00 2016 vg-home           0      0   467    0  59751      0    128.0      2.0     4       4       0   100
Tue Jul 19 21:15:00 2016 vg-home           0      0   440    0  56380      0    128.0      2.0     4       4       0   100
Tue Jul 19 21:25:00 2016 vg-home           0      0   405    0  51875      0    128.0      2.0     5       5       0   100
Tue Jul 19 21:35:00 2016 vg-home           0      0   367    0  46996      0    128.0      2.0     5       5       0   100
Tue Jul 19 21:45:00 2016 vg-home           0      0   326    0  41727      0    128.0      2.0     6       6       0   100
Tue Jul 19 21:55:00 2016 vg-home           0      0   242    0  31030      0    128.0      1.7     7       7       0    85
Tue Jul 19 22:05:00 2016 vg-home           0      0     0    0      0      0      0.0      0.0     0       0       0     0
Tue Jul 19 22:15:00 2016 vg-home           0      0     0    0      0      0      0.0      0.0     0       0       0     0

Checking the sosreport, the vg volume group uses the following physical volumes:

# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda1  vg   lvm2 a--  232.88g    0 
  /dev/sdb2  vg   lvm2 a--  230.88g    0 
  /dev/sdc   vg   lvm2 a--  232.88g    0 

So let's check those devices with pmiostat:

# pmiostat -z -a myarchive -x t -t 10m -S'@Jul 19 19:45:00' -s 10 -P0 -R'sd[abc]'
# Timestamp              Device       rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Tue Jul 19 19:55:00 2016 sda               0      0     0    1      0      3      2.0      0.0     5       0       5     1
Tue Jul 19 19:55:00 2016 sdb               0      0     0    1      1     10     10.8      0.0     8      12       7     1
Tue Jul 19 19:55:00 2016 sdc               0      0     0    0      0      0      0.0      0.0     0       0       0     0
Tue Jul 19 20:05:00 2016 sda               0      0     0    2      0      5      2.7      0.0     7       8       7     1
Tue Jul 19 20:05:00 2016 sdb               0      0   347    1  44030     12    126.4      1.1     3       3      78    50
Tue Jul 19 20:05:00 2016 sdc               0      0     0    0      0      0      0.0      0.0     0       0       0     0
Tue Jul 19 20:15:00 2016 sda               0      0     0    1      0      3      2.0      0.0     4      10       4     1
Tue Jul 19 20:15:00 2016 sdb               1      0   630    1  79970     10    126.7      2.0     3       3      66   100
Tue Jul 19 20:15:00 2016 sdc               0      0     0    0      0      0      0.0      0.0     0       0       0     0
Tue Jul 19 20:25:00 2016 sda               0      0     0    1      0      3      2.0      0.0     4       0       4     1
Tue Jul 19 20:25:00 2016 sdb               1      0   522    1  65988     12    126.2      2.1     4       4      77   100
Tue Jul 19 20:25:00 2016 sdc               0      0     0    0      0      0      0.0      0.0     0       0       0     0
Tue Jul 19 20:35:00 2016 sda               1      0   419    2  53299      4    126.8      1.6     4       4      26    81
Tue Jul 19 20:35:00 2016 sdb               0      0    82    1  10496     12    125.5      0.5     6       5      69    19
Tue Jul 19 20:35:00 2016 sdc               0      0     0    0      0      0      0.0      0.0     0       0       0     0
Tue Jul 19 20:45:00 2016 sda               0      0    32    1   4108      3    121.5      0.1     4       4       6     8
Tue Jul 19 20:45:00 2016 sdb               0      0     0    1      4     10     12.8      0.0    17      16      17     1
Tue Jul 19 20:45:00 2016 sdc               0      0   452    0  57858      0    127.9      1.8     4       4       0    93
Tue Jul 19 20:55:00 2016 sda               0      0     0    2      0      4      2.4      0.0     6      14       6     1
Tue Jul 19 20:55:00 2016 sdb               0      0     0    1      3     10     12.1      0.0    24      21      24     1
Tue Jul 19 20:55:00 2016 sdc               0      0   479    0  61362      0    128.0      1.9     4       4       0   100
Tue Jul 19 21:05:00 2016 sda               0      0     0    2      0      5      2.5      0.0     5      11       5     1
Tue Jul 19 21:05:00 2016 sdb               0      0     0    1      7     12     10.5      0.1    34      20      38     1
Tue Jul 19 21:05:00 2016 sdc               0      0   467    0  59751      0    128.0      2.0     4       4       0   100
Tue Jul 19 21:15:00 2016 sda               0      0     0    1      0      3      2.0      0.0     4       0       4     1
Tue Jul 19 21:15:00 2016 sdb               0      0     0    1      2     10     12.0      0.0    19      15      19     1
Tue Jul 19 21:15:00 2016 sdc               0      0   440    0  56380      0    128.0      2.0     4       4       0   100
Tue Jul 19 21:25:00 2016 sda               0      0     0    2      0      4      2.4      0.0     6      15       6     1
Tue Jul 19 21:25:00 2016 sdb               0      0     4    1    150     11     32.4      0.0     9       3      31     1
Tue Jul 19 21:25:00 2016 sdc               0      0   405    0  51875      0    128.0      2.0     5       5       0   100

Note that the logical volumes in the vg volume group use a linear (concatenated) layout across those physical volumes, so we see sdb, then sda and then sdc become busy in turn as our fake workload progresses through reading the vg-home logical volume. We can see this more clearly with pmchart:
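The original invocation isn't shown; as a sketch, a small custom view plotting per-disk read throughput (the disk.dev.read_bytes metric), zoomed in on the evening of July 19th, might look something like:

# cat > disk-reads.view <<'EOF'
#kmchart
version 1
chart title "Per-disk read throughput" style stacking
    plot legend "%i" metric disk.dev.read_bytes
EOF
# pmchart -z -a myarchive -t 5m -S'@Jul 19 19:30' -T'@Jul 19 22:30' -c disk-reads.view
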
[Figure: pmchart per-disk spike on July 19th]

Here's the pmchart IOSTAT view:
[Figure: pmchart IOSTAT view for the whole archive - traffic tab]

And here's the other tab showing the remaining iostat columns:
[Figure: pmchart IOSTAT view for the whole archive - second tab]

pmiostat - device selection and aggregation operators

New features in pmiostat (pcp-3.11.2 or later) are the -R regex and -G sum|avg|min|max command line options. The regex selects a subset of the available devices (either sd devices, or device-mapper logical devices if the -x dm flag has been given). Only the selected devices will be reported, e.g.

# pmiostat -z -a myarchive -x t -t 10m -S'@Jul 19 19:45:00' -s 4 -P0 -R'sd[ab]$'
# Timestamp              Device       rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Tue Jul 19 19:55:00 2016 sda               0      0     0    1      0      3      2.0      0.0     5       0       5     1
Tue Jul 19 19:55:00 2016 sdb               0      0     0    1      1     10     10.8      0.0     8      12       7     1
Tue Jul 19 20:05:00 2016 sda               0      0     0    2      0      5      2.7      0.0     7       8       7     1
Tue Jul 19 20:05:00 2016 sdb               0      0   347    1  44030     12    126.4      1.1     3       3      78    50
Tue Jul 19 20:15:00 2016 sda               0      0     0    1      0      3      2.0      0.0     4      10       4     1
Tue Jul 19 20:15:00 2016 sdb               1      0   630    1  79970     10    126.7      2.0     3       3      66   100
Tue Jul 19 20:25:00 2016 sda               0      0     0    1      0      3      2.0      0.0     4       0       4     1
Tue Jul 19 20:25:00 2016 sdb               1      0   522    1  65988     12    126.2      2.1     4       4      77   100

Note the trailing $ in the regex - otherwise we'll match sdaa, sdab, etc.
The -G flag allows aggregation of the statistics for the devices matching the regex supplied with the -R flag, or for all devices if -R isn't supplied. As a practical example, we can check the HBTL (host:bus:target:lun) SCSI IDs by querying the hinv.map.scsi metric, e.g.:

# pminfo -a myarchive -f hinv.map.scsi

hinv.map.scsi
    inst [0 or "scsi0:0:0:0 Direct-Access"] value "sda"
    inst [1 or "scsi1:0:0:0 Direct-Access"] value "sdb"
    inst [2 or "scsi2:0:0:0 Direct-Access"] value "sdc"
    inst [3 or "scsi3:0:0:0 Direct-Access"] value "sdd"

Now let's say we want to sum the statistics for sda and sdb (e.g. because they might be alternate paths to the same target, or on the same PCI bus, and we're interested in the combined traffic):

# pmiostat -z -a myarchive -x t -t 10m -S'@Jul 19 19:45:00' -s 4 -P0 -R'sd[ab]$' -Gsum
# Timestamp              Device       rrqm/s wrqm/s   r/s  w/s  rkB/s  wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Tue Jul 19 19:55:00 2016 sum(sd[ab]$)      0      1     0    2      1     13     12.8      0.0    12      12      12     1
Tue Jul 19 20:05:00 2016 sum(sd[ab]$)      0      1   347    3  44030     17    129.1      1.1    10      11      85    51
Tue Jul 19 20:15:00 2016 sum(sd[ab]$)      1      0   630    2  79970     13    128.7      2.0     7      13      70   101
Tue Jul 19 20:25:00 2016 sum(sd[ab]$)      1      1   522    3  65988     15    128.2      2.1     8       4      80   101

Note that the device name column is now reported as the supplied regex pattern and that the %util column is always averaged, despite using the -Gsum flag.

Non-interpolated mode

By default, all PCP monitoring tools use an interpolation mechanism to allow the reporting interval to differ from the underlying native sampling interval. For details on how this works, see PCPIntro(1). Sometimes we do not want interpolation, e.g. when correlating blktrace(8) data with pmiostat data we need to match timestamps exactly. To disable interpolation, use the -u flag with pmrep or pmiostat.

Note that the -u flag is incompatible with the -t flag - when not using interpolation the sampling interval cannot be changed, it's always the same as the underlying sampling interval that the data was originally recorded with.
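
For example (note the absence of -t - samples are reported at the native 5 second logging interval):

# pmiostat -u -z -a myarchive -x t -S'@Jul 19 20:00' -s 5 -P0 -R 'sda$'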

Tools for reporting per-process I/O

The pcp-atop(1) command can be used to examine per-process I/O statistics, along with a wealth of other system-level performance data. Like all PCP tools, pcp-atop can replay archives, provided the appropriate pmlogconf settings have been enabled - see above. For per-process I/O analysis, run pcp -a ARCHIVE atop with the -d flag, which shows disk-related process info. You probably also want to specify a fairly long sampling interval, because atop replays large archives fairly slowly, especially when per-process data is present.

# pcp atop --help
Usage: pcp-atop [-flags] [interval [samples]]
        or
Usage: pcp-atop -w  file  [-S] [-a] [interval [samples]]
       pcp-atop -r  file  [-b hh:mm] [-e hh:mm] [-flags]

    generic flags:
      -a  show or log all processes (i.s.o. active processes only)
      -R  calculate proportional set size (PSS) per process
      -P  generate parseable output for specified label(s)
      -L  alternate line length (default 80) in case of non-screen output
      -f  show fixed number of lines with system statistics
      -F  suppress sorting of system resources
      -G  suppress exited processes in output
      -l  show limited number of lines for certain resources
      -y  show individual threads
      -1  show average-per-second i.s.o. total values

      -x  no colors in case of high occupation
      -g  show general process-info (default)
      -m  show memory-related process-info
      -d  show disk-related process-info
      -n  show network-related process-info
      -s  show scheduling-related process-info
      -v  show various process-info (ppid, user/group, date/time)
      -c  show command line per process
      -o  show own defined process-info
      -u  show cumulated process-info per user
      -p  show cumulated process-info per program (i.e. same name)

      -C  sort processes in order of cpu-consumption (default)
      -M  sort processes in order of memory-consumption
      -D  sort processes in order of disk-activity
      -N  sort processes in order of network-activity
      -A  sort processes in order of most active resource (auto mode)

    specific flags for raw logfiles:
      -w  write raw data to PCP archive folio
      -r  read  raw data from PCP archive folio
      -S  finish pcp-atop automatically before midnight (i.s.o. #samples)
      -b  begin showing data from specified time
      -e  finish showing data after specified time

    interval: number of seconds   (minimum 0)
    samples:  number of intervals (minimum 1)
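
For example, to replay the tutorial archive showing disk-related process info from 8pm onward (a sketch based on the usage above):

# pcp -a myarchive atop -d -b 20:00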

Finding blocked processes

One common task in I/O performance analysis involves blocked tasks, i.e. processes in 'D' state (uninterruptible sleep). It's easy to check this with pmchart by plotting the proc.runq.runnable and proc.runq.blocked process counts, e.g.

# pmchart -z -a myarchive -t 10m -O-0 -s 400 -v 400 -geometry 800x450 -c CPU

[Figure: pmchart running and blocked processes]
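
The same counters can also be reported as text with pmrep, e.g.:

# pmrep -z -p -a myarchive -t 10m -S'@19:45' -s 10 proc.runq.runnable proc.runq.blocked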

To monitor specific processes, you can also use the pmrep and pcp-pidstat tools, and others. The pcp-pidstat tool is currently being developed with additional features to detect and monitor blocked processes.

References

Knowledge Base Articles and Solutions

White Papers, Guide Books and Case Studies and Presentations
