How to use the collectl utility to troubleshoot performance issues in Red Hat Enterprise Linux
Collectl is neither shipped nor supported by Red Hat, but it is sometimes used by users and third-party vendors.
Note! While a Red Hat engineer is now a maintainer of the upstream collectl project on GitHub, Red Hat still does not deliver it as part of RHEL, nor does it provide support for collectl on RHEL.
This guide exists because some users do use collectl and it is present in the Fedora EPEL community project. Collectl is considered Third Party Software as defined in As a customer how does Red Hat support me when I use third party components?.
Installing collectl does not render a system unsupportable by Red Hat Global Support Services; however, Red Hat Global Support Services will be unable to support or debug problems with collectl or resulting from installing and using collectl as it is not shipped in standard Red Hat Enterprise Linux channels. Installing third-party packages is done with the user's understanding of Red Hat's limitations in supporting issues with or resulting from the third-party packages.
How to obtain Collectl
- What is collectl?
- How do I use collectl?
- How do I install collectl?
The collectl community project is maintained at https://github.com/sharkcz/collectl as well as provided in the Fedora community project.
For RHEL 6 and RHEL 7, the easiest way to install collectl is via the EPEL repositories (Extra Packages for Enterprise Linux) maintained by the Fedora community.
Note: The main community project was previously located at http://collectl.sourceforge.net/ -- while that site is still present, it lists the latest version as '4.3.1 Oct 31, 2018' at the top of the page, whereas https://sourceforge.net/projects/collectl/files/collectl/ lists the current version available as of June 2023 as '4.3.8 Feb 07, 2023'. Per a comment on sourceforge.net, the main place for updates has moved to the GitHub repository above.
Follow these instructions to set up the EPEL repositories. Once set up, collectl can be installed with the following command:
# yum install collectl
The packages are also available for direct download using the following links:
- RHEL 8: collectl currently needs to be downloaded directly from Sourceforge until it is added to RHEL 8 EPEL:
# wget https://sourceforge.net/projects/collectl/files/collectl/collectl-4.3.2/collectl-4.3.2.src.tar.gz/download -O /tmp/collectl-4.3.2.src.tar.gz
# tar -hxvf /tmp/collectl-4.3.2.src.tar.gz
# cd collectl
# ./INSTALL
# cd ../
# systemctl start collectl ## start data collection service on host
# systemctl enable collectl ## optional: enable collectl to be started at boot time
# ls -ltr /var/log/collectl/* ## where output from collectl is kept
Note! RHEL 9 and later need to ensure the perl-English package is installed. Much newer versions are available at https://github.com/sharkcz/collectl -- it is best to clone that repository and run the INSTALL script so you get the latest fixes.
- RHEL 7 x86_64 https://archives.fedoraproject.org/pub/archive/epel/7/x86_64/
- RHEL 6 x86_64 https://archives.fedoraproject.org/pub/archive/epel/6/x86_64/
- RHEL 5 x86_64 (available in the EPEL archives) https://archive.fedoraproject.org/pub/archive/epel/5/x86_64/
Note!
collectl is now available in a git repo.
A simple git clone of the following repository will get you what you need.
While the repository is publicly accessible, code changes and commits are maintained and updated only by Red Hat engineers.
https://github.com/sharkcz/collectl.git
After cloning
# cd collectl
# ./INSTALL
For RHEL 7 and later:
# systemctl start collectl
# systemctl enable collectl
General usage of collectl
Enable Collectl
The collectl utility can be run manually via the command line or as a service. Data will be logged to /var/log/collectl/*.raw.gz. The logs are rotated every 24 hours by default. To run as a service:
# chkconfig collectl on ## Optional step, enabled in runlevel 3, to start at boot time
# service collectl start
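Daemon log files are named HOSTNAME-YYYYMMDD-HHMMSS.raw.gz, which encodes the capture start time. A minimal shell sketch of pulling those fields apart (the filename is a made-up example, and it assumes the hostname itself contains no dash):

```shell
# Hypothetical daemon log filename from /var/log/collectl:
f="dbhost01-20130416-164506.raw.gz"
host=${f%%-*}            # everything before the first dash (assumes no dash in hostname)
stamp=${f#*-}            # drop the hostname...
stamp=${stamp%.raw.gz}   # ...and the extension, leaving the date-time stamp
echo "host=$host start=$stamp"
```

This is handy when matching a reported incident time to the correct raw file before playback.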
Sample Intervals
When run manually from the command line, the default sample interval is 1 second.
When running as a service, the default sample intervals are as shown below. It is sometimes desirable to lower these to avoid averaging, for example to 1,30,60.
# grep -i interval /etc/collectl.conf
#Interval = 10
#Interval2 = 60
#Interval3 = 120
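The commented defaults can be rewritten non-interactively with sed. A sketch against a scratch copy of the file (on a real system you would edit /etc/collectl.conf itself and then restart the collectl service):

```shell
# Scratch copy with the default, commented-out interval settings:
cat > /tmp/collectl.conf <<'EOF'
#Interval = 10
#Interval2 = 60
#Interval3 = 120
EOF
# Uncomment and lower the sample intervals to 1s / 30s / 60s:
sed -i -e 's/^#Interval =.*/Interval = 1/' \
       -e 's/^#Interval2 =.*/Interval2 = 30/' \
       -e 's/^#Interval3 =.*/Interval3 = 60/' /tmp/collectl.conf
grep -i interval /tmp/collectl.conf
```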
Log file
When run automatically as a daemon, the output log file location is specified by the DaemonCommands setting in /etc/collectl.conf. This can be changed to a new location if desired: edit /etc/collectl.conf and change the file path after the -f option to the new location.
grep -i daemoncommands /etc/collectl.conf | grep -v "^#"
DaemonCommands = -f /var/log/collectl -r00:00,7 -m -F60 -s+YZ -i1
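A sketch of redirecting the log directory with sed, again against a scratch copy (the /data/collectl path is a made-up example; on a real system create the directory and restart the collectl service afterwards):

```shell
# Scratch copy of the default DaemonCommands line:
cat > /tmp/collectl-daemon.conf <<'EOF'
DaemonCommands = -f /var/log/collectl -r00:00,7 -m -F60 -s+YZ -i1
EOF
# Point -f at a hypothetical alternate log directory:
sed -i 's|-f /var/log/collectl|-f /data/collectl|' /tmp/collectl-daemon.conf
grep DaemonCommands /tmp/collectl-daemon.conf
```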
Using collectl to troubleshoot disk or SAN storage performance
Even for storage performance analysis, the defaults are best left as is: 10s for all data types except process data, which is collected at 60s intervals.
The SAR Equivalence Matrix shows common SAR command equivalents to help experienced SAR users learn to use Collectl.
The following example command will view summary detail of the CPU, Network and Disk from the file /var/log/collectl/HOSTNAME-20130416-164506.raw.gz:
collectl -scnd -oT -p HOSTNAME-20130416-164506.raw.gz
# <----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#Time cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
16:46:10 9 2 14470 20749 0 0 69 9 0 1 0 2
16:46:20 13 4 14820 22569 0 0 312 25 253 174 7 79
16:46:30 10 3 15175 21546 0 0 54 5 0 2 0 3
16:46:40 9 2 14741 21410 0 0 57 9 1 2 0 4
16:46:50 10 2 14782 23766 0 0 374 8 250 171 5 75
....
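Because playback output is plain text, it can be post-processed with standard tools. A sketch that averages the KBWrit column from the sample above (in practice you would pipe `collectl -scnd -oT -p <file>` through `grep -v '^#'` first to drop the header lines):

```shell
# Five data lines copied from the playback output above; column 8 is KBWrit.
collectl_out='16:46:10 9 2 14470 20749 0 0 69 9 0 1 0 2
16:46:20 13 4 14820 22569 0 0 312 25 253 174 7 79
16:46:30 10 3 15175 21546 0 0 54 5 0 2 0 3
16:46:40 9 2 14741 21410 0 0 57 9 1 2 0 4
16:46:50 10 2 14782 23766 0 0 374 8 250 171 5 75'
# Average write rate over the window:
echo "$collectl_out" | awk '{sum += $8} END {printf "avg KBWrit/s: %.1f\n", sum/NR}'
# → avg KBWrit/s: 173.2
```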
The next example will output the one-minute period from 17:00 through 17:01.
collectl -scnd -oT --from 17:00 --thru 17:01 -p HOSTNAME-20130416-164506.raw.gz
# <----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#Time cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
17:00:00 13 3 15870 25320 0 0 67 9 251 172 6 90
17:00:10 16 4 16386 24539 0 0 315 17 246 170 6 84
17:00:20 10 2 14959 22465 0 0 65 26 5 6 1 8
17:00:30 11 3 15056 24852 0 0 323 12 250 170 5 69
17:00:40 18 5 16595 23826 0 0 463 13 1 5 0 5
17:00:50 12 3 15457 23663 0 0 57 9 250 170 6 76
17:01:00 13 4 15479 24488 0 0 304 7 254 176 5 70
The next example will output detailed disk data.
collectl -scnD -oT -p HOSTNAME-20130416-164506.raw.gz
### RECORD 7 >>> tabserver <<< (1366318860.001) (Thu Apr 18 17:01:00 2013) ###
# CPU[HYPER] SUMMARY (INTR, CTXSW & PROC /sec)
# User Nice Sys Wait IRQ Soft Steal Idle CPUs Intr Ctxsw Proc RunQ Run Avg1 Avg5 Avg15 RunT BlkT
8 0 3 0 0 0 0 86 8 15K 24K 0 638 5 1.07 1.05 0.99 0 0
# DISK STATISTICS (/sec)
# <---------reads---------><---------writes---------><--------averages--------> Pct
#Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
sda 0 0 0 0 304 11 7 44 44 2 16 6 4
sdb 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-0 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-1 0 0 0 0 5 0 1 4 4 1 2 2 0
dm-2 0 0 0 0 298 0 14 22 22 1 4 3 4
dm-3 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-4 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-5 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-6 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-7 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-8 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-9 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-10 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-11 0 0 0 0 0 0 0 0 0 0 0 0 0
# NETWORK SUMMARY (/sec)
# KBIn PktIn SizeIn MultI CmpI ErrsI KBOut PktOut SizeO CmpO ErrsO
253 175 1481 0 0 0 5 70 79 0 0
....
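The per-disk detail lends itself to the same treatment. A sketch that picks out the device with the highest Util (the 14th column) from a few sample rows of the output above:

```shell
# Three rows copied from the disk detail above; column 14 is Pct Util.
disk_stats='sda 0 0 0 0 304 11 7 44 44 2 16 6 4
dm-1 0 0 0 0 5 0 1 4 4 1 2 2 0
dm-2 0 0 0 0 298 0 14 22 22 1 4 3 4'
# Keep the first device seen at the highest utilization:
echo "$disk_stats" | awk '$14 > max {max = $14; dev = $1} END {print dev, max"%"}'
# → sda 4%
```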
Commonly used options
These generate summary data, which is the total of ALL data for a particular type
- b
- buddy info (memory fragmentation)
- c
- cpu
- d
- disk
- f
- nfs
- i
- inodes
- j
- interrupts by CPU
- l
- lustre
- m
- memory
- n
- network
- s
- sockets
- t
- tcp
- x
- interconnect
- y
- Slabs (system object caches)
These generate detail data, typically but not limited to the device level
- C
- individual CPUs, including interrupts if sj or sJ
- D
- individual Disks
- E
- environmental (fan, power, temp) [requires ipmitool]
- F
- nfs data
- J
- interrupts by CPU by interrupt number
- L
- lustre
- M
- memory numa/node
- N
- individual Networks
- T
- tcp details (lots of data!)
- X
- interconnect ports/rails (Infiniband/Quadrics)
- Y
- slabs/slubs
- Z
- processes
The most useful switches are listed here
- -sD
detailed disk data
- -sC
detailed CPU data
- -sN
detailed network data
- -sZ
detailed process data