Red Hat Training

A Red Hat training course is available for Red Hat Enterprise Linux

4.3. Profiling

The following sections showcase scripts that profile kernel activity by monitoring function calls.

4.3.1. Counting Function Calls Made

This section describes how to identify how many times the system called a specific kernel function in a 30-second sample. Depending on your use of wildcards, you can also use this script to target multiple kernel functions.
functioncallcount.stp

#! /usr/bin/env stap
# The following line command will probe all the functions
# in kernel's memory management code:
#
# stap  functioncallcount.stp "*@mm/*.c"

probe kernel.function(@1).call {  # probe functions listed on commandline
  called[probefunc()] <<< 1  # add a count efficiently
}

global called

probe end {
  foreach (fn in called-)  # Sort by call count (in decreasing order)
  #       (fn+ in called)  # Sort by function name
    printf("%s %d\n", fn, @count(called[fn]))
  exit()
}

functioncallcount.stp takes the targeted kernel function as an argument. The argument supports wildcards, which enables you to target multiple kernel functions up to a certain extent.
You can increase the sample time by editing the timer in the second probe (timer.ms()). The output of functioncallcount.stp contains the name of the function called and how many times it was called during the sample time (in alphabetical order). Example 4.10, “functioncallcount.stp Sample Output” contains an excerpt from the output of stap countcalls.stp "*@mm/*.c":

Example 4.10. functioncallcount.stp Sample Output

[...]
__vma_link 97
__vma_link_file 66
__vma_link_list 97
__vma_link_rb 97
__xchg 103
add_page_to_active_list 102
add_page_to_inactive_list 19
add_to_page_cache 19
add_to_page_cache_lru 7
all_vm_events 6
alloc_pages_node 4630
alloc_slabmgmt 67
anon_vma_alloc 62
anon_vma_free 62
anon_vma_lock 66
anon_vma_prepare 98
anon_vma_unlink 97
anon_vma_unlock 66
arch_get_unmapped_area_topdown 94
arch_get_unmapped_exec_area 3
arch_unmap_area_topdown 97
atomic_add 2
atomic_add_negative 97
atomic_dec_and_test 5153
atomic_inc 470
atomic_inc_and_test 1
[...]

4.3.2. Call Graph Tracing

This section describes how to trace incoming and outgoing function calls.
para-callgraph-simple.stp

function trace(entry_p) {
  if(tid() in trace)
    printf("%s%s%s\n",thread_indent(entry_p),
           (entry_p>0?"->":"<-"),
           probefunc())
}

global trace
probe kernel.function(@1).call {
  if (execname() == "stapio") next # skip our own helper process
  trace[tid()] = 1
  trace(1)
}
probe kernel.function(@1).return {
  trace(-1)
  delete trace[tid()]
}

probe kernel.function(@2).call { trace(1) }
probe kernel.function(@2).return { trace(-1) }
function trace(entry_p) {
  if(tid() in trace)
    printf("%s%s%s\n",thread_indent(entry_p),
           (entry_p>0?"->":"<-"),
           probefunc())
}

global trace
probe kernel.function(@1).call {
  if (execname() == "stapio") next # skip our own helper process
  trace[tid()] = 1
  trace(1)
}
probe kernel.function(@1).return {
  trace(-1)
  delete trace[tid()]
}

probe kernel.function(@2).call { trace(1) }
probe kernel.function(@2).return { trace(-1) }

para-callgraph-simple.stp takes two command-line arguments:
  • A trigger function (@1), which enables or disables tracing on a per-thread basis. Tracing in each thread will continue as long as the trigger function has not exited yet.
  • The kernel function/s whose entry/exit call you'd like to trace (@2).
para-callgraph-simple.stp uses thread_indent(); as such, its output contains the timestamp, process name, and thread ID of @2 (i.e. the probe function you are tracing). For more information about thread_indent(), refer to its entry in SystemTap Functions.
The following example contains an excerpt from the output for stap para-callgraph.stp sys_read '*@fs/*.c':

Example 4.11. para-callgraph-simple.stp Sample Output

[...]
     0 klogd(1391):->sys_read
    14 klogd(1391): ->fget_light
    22 klogd(1391): <-fget_light
    27 klogd(1391): ->vfs_read
    35 klogd(1391):  ->rw_verify_area
    43 klogd(1391):  <-rw_verify_area
    49 klogd(1391):  ->kmsg_read
     0 sendmail(1696):->sys_read
    17 sendmail(1696): ->fget_light
    26 sendmail(1696): <-fget_light
    34 sendmail(1696): ->vfs_read
    44 sendmail(1696):  ->rw_verify_area
    52 sendmail(1696):  <-rw_verify_area
    58 sendmail(1696):  ->proc_file_read
    70 sendmail(1696):   ->loadavg_read_proc
    84 sendmail(1696):    ->proc_calc_metrics
    92 sendmail(1696):    <-proc_calc_metrics
    95 sendmail(1696):   <-loadavg_read_proc
   101 sendmail(1696):  <-proc_file_read
   106 sendmail(1696):  ->dnotify_parent
   115 sendmail(1696):  <-dnotify_parent
   119 sendmail(1696):  ->inotify_dentry_parent_queue_event
   127 sendmail(1696):  <-inotify_dentry_parent_queue_event
   133 sendmail(1696):  ->inotify_inode_queue_event
   141 sendmail(1696):  <-inotify_inode_queue_event
   146 sendmail(1696): <-vfs_read
   151 sendmail(1696):<-sys_read

4.3.3. Determining Time Spent in Kernel and User Space

This section illustrates how to determine the amount of time any given thread is spending in either kernel or user-space.
thread-times.stp

#! /usr/bin/stap

probe timer.profile {
  tid=tid()
  if (!user_mode())
    kticks[tid] <<< 1
  else
    uticks[tid] <<< 1
  ticks <<< 1
  tids[tid] <<< 1
}

global uticks, kticks, ticks

global tids

probe timer.s(5), end {
  allticks = @count(ticks)
  printf ("%5s %7s %7s (of %d ticks)\n",
          "tid", "%user", "%kernel", allticks)
  foreach (tid in tids- limit 20) {
    uscaled = @count(uticks[tid])*10000/allticks
    kscaled = @count(kticks[tid])*10000/allticks
    printf ("%5d %3d.%02d%% %3d.%02d%%\n",
      tid, uscaled/100, uscaled%100, kscaled/100, kscaled%100)
  }
  printf("\n")

  delete uticks
  delete kticks
  delete ticks
  delete tids
}

thread-times.stp lists the top 20 processes currently taking up CPU time within a 5-second sample, along with the total number of CPU ticks made during the sample. The output of this script also notes the percentage of CPU time each process used, as well as whether that time was spent in kernel space or user space.
Example 4.12, “thread-times.stp Sample Output” contains a 5-second sample of the output for thread-times.stp:

Example 4.12. thread-times.stp Sample Output

  tid   %user %kernel (of 20002 ticks)
    0   0.00%  87.88%
32169   5.24%   0.03%
 9815   3.33%   0.36%
 9859   0.95%   0.00%
 3611   0.56%   0.12%
 9861   0.62%   0.01%
11106   0.37%   0.02%
32167   0.08%   0.08%
 3897   0.01%   0.08%
 3800   0.03%   0.00%
 2886   0.02%   0.00%
 3243   0.00%   0.01%
 3862   0.01%   0.00%
 3782   0.00%   0.00%
21767   0.00%   0.00%
 2522   0.00%   0.00%
 3883   0.00%   0.00%
 3775   0.00%   0.00%
 3943   0.00%   0.00%
 3873   0.00%   0.00%

4.3.4. Monitoring Polling Applications

This section how to identify and monitor which applications are polling. Doing so allows you to track unnecessary or excessive polling, which can help you pinpoint areas for improvement in terms of CPU usage and power savings.
timeout.stp

#! /usr/bin/env stap
# Copyright (C) 2009 Red Hat, Inc.
# Written by Ulrich Drepper <drepper@redhat.com>
# Modified by William Cohen <wcohen@redhat.com>

global process, timeout_count, to
global poll_timeout, epoll_timeout, select_timeout, itimer_timeout
global nanosleep_timeout, futex_timeout, signal_timeout

probe syscall.poll, syscall.epoll_wait {
  if (timeout) to[pid()]=timeout
}

probe syscall.poll.return {
  p = pid()
  if ($return == 0 && to[p] > 0 ) {
    poll_timeout[p]++
    timeout_count[p]++
    process[p] = execname()
    delete to[p]
  }
}

probe syscall.epoll_wait.return {
  p = pid()
  if ($return == 0 && to[p] > 0 ) {
    epoll_timeout[p]++
    timeout_count[p]++
    process[p] = execname()
    delete to[p]
  }
}

probe syscall.select.return {
  if ($return == 0) {
    p = pid()
    select_timeout[p]++
    timeout_count[p]++
    process[p] = execname()
  }
}

probe syscall.futex.return {
  if (errno_str($return) == "ETIMEDOUT") {
    p = pid()
    futex_timeout[p]++
    timeout_count[p]++
    process[p] = execname()
  }
}

probe syscall.nanosleep.return {
  if ($return == 0) {
    p = pid()
    nanosleep_timeout[p]++
    timeout_count[p]++
    process[p] = execname()
  }
}

probe kernel.function("it_real_fn") {
  p = pid()
  itimer_timeout[p]++
  timeout_count[p]++
  process[p] = execname()
}

probe syscall.rt_sigtimedwait.return {
  if (errno_str($return) == "EAGAIN") {
    p = pid()
    signal_timeout[p]++
    timeout_count[p]++
    process[p] = execname()
  }
}

probe syscall.exit {
  p = pid()
  if (p in process) {
    delete process[p]
    delete timeout_count[p]
    delete poll_timeout[p]
    delete epoll_timeout[p]
    delete select_timeout[p]
    delete itimer_timeout[p]
    delete futex_timeout[p]
    delete nanosleep_timeout[p]
    delete signal_timeout[p]
  }
}

probe timer.s(1) {
  ansi_clear_screen()
  printf ("  pid |   poll  select   epoll  itimer   futex nanosle  signal| process\n")
  foreach (p in timeout_count- limit 20) {
     printf ("%5d |%7d %7d %7d %7d %7d %7d %7d| %-.38s\n", p,
              poll_timeout[p], select_timeout[p],
              epoll_timeout[p], itimer_timeout[p],
              futex_timeout[p], nanosleep_timeout[p],
              signal_timeout[p], process[p])
  }
}

timeout.stp tracks how many times each application used the following system calls over time:
  • poll
  • select
  • epoll
  • itimer
  • futex
  • nanosleep
  • signal
In some applications, these system calls are used excessively. As such, they are normally identified as "likely culprits" for polling applications. Note, however, that an application may be using a different system call to poll excessively; sometimes, it is useful to find out the top system calls used by the system (refer to Section 4.3.5, “Tracking Most Frequently Used System Calls” for instructions). Doing so can help you identify any additional suspects, which you can add to timeout.stp for tracking.

Example 4.13. timeout.stp Sample Output

  uid |   poll  select   epoll  itimer   futex nanosle  signal| process
28937 | 148793       0       0    4727   37288       0       0| firefox
22945 |      0   56949       0       1       0       0       0| scim-bridge
    0 |      0       0       0   36414       0       0       0| swapper
 4275 |  23140       0       0       1       0       0       0| mixer_applet2
 4191 |      0   14405       0       0       0       0       0| scim-launcher
22941 |   7908       1       0      62       0       0       0| gnome-terminal
 4261 |      0       0       0       2       0    7622       0| escd
 3695 |      0       0       0       0       0    7622       0| gdm-binary
 3483 |      0    7206       0       0       0       0       0| dhcdbd
 4189 |   6916       0       0       2       0       0       0| scim-panel-gtk
 1863 |   5767       0       0       0       0       0       0| iscsid
 2562 |      0    2881       0       1       0    1438       0| pcscd
 4257 |   4255       0       0       1       0       0       0| gnome-power-man
 4278 |   3876       0       0      60       0       0       0| multiload-apple
 4083 |      0    1331       0    1728       0       0       0| Xorg
 3921 |   1603       0       0       0       0       0       0| gam_server
 4248 |   1591       0       0       0       0       0       0| nm-applet
 3165 |      0    1441       0       0       0       0       0| xterm
29548 |      0    1440       0       0       0       0       0| httpd
 1862 |      0       0       0       0       0    1438       0| iscsid
You can increase the sample time by editing the timer in the second probe (timer.s()). The output of functioncallcount.stp contains the name and UID of the top 20 polling applications, along with how many times each application performed each polling system call (over time). Example 4.13, “timeout.stp Sample Output” contains an excerpt of the script:

4.3.5. Tracking Most Frequently Used System Calls

timeout.stp from Section 4.3.4, “Monitoring Polling Applications” helps you identify which applications are polling by pointing out which ones used the following system calls most frequently:
  • poll
  • select
  • epoll
  • itimer
  • futex
  • nanosleep
  • signal
However, in some systems, a different system call might be responsible for excessive polling. If you suspect that a polling application might is using a different system call to poll, you need to identify first the top system calls used by the system. To do this, use topsys.stp.
topsys.stp

#! /usr/bin/env stap
#
# This script continuously lists the top 20 systemcalls in the interval 
# 5 seconds
#

global syscalls_count

probe syscall.* {
  syscalls_count[name]++
}

function print_systop () {
  printf ("%25s %10s\n", "SYSCALL", "COUNT")
  foreach (syscall in syscalls_count- limit 20) {
    printf("%25s %10d\n", syscall, syscalls_count[syscall])
  }
  delete syscalls_count
}

probe timer.s(5) {
  print_systop ()
  printf("--------------------------------------------------------------\n")
}

topsys.stp lists the top 20 system calls used by the system per 5-second interval. It also lists how many times each system call was used during that period. Refer to Example 4.14, “topsys.stp Sample Output” for a sample output.

Example 4.14. topsys.stp Sample Output

--------------------------------------------------------------
                  SYSCALL      COUNT
             gettimeofday       1857
                     read       1821
                    ioctl       1568
                     poll       1033
                    close        638
                     open        503
                   select        455
                    write        391
                   writev        335
                    futex        303
                  recvmsg        251
                   socket        137
            clock_gettime        124
           rt_sigprocmask        121
                   sendto        120
                setitimer        106
                     stat         90
                     time         81
                sigreturn         72
                    fstat         66
--------------------------------------------------------------

4.3.6. Tracking System Call Volume Per Process

This section illustrates how to determine which processes are performing the highest volume of system calls. In previous sections, we've described how to monitor the top system calls used by the system over time (Section 4.3.5, “Tracking Most Frequently Used System Calls”). We've also described how to identify which applications use a specific set of "polling suspect" system calls the most (Section 4.3.4, “Monitoring Polling Applications”). Monitoring the volume of system calls made by each process provides more data in investigating your system for polling processes and other resource hogs.
syscalls_by_proc.stp

#! /usr/bin/env stap

# Copyright (C) 2006 IBM Corp.
#
# This file is part of systemtap, and is free software.  You can
# redistribute it and/or modify it under the terms of the GNU General
# Public License (GPL); either version 2, or (at your option) any
# later version.

#
# Print the system call count by process name in descending order.
#

global syscalls

probe begin {
  print ("Collecting data... Type Ctrl-C to exit and display results\n")
}

probe syscall.* {
  syscalls[execname()]++
}

probe end {
  printf ("%-10s %-s\n", "#SysCalls", "Process Name")
  foreach (proc in syscalls-)
    printf("%-10d %-s\n", syscalls[proc], proc)
}

syscalls_by_proc.stp lists the top 20 processes performing the highest number of system calls. It also lists how many system calls each process performed during the time period. Refer to Example 4.15, “topsys.stp Sample Output” for a sample output.

Example 4.15. topsys.stp Sample Output

Collecting data... Type Ctrl-C to exit and display results
#SysCalls  Process Name
1577       multiload-apple
692        synergyc
408        pcscd
376        mixer_applet2
299        gnome-terminal
293        Xorg
206        scim-panel-gtk
95         gnome-power-man
90         artsd
85         dhcdbd
84         scim-bridge
78         gnome-screensav
66         scim-launcher
[...]
If you prefer the output to display the process IDs instead of the process names, use the following script instead.
syscalls_by_pid.stp

#! /usr/bin/env stap

# Copyright (C) 2006 IBM Corp.
#
# This file is part of systemtap, and is free software.  You can
# redistribute it and/or modify it under the terms of the GNU General
# Public License (GPL); either version 2, or (at your option) any
# later version.

#
# Print the system call count by process ID in descending order.
#

global syscalls

probe begin {
  print ("Collecting data... Type Ctrl-C to exit and display results\n")
}

probe syscall.* {
  syscalls[pid()]++
}

probe end {
  printf ("%-10s %-s\n", "#SysCalls", "PID")
  foreach (pid in syscalls-)
    printf("%-10d %-d\n", syscalls[pid], pid)
}

As indicated in the output, you need to manually exit the script in order to display the results. You can add a timed expiration to either script by simply adding a timer.s() probe; for example, to instruct the script to expire after 5 seconds, add the following probe to the script:
probe timer.s(5)
{
	exit()
}