Chapter 15. Timestamping
15.1. Hardware clocks
Multiprocessor systems such as NUMA or SMP have multiple instances of clock sources. How these clocks interact with each other, and how they react to system events such as CPU frequency scaling or entering power-saving modes, determines whether they are suitable clock sources for the realtime kernel.
At boot time, the kernel discovers the available clock sources and selects one to use. The preferred clock source is the Time Stamp Counter (TSC). If it is not available, the High Precision Event Timer (HPET) is the second-best option. However, not all systems have HPET clocks, and some HPET clocks can be unreliable.
In the absence of TSC and HPET, other options include the ACPI Power Management Timer (ACPI_PM), the Programmable Interval Timer (PIT), and the Real Time Clock (RTC). The last two are either costly to read or have a low resolution (time granularity), which makes them sub-optimal for the realtime kernel.
To list the clock sources available on your system, view the /sys/devices/system/clocksource/clocksource0/available_clocksource file:
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
In the sample output above, the TSC, HPET and ACPI_PM clock sources are available.
To inspect the clock source currently in use, read the /sys/devices/system/clocksource/clocksource0/current_clocksource file:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
You can select a different clock source from the list in the /sys/devices/system/clocksource/clocksource0/available_clocksource file by writing its name into the /sys/devices/system/clocksource/clocksource0/current_clocksource file. For example, the following command sets HPET as the clock source in use:
# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
The kernel selects the best available clock source. Overriding the selected clock source is not recommended unless the implications are well understood.
While TSC is generally the preferred clock source, some of its hardware implementations may have shortcomings. For example, some TSC clocks can stop when the system goes to an idle state, or become out of sync when their CPUs enter deeper C-states (energy saving states) or perform speed- or frequency-scaling operations.
However, you can work around some of these TSC shortcomings by configuring additional kernel boot parameters. For instance, the idle=poll parameter prevents the CPUs from entering the idle state, and the processor.max_cstate=1 parameter prevents them from entering deeper C-states. Note, however, that both parameters increase energy consumption, because the processors are kept out of their power-saving states.
For a comprehensive list of clock sources see the Timing Measurements chapter in Understanding The Linux Kernel by Daniel P. Bovet and Marco Cesati.
15.1.1. Reading hardware clock sources
Reading from the TSC means reading a register from the processor. Reading from the HPET clock means reading a memory area. Reading from the TSC is faster, which provides a significant performance advantage when timestamping hundreds of thousands of messages per second.
A simple program that reads the current clock source 10,000,000 times in a row shows how long it takes to read each of the available clock sources:
Example 15.1. Comparing the cost of reading hardware clock sources
In this example, the clock source currently in use is TSC. The time command is used to view the duration required to read the clock source 10 million times:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
# time ./clock_timing
real 0m0.601s
user 0m0.592s
sys 0m0.002s
The clock source is changed to HPET to compare the duration required to generate 10 million timestamps:
# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
# time ./clock_timing
real 0m12.263s
user 0m12.197s
sys 0m0.001s
The steps are repeated with the ACPI_PM clock source:
# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
# time ./clock_timing
real 0m24.461s
user 0m0.504s
sys 0m23.776s
The time(1) manual page provides detailed information on how to use the command and interpret its output. The example above uses the following categories:
- real: The total elapsed time from program invocation until the process ends. real includes the user and sys times and is usually larger than their sum. If the process is interrupted by an application with higher priority, or by a system event such as a hardware interrupt (IRQ), the time spent waiting is also counted under real.
- user: The time the process spent in user space, performing tasks that did not require kernel intervention.
- sys: The time spent by the kernel while performing tasks required by the user process. These tasks include opening files, reading and writing to files or I/O ports, memory allocation, thread creation, and network-related activities.
As the results of Example 15.1, “Comparing the cost of reading hardware clock sources” show, the efficiency of generating timestamps, in descending order, is: TSC (0.601 s real time), HPET (12.263 s), and ACPI_PM (24.461 s). This is because of the increased overhead of accessing time values from the HPET and ACPI_PM timers. With ACPI_PM, almost all of the time (23.776 s) is spent in sys, that is, in the kernel.
15.2. POSIX clocks
POSIX is a standard that, among other interfaces, defines how applications represent and read time sources. In contrast to the hardware clock, which is selected by the kernel and used across the whole system, a POSIX clock can be selected by each application without affecting other applications in the system.
- CLOCK_REALTIME: Represents the time in the real world, also referred to as 'wall time', meaning the time as read from a clock on the wall. This clock is used to timestamp events and when interfacing with the user. It can be modified by a user with the right privileges. However, modify it with caution: if the clock value changes between two readings, the difference between those readings is erroneous.
- CLOCK_MONOTONIC: Represents the time that has increased monotonically since system boot. This clock cannot be set by any process, and it is the preferred clock for calculating the time difference between events. The following examples in this section use CLOCK_MONOTONIC as the POSIX clock.
For more information on POSIX clocks, see the following manual page and book:
- clock_gettime(2) manual page
- Linux System Programming by Robert Love
The function used to read a given POSIX clock is clock_gettime(), which is declared in <time.h>. clock_gettime() takes two parameters: the POSIX clock ID and a pointer to a timespec structure, which is filled with the current time of that clock. The following example calls this function in a loop to measure the cost of reading the clock:
Example 15.2. Using clock_gettime() to measure the cost of reading POSIX clocks
#include <time.h>

int main(void)
{
    int rc;
    long i;
    struct timespec ts;

    /* Read CLOCK_MONOTONIC 10 million times in a row */
    for (i = 0; i < 10000000; i++) {
        rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    }
    return 0;
}
You can improve the example above by checking the value returned in the rc variable, which is -1 if clock_gettime() fails, or by verifying that the content of the ts structure can be trusted. The clock_gettime() manual page provides more information to help you write more reliable applications.
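With glibc versions earlier than 2.17, programs that use the clock_gettime() function must be linked with the rt library by adding '-lrt' to the gcc command line. Later glibc versions provide clock_gettime() in the C library itself, so '-lrt' is no longer required, although it remains harmless: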
$ gcc clock_timing.c -o clock_timing -lrt
15.2.1. CLOCK_MONOTONIC_COARSE and CLOCK_REALTIME_COARSE
Functions such as clock_gettime() and gettimeofday() have a counterpart in the kernel, in the form of a system call. When a user process calls clock_gettime(), the corresponding C library (glibc) routine calls the sys_clock_gettime() system call, which performs the requested operation and then returns the result to the user process.
However, this context switch from user application to kernel has a cost. Even though this cost is very low, if the operation is repeated thousands of times, the accumulated cost can have an impact on the overall performance of the application.
To avoid the context switch to the kernel, and thus make reading the clock faster, support for the CLOCK_MONOTONIC_COARSE and CLOCK_REALTIME_COARSE POSIX clocks was created in the form of a vDSO (virtual dynamic shared object) library function. The _COARSE variants are faster to read and have a precision (also known as resolution) of about one millisecond (ms).
15.2.2. Using clock_getres() to compare clock resolution
The clock_getres() function returns the resolution of a given POSIX clock. It takes the same two parameters as clock_gettime(): the ID of the POSIX clock, and a pointer to the timespec structure where the result is returned. The following program compares the precision of CLOCK_MONOTONIC and CLOCK_MONOTONIC_COARSE:
#include <stdio.h>
#include <time.h>

int main(void)
{
    int rc;
    struct timespec res;

    rc = clock_getres(CLOCK_MONOTONIC, &res);
    if (!rc)
        printf("CLOCK_MONOTONIC: %ldns\n", res.tv_nsec);
    rc = clock_getres(CLOCK_MONOTONIC_COARSE, &res);
    if (!rc)
        printf("CLOCK_MONOTONIC_COARSE: %ldns\n", res.tv_nsec);
    return 0;
}
Example 15.3. Sample output of clock_getres
TSC:
# ./clock_resolution
CLOCK_MONOTONIC: 1ns
CLOCK_MONOTONIC_COARSE: 999848ns (about 1ms)

HPET:
# ./clock_resolution
CLOCK_MONOTONIC: 1ns
CLOCK_MONOTONIC_COARSE: 999848ns (about 1ms)

ACPI_PM:
# ./clock_resolution
CLOCK_MONOTONIC: 1ns
CLOCK_MONOTONIC_COARSE: 999848ns (about 1ms)
15.2.3. Using C code to compare clock resolution
The following code snippet shows the format of the data read from the CLOCK_MONOTONIC POSIX clock. All nine digits in the tv_nsec field of the timespec structure are meaningful, because the clock has nanosecond resolution. The example program, named clock_test.c, is as follows:
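#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    int i;
    struct timespec ts;

    for (i = 0; i < 5; i++) {
        clock_gettime(CLOCK_MONOTONIC, &ts);
        /* %09ld keeps leading zeros so that all nine nanosecond digits print */
        printf("%ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
        usleep(200);
    }
    return 0;
}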
Example 15.4. Sample output of clock_test.c and clock_test_coarse.c
As specified in the code above, the function reads the clock five times, with 200 microseconds between each reading:
# gcc clock_test.c -o clock_test -lrt
# ./clock_test
218449.986980853
218449.987330908
218449.987590716
218449.987849549
218449.988108248
If you copy the same source code to clock_test_coarse.c and replace CLOCK_MONOTONIC with CLOCK_MONOTONIC_COARSE, the result looks similar to the following:
# ./clock_test_coarse
218550.844862154
218550.844862154
218550.844862154
218550.845862154
218550.845862154
The _COARSE clocks have one-millisecond precision, so only the first three digits of the tv_nsec field of the timespec structure are significant. The result above could be read as:
# ./clock_test_coarse
218550.844
218550.844
218550.844
218550.845
218550.845
The _COARSE variants of the POSIX clocks are particularly useful when timestamps with millisecond precision are sufficient. The benefits are most evident on systems whose hardware clocks are costly to read, such as ACPI_PM.
15.2.4. Using the time command to compare the cost of reading clocks
By using the time command to read the clock source 10 million times in a row, you can compare the costs of reading the CLOCK_MONOTONIC and CLOCK_MONOTONIC_COARSE representations of the available hardware clocks. The following example uses the TSC, HPET, and ACPI_PM hardware clocks. For more information on how to interpret the output of the time command, see Section 15.1.1, “Reading hardware clock sources”.
Example 15.5. Comparing the cost of reading POSIX clocks
TSC:
# time ./clock_timing_monotonic
real 0m0.567s
user 0m0.559s
sys 0m0.002s
# time ./clock_timing_monotonic_coarse
real 0m0.120s
user 0m0.118s
sys 0m0.001s

HPET:
# time ./clock_timing_monotonic
real 0m12.257s
user 0m12.179s
sys 0m0.002s
# time ./clock_timing_monotonic_coarse
real 0m0.119s
user 0m0.118s
sys 0m0.000s

ACPI_PM:
# time ./clock_timing_monotonic
real 0m25.524s
user 0m0.451s
sys 0m24.932s
# time ./clock_timing_monotonic_coarse
real 0m0.119s
user 0m0.117s
sys 0m0.001s
As Example 15.5, “Comparing the cost of reading POSIX clocks” shows, the total time is greatly reduced when the _COARSE clocks are used: from 0.567 s to 0.120 s with TSC, from 12.257 s to 0.119 s with HPET, and from 25.524 s to 0.119 s with ACPI_PM. The reduction is most evident in the ACPI_PM timings, where the sys time (the time spent by the kernel to perform tasks required by the user process) drops from 24.932 s to 0.001 s. This indicates that the _COARSE variants of POSIX clocks yield high performance gains on clocks with high reading costs.