Red Hat Training

A Red Hat training course is available for RHEL 8

Chapter 15. Timestamping

15.1. Hardware clocks

Multiprocessor systems such as NUMA or SMP have multiple instances of clock sources. The way clocks interact among themselves and the way they react to system events, such as CPU frequency scaling or entering energy economy modes, determine whether they are suitable clock sources for the realtime kernel.

During boot time the kernel discovers the available clock sources and selects one to use. The preferred clock source is the Time Stamp Counter (TSC), but if it is not available the High Precision Event Timer (HPET) is the second best option. However, not all systems have HPET clocks and some HPET clocks can be unreliable.

In the absence of TSC and HPET, other options include the ACPI Power Management Timer (ACPI_PM), the Programmable Interval Timer (PIT) and the Real Time Clock (RTC). The last two options are either costly to read or have a low resolution (time granularity), therefore they are sub-optimal for the realtime kernel.

For the list of the available clock sources in your system, view the /sys/devices/system/clocksource/clocksource0/available_clocksource file:

# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm

In the sample output above, the TSC, HPET and ACPI_PM clock sources are available.

The clock source currently in use can be inspected by reading the /sys/devices/system/clocksource/clocksource0/current_clocksource file:

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

It is possible to select a different clock source, from the list presented in the /sys/devices/system/clocksource/clocksource0/available_clocksource file. To do so, write the name of the clock source into the /sys/devices/system/clocksource/clocksource0/current_clocksource file. For example, the following command sets HPET as the clock source in use:

# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
Important

The kernel selects the best available clock source. Overriding the selected clock source is not recommended unless the implications are well understood.

While TSC is generally the preferred clock source, some of its hardware implementations may have shortcomings. For example, some TSC clocks can stop when the system goes to an idle state, or become out of sync when their CPUs enter deeper C-states (energy saving states) or perform speed- or frequency-scaling operations.

However, you can work around some of these TSC shortcomings by configuring additional kernel boot parameters. For instance, the idle=poll parameter forces the clock to avoid entering the idle state, and the processor.max_cstate=1 parameter prevents the clock from entering deeper C-states. Note however that in both cases there would be an increase on energy consumption, as the system would always run at top speed.

Note

For a comprehensive list of clock sources see the Timing Measurements chapter in Understanding The Linux Kernel by Daniel P. Bovet and Marco Cesati.

15.1.1. Reading hardware clock sources

Reading from the TSC means reading a register from the processor. Reading from the HPET clock means reading a memory area. Reading from the TSC is faster, which provides a significant performance advantage when timestamping hundreds of thousands of messages per second.

Using a simple program that reads the current clock source 10,000,000 times in a row, it is possible to observe the duration required to read the clock sources available:

Example 15.1. Comparing the cost of reading hardware clock sources

In this example, the clock source currently in use is TSC. The time command is used to view the duration required to read the clock source 10 million times:

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
# time ./clock_timing

	real	0m0.601s
	user	0m0.592s
	sys	0m0.002s

The clock source is changed to HPET to compare the duration required to generate 10 million timestamps:

# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
# time ./clock_timing

	real	0m12.263s
	user	0m12.197s
	sys	0m0.001s

The steps are repeated with the ACPI_PM clock source:

# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
# time ./clock_timing

	real	0m24.461s
	user	0m0.504s
	sys	0m23.776s

The time(1) manual page provides detailed information on how to use the command and interpret its output. The example above uses the following categories:

  • real: The total time spent beginning from program invocation until the process ends. real includes user and sys times, and will usually be larger than the sum of the latter two. If this process is interrupted by an application with higher priority, or by a system event such as a hardware interrupt (IRQ), this time spent waiting is also computed under real.
  • user: The time the process spent in user space, performing tasks that did not require kernel intervention.
  • sys: The time spent by the kernel while performing tasks required by the user process. These tasks include opening files, reading and writing to files or I/O ports, memory allocation, thread creation and network related activities.

As seen from the results of Example 15.1, “Comparing the cost of reading hardware clock sources”, the efficiency of generating timestamps, in descending order, is: TSC, HPET, ACPI_PM. This is because of the increased overhead to access time values from the HPET and ACPI_PM timers.

15.2. POSIX clocks

POSIX is a standard for implementing and representing time sources. In contrast to the hardware clock, which is selected by the kernel and implemented across the system; the POSIX clock can be selected by each application, without affecting other applications in the system.

  • CLOCK_REALTIME: it represents the time in the real world, also referred to as 'wall time' meaning the time as read from the clock on the wall. This clock is used to timestamp events, and when interfacing with the user. It can be modified by an user with the right privileges. However, user modification should be used with caution as it can lead to erroneous data if the clock has its value changed between two readings.
  • CLOCK_MONOTONIC: represents the time monotonically increased since the system boot. This clock cannot be set by any process, and is the preferred clock for calculating the time difference between events. The following examples in this section use CLOCK_MONOTONIC as the POSIX clock.
Note

For more information on POSIX clocks see the following manual page and book:

  • clock_gettime()(2)
  • Linux System Programming by Robert Love

The function used to read a given POSIX clock is clock_gettime(), which is defined at <time.h>. The clock_gettime() command takes two parameters: the POSIX clock ID and a timespec structure which will be filled with the duration used to read the clock. The following example shows the function to measure the cost of reading the clock:

Example 15.2. Using clock_gettime() to measure the cost of reading POSIX clocks

#include <time.h>

main()
{
	int rc;
	long i;
	struct timespec ts;

	for(i=0; i<10000000; i++) {
		rc = clock_gettime(CLOCK_MONOTONIC, &ts);
	}
}

You can improve upon the example above by adding more code to verify the return code of clock_gettime(), to verify the value of the rc variable, or to ensure the content of the ts structure is to be trusted. The clock_gettime() manpage provides more information to help you write more reliable applications.

Important

Programs using the clock_gettime() function must be linked with the rt library by adding '-lrt' to the gcc command line:

$ gcc clock_timing.c -o clock_timing -lrt

15.2.1. CLOCK_MONOTONIC_COARSE and CLOCK_REALTIME_COARSE

Functions such as clock_gettime() and gettimeofday() have a counterpart in the kernel, in the form of a system call. When a user process calls clock_gettime(), the corresponding C library (glibc) routine calls the sys_clock_gettime() system call, which performs the requested operation and then returns the result to the user process.

However, this context switch from user application to kernel has a cost. Even though this cost is very low, if the operation is repeated thousands of times, the accumulated cost can have an impact on the overall performance of the application.

To avoid the context switch to the kernel, thus making it faster to read the clock, support for the CLOCK_MONOTONIC_COARSE and CLOCK_REALTIME_COARSE POSIX clocks was created in the form of a VDSO library function. The _COARSE variants are faster to read and have a precision (also known as resolution) of one millisecond (ms).

15.2.2. Using clock_getres() to compare clock resolution

Using the clock_getres() function you can check the resolution of a given POSIX clock. clock_getres() uses the same two parameters as clock_gettime(): the ID of the POSIX clock to be used, and a pointer to the timespec structure where the result is returned. The following function enables you to compare the precision between CLOCK_MONOTONIC and CLOCK_MONOTONIC_COARSE:

main()
{
int rc;
struct timespec res;

	rc = clock_getres(CLOCK_MONOTONIC, &res);
	if (!rc)
		printf("CLOCK_MONOTONIC: %ldns\n", res.tv_nsec);
	rc = clock_getres(CLOCK_MONOTONIC_COARSE, &res);
	if (!rc)
		printf("CLOCK_MONOTONIC_COARSE: %ldns\n", res.tv_nsec);
}

Example 15.3. Sample output of clock_getres

TSC:
	# ./clock_resolution
	CLOCK_MONOTONIC: 1ns
	CLOCK_MONOTONIC_COARSE: 999848ns  (about 1ms)

HPET:
	# ./clock_resolution
	CLOCK_MONOTONIC: 1ns
	CLOCK_MONOTONIC_COARSE: 999848ns  (about 1ms)

ACPI_PM:
	# ./clock_resolution
	CLOCK_MONOTONIC: 1ns
	CLOCK_MONOTONIC_COARSE: 999848ns  (about 1ms)

15.2.3. Using C code to compare clock resolution

Using the following code snippet it is possible to observe the format of the data read from the CLOCK_MONOTONIC POSIX clock. All nine digits in the tv_nsec field of the timespec structure are meaningful as the clock has a nanosecond resolution. The example function, named clock_test.c, is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

main()
{
	int i;
	struct timespec ts;

	for(i=0; i<5; i++) {
		clock_gettime(CLOCK_MONOTONIC, &ts);
		printf("%ld.%ld\n", ts.tv_sec, ts.tv_nsec);
		usleep(200);
	}
}

Example 15.4. Sample output of clock_test.c and clock_test_coarse.c

As specified in the code above, the function reads the clock five times, with 200 microseconds between each reading:

# gcc clock_test.c -o clock_test -lrt
# ./clock_test
218449.986980853
218449.987330908
218449.987590716
218449.987849549
218449.988108248

Using the same source code, renaming it to clock_test_coarse.c and replacing CLOCK_MONOTONIC with CLOCK_MONOTONIC_COARSE, the result would look something like:

# ./clock_test_coarse
218550.844862154
218550.844862154
218550.844862154
218550.845862154
218550.845862154

The _COARSE clocks have a one millisecond precision, therefore only the first three digits of the tv_nsec field of the timespec structure are significant. The result above could be read as:

# ./clock_test_coarse
218550.844
218550.844
218550.844
218550.845
218550.845

The _COARSE variants of the POSIX clocks are particularly useful in cases where timestamping can be performed with millisecond precision. The benefits are more evident on systems which use hardware clocks with high costs for the reading operations, such as ACPI_PM.

15.2.4. Using the time command to compare cost of reading clocks

Using the time command to read the clock source 10 million times in a row, you can compare the costs of reading CLOCK_MONOTONIC and CLOCK_MONOTONIC_COARSE representations of the hardware clocks available. The following example uses TSC, HPET and ACPI_PM hardware clocks. For more information on how to decipher the output of the time command see Section 15.1.1, “Reading hardware clock sources”.

Example 15.5. Comparing the cost of reading POSIX clocks

TSC:
	# time ./clock_timing_monotonic

	real	0m0.567s
	user	0m0.559s
	sys	0m0.002s

	# time ./clock_timing_monotonic_coarse

	real	0m0.120s
	user	0m0.118s
	sys	0m0.001s

HPET:
	# time ./clock_timing_monotonic

	real	0m12.257s
	user	0m12.179s
	sys	0m0.002s

	# time ./clock_timing_monotonic_coarse

	real	0m0.119s
	user	0m0.118s
	sys	0m0.000s

ACPI_PM:
	# time ./clock_timing_monotonic

	real	0m25.524s
	user	0m0.451s
	sys	0m24.932s

	# time ./clock_timing_monotonic_coarse

	real	0m0.119s
	user	0m0.117s
	sys	0m0.001s

As seen from Example 15.5, “Comparing the cost of reading POSIX clocks”, the sys time (the time spent by the kernel to perform tasks required by the user process) is greatly reduced when the _COARSE clocks are used. This is particularly evident in the ACPI_PM clock timings, which indicates that _COARSE variants of POSIX clocks yield high performance gains on clocks with high reading costs.