Chapter 15. Timestamping
15.1. Hardware Clocks
Multiprocessor systems such as NUMA or SMP have multiple instances of clock sources. The way clocks interact among themselves and the way they react to system events, such as CPU frequency scaling or entering energy economy modes, determine whether they are suitable clock sources for the realtime kernel.
During boot time the kernel discovers the available clock sources and selects one to use. The preferred clock source is the Time Stamp Counter (TSC), but if it is not available the High Precision Event Timer (HPET) is the second best option. However, not all systems have HPET clocks and some HPET clocks can be unreliable.
In the absence of TSC and HPET, other options include the ACPI Power Management Timer (ACPI_PM), the Programmable Interval Timer (PIT) and the Real Time Clock (RTC). The last two options are either costly to read or have a low resolution (time granularity), and are therefore sub-optimal for the realtime kernel.
For the list of the available clock sources in your system, view the /sys/devices/system/clocksource/clocksource0/available_clocksource file:
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
In the sample output above, the TSC, HPET and ACPI_PM clock sources are available.
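The same list can be read from a program. The following is a minimal Python sketch, assuming the sysfs path shown above. The function names and the PREFERENCE list are illustrative and are not part of any kernel or library interface; the list only mirrors the order of preference described in this section.

```python
from pathlib import Path

AVAILABLE = "/sys/devices/system/clocksource/clocksource0/available_clocksource"

# Preference order as described in this section (best first). This mirrors
# the prose only; the kernel uses its own internal rating of each source.
PREFERENCE = ["tsc", "hpet", "acpi_pm"]


def available_clocksources(path=AVAILABLE):
    """Return the clock source names listed in the sysfs file."""
    # The file holds a space-separated list, for example "tsc hpet acpi_pm".
    return Path(path).read_text().split()


def preferred_clocksource(available, preference=PREFERENCE):
    """Return the first name from `preference` present in `available`, or None."""
    for name in preference:
        if name in available:
            return name
    return None
```

On a system with the sample output above, preferred_clocksource(available_clocksources()) would return "tsc".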
The clock source currently in use can be inspected by reading the /sys/devices/system/clocksource/clocksource0/current_clocksource file.
It is possible to select a different clock source from the list presented in the /sys/devices/system/clocksource/clocksource0/available_clocksource file. To do so, write the name of the clock source into the /sys/devices/system/clocksource/clocksource0/current_clocksource file. For example, the following command sets HPET as the clock source in use:
echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
The kernel selects the best available clock source. Overriding the selected clock source is not recommended unless the implications are well understood.
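If the override is made from a script, it can help to validate the requested name first and to record the previous clock source so that it can be restored later. The following Python sketch is one way to do this, assuming the sysfs files described above. The function name set_clocksource and the base parameter are hypothetical; writing to the real sysfs file requires root privileges.

```python
from pathlib import Path

CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def set_clocksource(name, base=CLOCKSOURCE_DIR):
    """Select `name` as the clock source and return the previous one.

    Raises ValueError if `name` is not listed in available_clocksource.
    `base` is a parameter of this sketch so that it can be exercised
    against a scratch directory instead of the real sysfs tree.
    """
    base = Path(base)
    available = (base / "available_clocksource").read_text().split()
    if name not in available:
        raise ValueError(f"{name!r} is not one of {available}")

    current_file = base / "current_clocksource"
    previous = current_file.read_text().strip()

    # Equivalent to: echo NAME > .../current_clocksource
    current_file.write_text(name + "\n")

    # Read the file back to confirm that the change took effect.
    active = current_file.read_text().strip()
    if active != name:
        raise RuntimeError(f"clock source is still {active!r}")
    return previous
```

Calling set_clocksource("hpet") on a system where TSC is in use would return "tsc", which can be passed to a second call to restore the original selection.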
While TSC is generally the preferred clock source, some of its hardware implementations may have shortcomings. For example, some TSC clocks can stop when the system goes to an idle state, or become out of sync when their CPUs enter deeper C-states (energy saving states) or perform speed- or frequency-scaling operations.
However, you can work around some of these TSC shortcomings by configuring additional kernel boot parameters. For instance, the idle=poll parameter keeps the processor polling instead of entering an idle state, and the processor.max_cstate=1 parameter prevents the processor from entering deeper C-states, so the clock is not affected by them. Note, however, that in both cases energy consumption increases, because the system always runs at top speed.
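To check whether the running kernel was booted with either of these parameters, the kernel command line can be inspected. The following Python sketch parses a command line string such as the contents of /proc/cmdline. The function names are illustrative, and values containing quoted spaces are not handled, which is a simplification of this sketch.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dictionary.

    Parameters without '=' map to None. Quoted values that contain
    spaces are not handled by this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def tsc_workarounds(cmdline):
    """Report which of the two parameters discussed above are present."""
    params = parse_cmdline(cmdline)
    return {
        "idle=poll": params.get("idle") == "poll",
        "processor.max_cstate=1": params.get("processor.max_cstate") == "1",
    }
```

On a running system, the input would typically be obtained with open("/proc/cmdline").read().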
For a comprehensive list of clock sources see the Timing Measurements chapter in Understanding The Linux Kernel by Daniel P. Bovet and Marco Cesati.
15.1.1. Reading Hardware Clock Sources
Reading from the TSC means reading a register from the processor. Reading from the HPET clock means reading a memory area. Reading from the TSC is faster, which provides a significant performance advantage when timestamping hundreds of thousands of messages per second.
Using a simple program that reads the current clock source 10,000,000 times in a row, it is possible to observe the duration required to read the clock sources available:
Example 15.1. Comparing the Cost of Reading Hardware Clock Sources
In this example, the clock source currently in use is TSC, as shown by the output of the cat command. The time command is used to view the duration required to read the clock source 10 million times:
~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
~]# time ./clock_timing
real    0m0.601s
user    0m0.592s
sys     0m0.002s
The clock source is changed to HPET to compare the duration required to generate 10 million timestamps:
~]# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
~]# time ./clock_timing
real    0m12.263s
user    0m12.197s
sys     0m0.001s
The steps are repeated with the ACPI_PM clock source:
~]# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
~]# time ./clock_timing
real    0m24.461s
user    0m0.504s
sys     0m23.776s
The time(1) man page provides detailed information on how to use the command and interpret its output. The example above uses the following categories:
real: The total time spent beginning from program invocation until the process ends. real includes the user and sys times, and will usually be larger than the sum of the latter two. If this process is interrupted by an application with higher priority, or by a system event such as a hardware interrupt (IRQ), this time spent waiting is also computed under real.
user: The time the process spent in user space, performing tasks that did not require kernel intervention.
sys: The time spent by the kernel while performing tasks required by the user process. These tasks include opening files, reading and writing to files or I/O ports, memory allocation, thread creation and network related activities.
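The relationship between the three categories can also be observed from inside a program. The following Python sketch measures them around an arbitrary function, using time.perf_counter() for real time and os.times() for user and sys time. The helper names measure, spin and sleep_briefly are illustrative only.

```python
import os
import time


def measure(func):
    """Run func() and return its real, user and sys times in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    func()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return {
        "real": wall_after - wall_before,
        "user": cpu_after.user - cpu_before.user,
        "sys": cpu_after.system - cpu_before.system,
    }


def spin():
    # Pure computation in user space: mostly accumulates user time.
    total = 0
    for i in range(2_000_000):
        total += i
    return total


def sleep_briefly():
    # Sleeping adds to real time but consumes almost no CPU time.
    time.sleep(0.2)
```

Calling measure(sleep_briefly) is expected to report a real time of about 0.2 seconds with user and sys close to zero, which illustrates why real can be larger than the sum of the other two.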
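As seen from the results of Example 15.1, “Comparing the Cost of Reading Hardware Clock Sources”, the efficiency of generating timestamps, in descending order, is: TSC, HPET, ACPI_PM. In that run, 10 million reads took 0.601 seconds with TSC, 12.263 seconds with HPET (about 20 times longer) and 24.461 seconds with ACPI_PM (about 40 times longer). This is because of the increased overhead of accessing time values from the HPET and ACPI_PM timers.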
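The source of the clock_timing program is not shown in this section. A minimal sketch of such a loop, written in Python under the assumption that the program reads the clock through clock_gettime() with CLOCK_MONOTONIC, could look as follows. The function name and the choice of clock are assumptions. Interpreter overhead is included in every iteration, so absolute numbers are likely to be higher than those of a compiled program.

```python
import time


def time_clock_reads(iterations=10_000_000):
    """Read CLOCK_MONOTONIC `iterations` times in a row.

    Returns (elapsed_seconds, nanoseconds_per_read). The per-read figure
    includes the cost of the Python loop itself, not only the clock read.
    """
    read = time.clock_gettime_ns      # local alias avoids repeated lookups
    clock_id = time.CLOCK_MONOTONIC
    start = read(clock_id)
    for _ in range(iterations):
        read(clock_id)
    end = read(clock_id)
    elapsed_ns = end - start
    return elapsed_ns / 1e9, elapsed_ns / iterations
```

Running time_clock_reads() once for each clock source selected in current_clocksource would give a comparison similar in spirit to the example above.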