Reference Guide
Core concepts and terminology for using RHEL for Real Time
Marie Doleželová
Maxim Svistunov
Radek Bíba
David Ryan
Cheryn Tan
Lana Brindley
Alison Young
Abstract
Preface
Part I. Hardware
Chapter 1. Processor Cores
1.1. Caches
1.2. Interconnects
Chapter 2. Memory Allocation
2.1. Demand Paging
Figure 2.1. Red Hat Enterprise Linux for Real Time Virtual Memory System
pgfault
value in the /proc/vmstat
file.
/proc
directory. For a particular process PID, use the cat
command to view the /proc/PID/stat
file. The relevant entries in this file are:
- Field 2 - filename of the executable
- Field 10 - number of minor page faults
- Field 12 - number of major page faults
Example 2.1. Using the /proc/PID/stat
File to Check for Page Faults
/proc/PID/stat
file to check for page faults in a running process.
cat
command and a pipe function to return only the second, tenth, and twelfth fields of the /proc/PID/stat
file:
~]# cat /proc/3366/stat | cut -d\ -f2,10,12
(bash) 5389 0
bash
, and it has reported 5389 minor page faults, and no major page faults.
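The same three fields can also be read from inside a program. The following is a minimal sketch, not part of the original example: the function name read_page_faults is hypothetical, and it assumes the field layout of /proc/PID/stat described above. A pid of 0 is treated as the calling process.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Read the minor (field 10) and major (field 12) page fault counters
 * of a process from /proc/PID/stat. A pid of 0 means the calling process.
 * Returns 0 on success, -1 on error.
 */
int read_page_faults(pid_t pid, unsigned long *minflt, unsigned long *majflt)
{
    char path[64], line[1024];
    FILE *fp;
    char *p;

    if (pid == 0)
        snprintf(path, sizeof(path), "/proc/self/stat");
    else
        snprintf(path, sizeof(path), "/proc/%ld/stat", (long) pid);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof(line), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Field 2 (the executable name) may contain spaces, so skip past
     * its closing parenthesis before counting fields. */
    p = strrchr(line, ')');
    if (p == NULL)
        return -1;

    /* Fields 3 to 9 are skipped, field 10 is stored, field 11 is
     * skipped, field 12 is stored. */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %lu %*lu %lu",
               minflt, majflt) != 2)
        return -1;
    return 0;
}
```

Calling the function before and after a time-sensitive section and comparing the two readings shows whether that section caused any page faults.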
Note
- Linux System Programming by Robert Love
2.2. Using mlock
to Avoid Page I/O
mlock
and mlockall
system calls tell the system to lock a specified memory range, and to not allow that memory to be paged out. This means that once the physical page has been allocated to the page table entry, references to that page will always be fast.
mlock
system calls available. The mlock
and munlock
calls lock and unlock a specific range of addresses. The mlockall
and munlockall
calls lock or unlock the entire program space.
mlock
carefully and exercise caution. If the application is large, or if it has a large data domain, the mlock
calls can cause thrashing if the system cannot allocate memory for other tasks.
Note
mlock
with care. Using it excessively can lead to an out of memory (OOM) error. Do not put an mlockall
call at the start of your application. It is recommended that only the data and text of the realtime portion of the application be locked.
mlock
will not guarantee that the program will experience no page I/O. It is used to ensure that the data will stay in memory, but cannot ensure that it will stay in the same page. Other functions such as move_pages
and memory compactors can move data around despite the use of mlock
.
Important
CAP_IPC_LOCK
capability in order to be able to use mlockall
or mlock
on large buffers. See the capabilities(7) man page for details.
mlock
or mlockall
, they will be unlocked by a single call to munlock
for the corresponding page, or by munlockall
. Thus, the application must be aware of which pages it is unlocking in order to prevent this double-lock/single-unlock problem.
- Tracking the memory areas allocated and locked, and creating a wrapper function that, before unlocking a page, verifies how many users (allocations) that page has. This is the resource counting principle used in device drivers.
- Performing allocations considering the page size and alignment, in order to prevent a double-lock in the same page.
mlock
depends on the application's needs and system resources. Although there is no single solution for all the applications, the following code example can be used as a starting point for the implementation of a function that will allocate and lock memory buffers.
Example 2.2. Using mlock
in an Application
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

void *alloc_workbuf(size_t size)
{
    void *ptr;
    int retval;

    /*
     * alloc memory aligned to a page, to prevent two mlock() in the
     * same page.
     */
    retval = posix_memalign(&ptr, (size_t) sysconf(_SC_PAGESIZE), size);

    /* return NULL on failure */
    if (retval)
        return NULL;

    /* lock this buffer into RAM */
    if (mlock(ptr, size)) {
        free(ptr);
        return NULL;
    }
    return ptr;
}

void free_workbuf(void *ptr, size_t size)
{
    /* unlock the address range */
    munlock(ptr, size);

    /* free the memory */
    free(ptr);
}
alloc_workbuf
dynamically allocates a memory buffer and locks it. The memory allocation is performed by posix_memalign
in order to align the memory area to a page. If the size
variable is smaller than the page size, regular malloc
allocations will be able to use the remainder of the page. However, to safely take advantage of this, no mlock
calls can be made on regular malloc
allocations. This will prevent the double-lock/single-unlock problem. The function free_workbuf
will unlock and free the memory area.
mlock
and mlockall
, it is possible to allocate and lock a memory area using mmap
with the MAP_LOCKED
flag. The following example is the implementation of the aforementioned code using mmap
.
Example 2.3. Using mmap
in an Application
#include <sys/mman.h>
#include <stdlib.h>

void *alloc_workbuf(size_t size)
{
    void *ptr;

    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);

    if (ptr == MAP_FAILED)
        return NULL;

    return ptr;
}

void free_workbuf(void *ptr, size_t size)
{
    munmap(ptr, size);
}
mmap
allocates memory on a page basis, there are no two locks in the same page, helping to prevent the double-lock/single-unlock problem. On the other hand, if the size
variable is not a multiple of the page size, the rest of the page is wasted. Furthermore, a call to munlockall
unlocks the memory locked by mmap
.
mlockall
prior to entering a time-sensitive region of the code, followed by munlockall
at the end of the time-sensitive region. This can reduce paging while in the critical section. Similarly, mlock
can be used on a data region that is relatively static or that will grow slowly but needs to be accessed without page I/O.
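The pattern of locking memory only around a time-sensitive region can be sketched as below. The helper names run_locked and sample_work are hypothetical, and the sketch assumes the process has sufficient privileges or a large enough RLIMIT_MEMLOCK for mlockall to succeed; when it does not, the work is not run and the error is reported to the caller.

```c
#include <sys/mman.h>

/*
 * Run fn(arg) with the pages of the process locked into RAM.
 * Returns 0 on success, or -1 (with errno set by mlockall) if the
 * lock could not be taken, in which case fn is not called.
 */
int run_locked(void (*fn)(void *), void *arg)
{
    /* MCL_CURRENT locks the pages mapped right now; MCL_FUTURE also
     * locks pages that become mapped while the section runs. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        return -1;

    fn(arg);                            /* time-sensitive work */

    munlockall();                       /* releases every lock held by the process */
    return 0;
}

/* Hypothetical stand-in for the time-sensitive work. */
void sample_work(void *arg)
{
    volatile int *counter = arg;

    (*counter)++;
}
```

Because munlockall releases every lock held by the process, this sketch should not be combined with buffers that are expected to stay locked after the region ends.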
Note
- capabilities(7)
- mlock(2)
- mlock(3)
- mlockall(2)
- mmap(2)
- move_pages(2)
- posix_memalign(3)
- posix_memalign(3p)
Chapter 3. Hardware Interrupts
Example 3.1. Viewing Interrupts on Your System
cat
command to view /proc/interrupts
:
~]$ cat /proc/interrupts
CPU0 CPU1
0: 13072311 0 IO-APIC-edge timer
1: 18351 0 IO-APIC-edge i8042
8: 190 0 IO-APIC-edge rtc0
9: 118508 5415 IO-APIC-fasteoi acpi
12: 747529 86120 IO-APIC-edge i8042
14: 1163648 0 IO-APIC-edge ata_piix
15: 0 0 IO-APIC-edge ata_piix
16: 12681226 126932 IO-APIC-fasteoi ahci, uhci_hcd:usb2, radeon, yenta, eth0
17: 3717841 0 IO-APIC-fasteoi uhci_hcd:usb3, HDA, iwl3945
18: 0 0 IO-APIC-fasteoi uhci_hcd:usb4
19: 577 68 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb5
NMI: 0 0 Non-maskable interrupts
LOC: 3755270 9388684 Local timer interrupts
RES: 1184857 2497600 Rescheduling interrupts
CAL: 12471 2914 function call interrupts
TLB: 14555 15567 TLB shootdowns
TRM: 0 0 Thermal event interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0
3.1. Level-Signaled Interrupts
3.2. Message-Signaled Interrupts
pci=nomsi
on the kernel command line.
3.3. Non-Maskable Interrupts
3.4. System Management Interrupts
Note
hwlatdetect
utility, which is available in the rt-tests
package. This utility is designed to measure periods of time during which the CPU has been stolen by an SMI handling routine.
3.5. Advanced Programmable Interrupt Controller
Part II. Application Architecture
Chapter 4. Threads and Processes
- Process
- A UNIX®-style process is an operating system construct that contains:
- Address mappings for virtual memory
- An execution context (PC, stack, registers)
- State/Accounting information
Linux processes started as exactly this style of process. When the concept of more than one process running inside one address space was developed, Linux turned to a process structure that shares an address space with another process. This works well, as long as the process data structure is kept small. For the remainder of this document, the term process refers to an independent address space, potentially containing multiple threads.
- Thread
- Strictly, a thread is a schedulable entity that contains:
- A program counter (PC)
- A register context
- A stack pointer
Multiple threads can exist within a process.
- Use the fork and exec functions to create new processes
- Use the POSIX Threads (pthreads) API to create new threads within an already running process
Note
Note
- fork(2)
- exec(2)
- Programming with POSIX Threads, David R. Butenhof, Addison-Wesley, ISBN 0-201-63392-2
- Advanced Programming in the UNIX Environment, 2nd Ed., W. Richard Stevens and Stephen A. Rago, Addison-Wesley, ISBN 0-201-43307-9
- “POSIX Threads Programming”, Blaise Barney, Lawrence Livermore National Laboratory, http://www.llnl.gov/computing/tutorials/pthreads/
Chapter 5. Priorities and Policies
- SCHED_OTHER or SCHED_NORMAL: The default policy
- SCHED_BATCH: Similar to SCHED_OTHER, but with a throughput orientation
- SCHED_IDLE: A lower priority than SCHED_OTHER
- SCHED_FIFO: A first in/first out realtime policy
- SCHED_RR: A round-robin realtime policy
SCHED_OTHER
, SCHED_FIFO
, and SCHED_RR
.
SCHED_OTHER
or SCHED_NORMAL
is the default scheduling policy for Linux threads. It has a dynamic priority that is changed by the system based on the characteristics of the thread. Another factor that affects the priority of SCHED_OTHER
threads is their nice value. The nice value is a number between -20 (highest priority) and 19 (lowest priority). By default, SCHED_OTHER
threads have a nice value of 0. Adjusting the nice value will change the way the thread is handled.
SCHED_FIFO
policy will run ahead of SCHED_OTHER
tasks. Instead of using nice values, SCHED_FIFO
uses a fixed priority between 1 (lowest) and 99 (highest). A SCHED_FIFO
thread with a priority of 1 will always be scheduled ahead of any SCHED_OTHER
thread.
SCHED_RR
policy is very similar to the SCHED_FIFO
policy. In the SCHED_RR
policy, threads of equal priority are scheduled in a round-robin fashion. Generally, SCHED_FIFO
is preferred over SCHED_RR
.
SCHED_FIFO
and SCHED_RR
threads will run until one of the following events occurs:
- The thread goes to sleep or begins waiting for an event
- A higher-priority realtime thread becomes ready to run
Chapter 6. Affinity
- Reserve one CPU core for all system processes and allow the application to run on the remainder of the cores.
- Allow an application thread and a given kernel thread (such as the network softirq or a driver thread) to run on the same CPU.
- Pair producer and consumer threads on each CPU.
Tuna
tool, or through the use of shell scripts to modify the bitmask value. The taskset
command can be used to change the affinity of a process, while modifying the /proc
filesystem entry changes the affinity of an interrupt.
Note
6.1. Using the taskset
Command to Set Processor Affinity
taskset
command sets and checks affinity information for a given process. These tasks can also be achieved using the Tuna tool.
-p
or --pid
option and the PID of the process to be checked. The -c
or --cpu-list
option displays the information as a numerical list of cores, instead of as a bitmask.
~]# taskset -p -c 1000
pid 1000's current affinity list: 0,1
~]# taskset -p -c 1 1000
pid 1000's current affinity list: 0,1
pid 1000's new affinity list: 1
~]# taskset -p -c 0,1 1000
pid 1000's current affinity list: 1
pid 1000's new affinity list: 0,1
taskset
command can also be used to start a new process with a particular affinity. This command will run the /bin/my-app
application on CPU 4:
~]# taskset -c 4 /bin/my-app
/bin/my-app
application on CPU 4, with a SCHED_FIFO
policy and a priority of 78:
~]# taskset -c 4 chrt -f 78 /bin/my-app
6.2. Using the sched_setaffinity()
System Call to Set Processor Affinity
taskset
command, processor affinity can also be set using the sched_setaffinity()
system call.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <sched.h>

int main(int argc, char **argv)
{
    int i, allowed = 0;
    long ncores = sysconf(_SC_NPROCESSORS_CONF);
    cpu_set_t *setp = CPU_ALLOC(ncores);
    size_t setsz = CPU_ALLOC_SIZE(ncores);

    if (setp == NULL) {
        perror("CPU_ALLOC failed");
        exit(EXIT_FAILURE);
    }

    CPU_ZERO_S(setsz, setp);

    if (sched_getaffinity(0, setsz, setp) == -1) {
        perror("sched_getaffinity(2) failed");
        exit(EXIT_FAILURE);
    }

    /* test every configured CPU; the allowed CPUs need not be the lowest-numbered ones */
    for (i = 0; i < ncores; i++) {
        if (CPU_ISSET_S(i, setsz, setp))
            allowed++;
    }

    printf("%ld cores configured, %d cpus allowed in affinity mask\n", ncores, allowed);
    CPU_FREE(setp);
    return 0;
}
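The listing above reads the affinity mask. Setting it follows the same pattern with sched_setaffinity(). The following sketch uses hypothetical helper names (first_allowed_cpu, pin_to_cpu) and the fixed-size cpu_set_t, which assumes a system with no more than CPU_SETSIZE CPUs; larger systems need the CPU_ALLOC family used above.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    int cpu;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == -1)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}

/* Restrict the calling thread to a single CPU. Returns 0 or -1. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* a pid of 0 means the calling thread */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

For example, pin_to_cpu(first_allowed_cpu()) narrows the mask to one CPU that the thread was already allowed to use, which avoids the EINVAL error returned when the requested CPU is not available to the caller.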
Note
- sched_setaffinity(2)
Chapter 7. Thread Synchronization
7.1. Mutexes
mutex
is derived from the term mutual exclusion. A mutex is a POSIX threads construct, and is created using the pthread_mutex_init
library call. A mutex serializes access to each section of code, so that only one thread of an application is running the code at any one time.
futex
, or Fast User muTEX, which is an internal mechanism used to implement mutexes. Futexes use shared conventions between the kernel and the C library. This allows an uncontended mutex to be locked or freed without a context switch to kernel space.
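As an illustration of a mutex serializing access to a section of code, the sketch below has several threads increment one shared counter. The names count_with_mutex and mutex_worker, and the MAX_WORKERS limit, are hypothetical choices made for this example.

```c
#include <pthread.h>

#define MAX_WORKERS 16                  /* arbitrary limit for this sketch */

static pthread_mutex_t counter_lock;
static long counter;

static void *mutex_worker(void *arg)
{
    long iterations = *(long *) arg;
    long i;

    for (i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                      /* only one thread at a time runs this */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/*
 * Start nthreads workers that each add 'iterations' to the shared
 * counter. Returns the final counter value, or -1 on error.
 */
long count_with_mutex(int nthreads, long iterations)
{
    pthread_t tid[MAX_WORKERS];
    int started = 0, i;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;
    if (pthread_mutex_init(&counter_lock, NULL) != 0)
        return -1;

    counter = 0;
    while (started < nthreads &&
           pthread_create(&tid[started], NULL, mutex_worker, &iterations) == 0)
        started++;
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    pthread_mutex_destroy(&counter_lock);
    return started == nthreads ? counter : -1;
}
```

Without the lock and unlock calls, concurrent increments could be lost and the final value would be unpredictable. On older C libraries the program may need to be built with the -pthread option.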
7.2. Barriers
Barriers
operate in a very different way to other thread synchronization methods. Instead of serializing access to code regions, barriers block all threads until a pre-determined number of them have accumulated. The barrier will then allow all threads to continue. Barriers are used in situations where a running application needs to be certain that all threads have completed their tasks before execution can continue.
7.3. Condvars
condvar
, or condition variable, is a POSIX thread construct that waits for a particular condition to be achieved before proceeding. In general the condition being signaled pertains to the state of data that the thread shares with another thread. For example, a condvar can be used to signal that a data entry has been put into a processing queue and a thread waiting to process data from the queue can now proceed.
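The queue scenario described above can be sketched as follows. All names (queue_put, queue_get, condvar_demo) and the queue size are hypothetical. For brevity, entries are removed from the end of the array, so this sketch does not preserve insertion order.

```c
#include <pthread.h>

#define QUEUE_SLOTS 8                   /* arbitrary capacity for this sketch */

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;
static int queue[QUEUE_SLOTS];
static int q_len;

/* Producer side: add an entry and signal a waiting consumer. */
int queue_put(int value)
{
    int rc = -1;

    pthread_mutex_lock(&q_lock);
    if (q_len < QUEUE_SLOTS) {
        queue[q_len++] = value;
        pthread_cond_signal(&q_nonempty);
        rc = 0;
    }
    pthread_mutex_unlock(&q_lock);
    return rc;
}

/* Consumer side: block until an entry is available, then remove it. */
int queue_get(void)
{
    int value;

    pthread_mutex_lock(&q_lock);
    while (q_len == 0)                  /* re-check: wakeups can be spurious */
        pthread_cond_wait(&q_nonempty, &q_lock);
    value = queue[--q_len];
    pthread_mutex_unlock(&q_lock);
    return value;
}

static void *consumer(void *out)
{
    *(int *) out = queue_get();
    return NULL;
}

/* Hand one value to a consumer thread; returns what it received, or -1. */
int condvar_demo(int value)
{
    pthread_t tid;
    int received = -1;

    if (pthread_create(&tid, NULL, consumer, &received) != 0)
        return -1;
    if (queue_put(value) != 0)
        return -1;
    pthread_join(tid, NULL);
    return received;
}
```

The condition is always tested in a loop while holding the mutex, so the consumer proceeds only when the shared state really has changed.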
7.4. Other Types of Synchronization
Chapter 8. Sockets
8.1. Socket Options
TCP_NODELAY
and TCP_CORK
.
TCP_NODELAY
TCP is the most common transport protocol, which means it is often used to solve many different needs. As new application and hardware features are developed, and kernel architecture optimizations are made, TCP has had to introduce new heuristics to handle the changes effectively.
TCP_NODELAY
is a socket option that can be used to turn this behavior off. It can be enabled through the setsockopt
sockets API, with the following function:
int one = 1;
setsockopt(descriptor, SOL_TCP, TCP_NODELAY, &one, sizeof(one));
TCP_NODELAY
can also interact with other optimization heuristics to result in poor overall performance.
TCP_NODELAY
enabled.
writev
on a socket with TCP_NODELAY
enabled.
TCP_CORK
Another TCP socket option that works in a similar way is TCP_CORK
. When enabled, TCP will delay all packets until the application removes the cork, and allows the stored packets to be sent. This allows applications to build a packet in kernel space, which is useful when different libraries are being used to provide layer abstractions.
TCP_CORK
option can be enabled by using the following function:
int one = 1;
setsockopt(descriptor, SOL_TCP, TCP_CORK, &one, sizeof(one));
TCP_CORK
is often referred to as corking the socket
.
int zero = 0;
setsockopt(descriptor, SOL_TCP, TCP_CORK, &zero, sizeof(zero));
Once the socket is uncorked, TCP will send the accumulated logical packet immediately, without waiting for further packets from the application.
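The cork and uncork steps can be wrapped so that a logical packet built from several writes is sent as a unit. The sketch below is an assumption-laden illustration: the helper names (set_tcp_flag, get_tcp_flag, send_corked) are hypothetical, it uses IPPROTO_TCP, which has the same value as SOL_TCP on Linux, and it does not retry short writes.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set a boolean TCP-level option such as TCP_NODELAY or TCP_CORK. */
int set_tcp_flag(int fd, int option, int enabled)
{
    return setsockopt(fd, IPPROTO_TCP, option, &enabled, sizeof(enabled));
}

/* Read a boolean TCP-level option back; returns 0, 1, or -1 on error. */
int get_tcp_flag(int fd, int option)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_TCP, option, &value, &len) == -1)
        return -1;
    return value != 0;
}

/*
 * Send a header and a body as one logical packet by corking the
 * socket around the two writes. Returns 0 on success, -1 on error.
 */
int send_corked(int fd, const void *hdr, size_t hlen,
                const void *body, size_t blen)
{
    int rc = 0;

    if (set_tcp_flag(fd, TCP_CORK, 1) == -1)
        return -1;

    if (write(fd, hdr, hlen) == -1 || write(fd, body, blen) == -1)
        rc = -1;

    /* Uncork even after a failed write so the socket is not left corked. */
    if (set_tcp_flag(fd, TCP_CORK, 0) == -1)
        rc = -1;
    return rc;
}
```

A latency-sensitive sender that writes complete messages in a single call could instead use set_tcp_flag(fd, TCP_NODELAY, 1) once after connecting.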
Example 8.1. Using TCP_NODELAY
and TCP_CORK
TCP_NODELAY
and TCP_CORK
can have on an application.
~]$ ./tcp_nodelay_server 5001 10000
no_delay
option to enable TCP_NODELAY
socket options. Use the cork
option to enable TCP_CORK
. In all cases it will send 15 packets, each of two bytes, and wait for a response from the server.
TCP_NODELAY
nor TCP_CORK
are in use. This is a baseline measurement. TCP coalesces writes and has to wait to check if the application has more data than can optimally fit in the network packet:
~]$ ./tcp_nodelay_client localhost 5001 10000
10000 packets of 30 bytes sent in 400129.781250 ms: 0.749757 bytes/ms
TCP_NODELAY
only. TCP is instructed not to coalesce small packets, but to send buffers immediately. This improves performance significantly, but creates a large number of network packets for each logical packet:
~]$ ./tcp_nodelay_client localhost 5001 10000 no_delay
10000 packets of 30 bytes sent in 1649.771240 ms: 181.843399 bytes/ms using TCP_NODELAY
TCP_CORK
only. Compared with TCP_NODELAY, it roughly halves the time required to send the same number of logical packets. This is because TCP coalesces full logical packets in its buffers, and sends fewer overall network packets:
~]$ ./tcp_nodelay_client localhost 5001 10000 cork
10000 packets of 30 bytes sent in 850.796448 ms: 352.610779 bytes/ms using TCP_CORK
TCP_CORK
is the best technique to use. It allows the application to precisely convey the information that a packet is finished and must be sent without delay. When developing programs, if they need to send bulk data from a file, consider using TCP_CORK
with sendfile
.
Note
- sendfile(2)
- “TCP nagle sample applications”, which are example applications of both socket options, written in C.
Chapter 9. Shared Memory
shmem
set of calls. These calls are quite capable, but overly complicated and cumbersome for the vast majority of use cases. For this reason, they have been deprecated on the Red Hat Enterprise Linux for Real Time kernel and should no longer be used.
shm_open
and mmap
.
Note
- shm_open(3)
- shm_overview(7)
- mmap(2)
Chapter 10. Shared Libraries
ld.so
system loader. From there, they are mapped into the address space of processes that require symbols from the library. A symbol is not evaluated until the first reference to it is encountered. Evaluating the symbol only when it is referenced can be a source of latency. This is because memory pages can be on disk, and caches can become invalidated. Evaluating symbols in advance is a safe procedure that can help to improve latency.
LD_BIND_NOW
environment variable. Setting LD_BIND_NOW
to any value other than null will cause the system loader to look up all unresolved symbols at program load time.
Note
- ld.so(8)
Part III. Library Services
Chapter 11. Setting the Scheduler
11.1. Using chrt
to Set the Scheduler
chrt
is used to check and adjust scheduler policies and priorities. It can start new processes with the desired properties, or change the properties of a running process.
--pid
or -p
option alone to specify the process ID (PID):
~]# chrt -p 468
pid 468's current scheduling policy: SCHED_FIFO
pid 468's current scheduling priority: 85
~]# chrt -p 476
pid 476's current scheduling policy: SCHED_OTHER
pid 476's current scheduling priority: 0
Table 11.1. Policy Options for the chrt
Command
Short option | Long option | Description |
---|---|---|
-f | --fifo | Set schedule to SCHED_FIFO |
-o | --other | Set schedule to SCHED_OTHER |
-r | --rr | Set schedule to SCHED_RR |
SCHED_FIFO
, with a priority of 50:
~]# chrt -f -p 50 1000
SCHED_OTHER
, with a priority of 0:
~]# chrt -o -p 0 1000
SCHED_FIFO
and a priority of 36:
~]# chrt -f 36 /bin/my-app
Note
- chrt(1)
11.2. Preemption
/proc/PID/status
, where PID is the PID of the process. The following command checks the preemption of the process with PID 1000:
~]# grep voluntary /proc/1000/status
voluntary_ctxt_switches: 194529
nonvoluntary_ctxt_switches: 195338
11.3. Using Library Calls to Set Priority
nice
getpriority
setpriority
Important
sched.h
header file. Ensure you always check the return codes from functions. The appropriate man pages outline the various codes used.
11.3.1. sched_getscheduler
sched_getscheduler()
function retrieves the scheduler policy for a given PID:
#include <sched.h>

int policy;

/* pid is a pid_t; a value of 0 refers to the calling process */
policy = sched_getscheduler(pid);
SCHED_OTHER
, SCHED_RR
and SCHED_FIFO
are also defined in sched.h
. They can be used to check the defined policy or to set the policy:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>

int main(int argc, char *argv[])
{
    pid_t pid;
    int policy;

    /* with no argument, query the calling process */
    if (argc < 2)
        pid = 0;
    else
        pid = atoi(argv[1]);

    printf("Scheduler Policy for PID: %d -> ", pid);

    policy = sched_getscheduler(pid);

    switch (policy) {
    case SCHED_OTHER:
        printf("SCHED_OTHER\n");
        break;
    case SCHED_RR:
        printf("SCHED_RR\n");
        break;
    case SCHED_FIFO:
        printf("SCHED_FIFO\n");
        break;
    default:
        printf("Unknown...\n");
    }
    return 0;
}
11.3.2. sched_setscheduler
sched_setscheduler()
function. Currently, realtime policies have one parameter, sched_priority
. This parameter is used to adjust the priority of the process.
sched_setscheduler
function requires three parameters, in the form: sched_setscheduler(pid_t pid, int policy, const struct sched_param *sp);
Note
sched_setscheduler
(2) man page lists all possible return values of sched_setscheduler
, including the error codes.
pid
is zero, the sched_setscheduler()
function will act on the calling process.
SCHED_FIFO
and the priority to 50:
struct sched_param sp = { .sched_priority = 50 };
int ret;

ret = sched_setscheduler(0, SCHED_FIFO, &sp);
if (ret == -1) {
    perror("sched_setscheduler");
    return 1;
}
11.3.3. sched_getparam
and sched_setparam
sched_setparam()
function is used to set the scheduling parameters of a particular process. This can then be verified using the sched_getparam()
function.
sched_getscheduler()
function, which only returns the scheduling policy, the sched_getparam()
function returns all scheduling parameters for the given process.
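struct sched_param sp;
int ret;

/* reads priority and increments it by 2 */
ret = sched_getparam(0, &sp);
if (ret == -1) {
    perror("sched_getparam");
    return 1;
}
sp.sched_priority += 2;

/* sets the new priority */
ret = sched_setparam(0, &sp);
if (ret == -1) {
    perror("sched_setparam");
    return 1;
}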
Important
11.3.4. sched_get_priority_min
and sched_get_priority_max
sched_get_priority_min
and sched_get_priority_max
functions are used to check the valid priority range for a given scheduler policy.
-1
and errno
will be set to EINVAL
:
#include <stdio.h>
#include <unistd.h>
#include <sched.h>

int main(void)
{
    printf("Valid priority range for SCHED_OTHER: %d - %d\n",
           sched_get_priority_min(SCHED_OTHER),
           sched_get_priority_max(SCHED_OTHER));

    printf("Valid priority range for SCHED_FIFO: %d - %d\n",
           sched_get_priority_min(SCHED_FIFO),
           sched_get_priority_max(SCHED_FIFO));

    printf("Valid priority range for SCHED_RR: %d - %d\n",
           sched_get_priority_min(SCHED_RR),
           sched_get_priority_max(SCHED_RR));
    return 0;
}
Note
SCHED_FIFO
and SCHED_RR
can be any number within the range of 1 to 99. POSIX is not guaranteed to honor this range, however, and portable programs should use these calls.
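A portable program can use these calls to keep a requested priority inside the valid range before applying it. The following sketch uses hypothetical helper names (clamp_priority, set_policy_clamped), and clamping is only one possible design; rejecting an out-of-range value may be more appropriate for some applications.

```c
#include <sched.h>

/*
 * Clamp 'prio' into the valid priority range for 'policy'.
 * Returns the clamped priority, or -1 if the policy is not recognized.
 */
int clamp_priority(int policy, int prio)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;                      /* errno is EINVAL */
    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/* Apply 'policy' to 'pid' with a priority clamped to the valid range. */
int set_policy_clamped(pid_t pid, int policy, int prio)
{
    struct sched_param sp;
    int clamped = clamp_priority(policy, prio);

    if (clamped == -1)
        return -1;
    sp.sched_priority = clamped;
    return sched_setscheduler(pid, policy, &sp);
}
```

Setting a realtime policy still requires the appropriate privileges, so set_policy_clamped can fail with EPERM even when the priority is valid.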
11.3.5. sched_rr_get_interval
SCHED_RR
policy differs slightly from the SCHED_FIFO
policy. SCHED_RR
allocates concurrent processes that have the same priority in a round-robin rotation. In this way, each process is assigned a timeslice. The sched_rr_get_interval()
function will report the timeslice that has been allocated to each process.
SCHED_RR
processes, the sched_rr_get_interval()
function is able to retrieve the timeslice length of any process on Linux.
timespec
, or the number of seconds and nanoseconds since the base time of 00:00:00 GMT, 1 January 1970:
struct timespec {
    time_t tv_sec;  /* seconds */
    long tv_nsec;   /* nanoseconds */
};
sched_rr_get_interval
function requires the PID of the process, and a struct timespec:
#include <stdio.h>
#include <sched.h>
#include <time.h>

int main(void)
{
    struct timespec ts;
    int ret;

    /* real apps must check return values */
    ret = sched_rr_get_interval(0, &ts);

    /* the nanosecond part is printed without zero padding */
    printf("Timeslice: %ld.%ld\n", (long) ts.tv_sec, ts.tv_nsec);
    return 0;
}
sched_03
, with varying policies and priorities. Processes with a SCHED_FIFO
policy will return a timeslice of 0 seconds and 0 nanoseconds, indicating that it is infinite:
~]$ chrt -o 0 ./sched_03
Timeslice: 0.38994072
~]$ chrt -r 10 ./sched_03
Timeslice: 0.99984800
~]$ chrt -f 10 ./sched_03
Timeslice: 0.0
Note
- nice(2)
- getpriority(2)
- setpriority(2)
Chapter 12. Creating Threads and Processes
Chapter 13. Mmap
mmap
system call allows a file (or parts of a file) to be mapped to memory. This allows the file content to be changed with a memory operation, avoiding system calls and input/output operations.
Note
- mmap(2)
- Linux System Programming by Robert Love
Chapter 14. System Calls
14.1. sched_yield
sched_yield
function was originally designed to cause a processor to select a process other than the running one. This type of request is prone to failure when issued from within a poorly-written application.
sched_yield()
function is used within processes with realtime priorities, it can display unexpected behavior. The process that has called sched_yield
gets moved to the tail of the queue of processes running at that priority. When this occurs in a situation where there are no other processes running at the same priority, the process that called sched_yield
continues running. If the priority of that process is high, it can potentially create a busy loop, rendering the machine unusable.
sched_yield
on realtime processes.
14.2. getrusage()
getrusage
function is used to retrieve important information from a given process or its threads. This will not provide all the information available, but will report on information such as context switches and page faults.
getrusage()
function is used to retrieve important information from a given process or its threads, which would otherwise need to be cataloged from several different files in the /proc/
directory and would be hard to synchronize with specific actions or events on the application. Information such as the amount of voluntary and involuntary context switches, major and minor page faults, amount of memory in use and a few other pieces of information can be obtained with the getrusage()
function.
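A short sketch of collecting these values for the calling process follows. The structure usage_snapshot and the function take_usage_snapshot are hypothetical names introduced for this example; the ru_ fields are the ones documented in the getrusage(2) man page.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* The subset of getrusage() results discussed in this section. */
struct usage_snapshot {
    long minor_faults;
    long major_faults;
    long voluntary_switches;
    long involuntary_switches;
    long max_rss_kb;                    /* peak resident set size, in kilobytes on Linux */
};

/* Fill 'out' with the counters of the calling process. Returns 0 or -1. */
int take_usage_snapshot(struct usage_snapshot *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == -1)
        return -1;

    out->minor_faults = ru.ru_minflt;
    out->major_faults = ru.ru_majflt;
    out->voluntary_switches = ru.ru_nvcsw;
    out->involuntary_switches = ru.ru_nivcsw;
    out->max_rss_kb = ru.ru_maxrss;
    return 0;
}
```

Taking one snapshot before and one after a specific action, and subtracting the counters, ties page faults and context switches to that action without parsing files in /proc.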
Note
getrusage()
results are set by the kernel. Some of them are kept for compatibility reasons only.
Chapter 15. Timestamping
15.1. Hardware Clocks
/sys/devices/system/clocksource/clocksource0/available_clocksource
file:
~]# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
/sys/devices/system/clocksource/clocksource0/current_clocksource
file:
~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
/sys/devices/system/clocksource/clocksource0/available_clocksource
file. To do so, write the name of the clock source into the /sys/devices/system/clocksource/clocksource0/current_clocksource
file. For example, the following command sets HPET as the clock source in use:
~]# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
Important
idle=poll
parameter forces the clock to avoid entering the idle state, and the processor.max_cstate=1
parameter prevents the clock from entering deeper C-states. Note, however, that in both cases energy consumption increases, as the system always runs at top speed.
Note
15.1.1. Reading Hardware Clock Sources
Example 15.1. Comparing the Cost of Reading Hardware Clock Sources
cat
command. The time
command is used to view the duration required to read the clock source 10 million times:
~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
~]# time ./clock_timing
real 0m0.601s
user 0m0.592s
sys 0m0.002s
~]# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
~]# time ./clock_timing
real 0m12.263s
user 0m12.197s
sys 0m0.001s
~]# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
~]# time ./clock_timing
real 0m24.461s
user 0m0.504s
sys 0m23.776s
time(1)
man page provides detailed information on how to use the command and interpret its output. The example above uses the following categories:
- real: The total time spent beginning from program invocation until the process ends. real includes user and sys times, and will usually be larger than the sum of the latter two. If this process is interrupted by an application with higher priority, or by a system event such as a hardware interrupt (IRQ), this time spent waiting is also computed under real.
- user: The time the process spent in user space, performing tasks that did not require kernel intervention.
- sys: The time spent by the kernel while performing tasks required by the user process. These tasks include opening files, reading and writing to files or I/O ports, memory allocation, thread creation and network related activities.
15.2. POSIX Clocks
- CLOCK_REALTIME: represents the time in the real world, also referred to as 'wall time', meaning the time as read from the clock on the wall. This clock is used to timestamp events, and when interfacing with the user. It can be modified by a user with the right privileges. However, user modification should be used with caution as it can lead to erroneous data if the clock has its value changed between two readings.
- CLOCK_MONOTONIC: represents the time monotonically increased since the system boot. This clock cannot be set by any process, and is the preferred clock for calculating the time difference between events. The following examples in this section use CLOCK_MONOTONIC as the POSIX clock.
Note
- clock_gettime()
- Linux System Programming by Robert Love
clock_gettime()
, which is defined in <time.h>
. The clock_gettime()
function takes two parameters: the POSIX clock ID and a pointer to a timespec structure, which is filled with the current time of the requested clock. The following example shows a function that measures the cost of reading the clock:
Example 15.2. Using clock_gettime()
to Measure the Cost of Reading POSIX Clocks
#include <time.h>

int main(void)
{
    int rc;
    long i;
    struct timespec ts;

    /* read the clock 10 million times */
    for (i = 0; i < 10000000; i++) {
        rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    }
    return 0;
}
clock_gettime()
, to verify the value of the rc
variable, or to ensure the content of the ts
structure is to be trusted. The clock_gettime()
man page provides more information to help you write more reliable applications.
Important
clock_gettime()
function must be linked with the rt
library by adding '-lrt'
to the gcc
command line:
~]$ gcc clock_timing.c -o clock_timing -lrt
15.2.1. CLOCK_MONOTONIC_COARSE
and CLOCK_REALTIME_COARSE
clock_gettime()
and gettimeofday()
have a counterpart in the kernel, in the form of a system call. When a user process calls clock_gettime()
, the corresponding C library (glibc
) routine calls the sys_clock_gettime()
system call, which performs the requested operation and then returns the result to the user process.
CLOCK_MONOTONIC_COARSE
and CLOCK_REALTIME_COARSE
POSIX clocks was created in the form of a VDSO library function. The _COARSE
variants are faster to read and have a precision (also known as resolution) of one millisecond (ms).
15.2.2. Using clock_getres()
to Compare Clock Resolution
clock_getres()
function you can check the resolution of a given POSIX clock. clock_getres()
uses the same two parameters as clock_gettime()
: the ID of the POSIX clock to be used, and a pointer to the timespec structure where the result is returned. The following function enables you to compare the precision between CLOCK_MONOTONIC
and CLOCK_MONOTONIC_COARSE
:
#include <stdio.h>
#include <time.h>

int main(void)
{
    int rc;
    struct timespec res;

    rc = clock_getres(CLOCK_MONOTONIC, &res);
    if (!rc)
        printf("CLOCK_MONOTONIC: %ldns\n", res.tv_nsec);

    rc = clock_getres(CLOCK_MONOTONIC_COARSE, &res);
    if (!rc)
        printf("CLOCK_MONOTONIC_COARSE: %ldns\n", res.tv_nsec);
    return 0;
}
Example 15.3. Sample Output of clock_getres
TSC:
~]# ./clock_resolution
CLOCK_MONOTONIC: 1ns
CLOCK_MONOTONIC_COARSE: 999848ns (about 1ms)
HPET:
~]# ./clock_resolution
CLOCK_MONOTONIC: 1ns
CLOCK_MONOTONIC_COARSE: 999848ns (about 1ms)
ACPI_PM:
~]# ./clock_resolution
CLOCK_MONOTONIC: 1ns
CLOCK_MONOTONIC_COARSE: 999848ns (about 1ms)
15.2.3. Using C Code to Compare Clock Resolution
CLOCK_MONOTONIC
POSIX clock. All nine digits in the tv_nsec
field of the timespec structure are meaningful as the clock has a nanosecond resolution. The example function, named clock_test.c
, is as follows:
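#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    int i;
    struct timespec ts;

    for (i = 0; i < 5; i++) {
        clock_gettime(CLOCK_MONOTONIC, &ts);
        /* zero-pad the nanoseconds so all nine digits are printed */
        printf("%ld.%09ld\n", (long) ts.tv_sec, ts.tv_nsec);
        usleep(200);
    }
    return 0;
}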
Example 15.4. Sample Output of clock_test.c
and clock_test_coarse.c
~]#gcc clock_test.c -o clock_test -lrt
~]#./clock_test
218449.986980853
218449.987330908
218449.987590716
218449.987849549
218449.988108248
After saving a copy of the program as clock_test_coarse.c and replacing CLOCK_MONOTONIC with CLOCK_MONOTONIC_COARSE, the result would look something like:
~]# ./clock_test_coarse
218550.844862154
218550.844862154
218550.844862154
218550.845862154
218550.845862154
The _COARSE clocks have a one millisecond precision, so only the first three digits of the tv_nsec field of the timespec structure are significant. The result above could be read as:
~]# ./clock_test_coarse
218550.844
218550.844
218550.844
218550.845
218550.845
The _COARSE variants of the POSIX clocks are particularly useful in cases where timestamping can be performed with millisecond precision. The benefits are more evident on systems that use hardware clocks with high costs for reading operations, such as ACPI_PM.
15.2.4. Using the time Command to Compare Cost of Reading Clocks
Using the time command to read the clock source 10 million times in a row, you can compare the costs of reading the CLOCK_MONOTONIC and CLOCK_MONOTONIC_COARSE representations of the available hardware clocks. The following example uses the TSC, HPET, and ACPI_PM hardware clocks. For more information on how to decipher the output of the time command, see Section 15.1.1, “Reading Hardware Clock Sources”.
Example 15.5. Comparing the Cost of Reading POSIX Clocks
TSC:
~]# time ./clock_timing_monotonic
real 0m0.567s
user 0m0.559s
sys 0m0.002s
~]# time ./clock_timing_monotonic_coarse
real 0m0.120s
user 0m0.118s
sys 0m0.001s
HPET:
~]# time ./clock_timing_monotonic
real 0m12.257s
user 0m12.179s
sys 0m0.002s
~]# time ./clock_timing_monotonic_coarse
real 0m0.119s
user 0m0.118s
sys 0m0.000s
ACPI_PM:
~]# time ./clock_timing_monotonic
real 0m25.524s
user 0m0.451s
sys 0m24.932s
~]# time ./clock_timing_monotonic_coarse
real 0m0.119s
user 0m0.117s
sys 0m0.001s
The sys time (the time spent by the kernel to perform tasks required by the user process) is greatly reduced when the _COARSE clocks are used. This is particularly evident in the ACPI_PM clock timings, which indicate that the _COARSE variants of POSIX clocks yield high performance gains on clocks with high reading costs.
Chapter 16. More Information
16.1. Reporting Bugs
Before you file a bug report, follow these steps to diagnose where the problem has been introduced. This will greatly assist in rectifying the problem.
- Check that you have the latest version of the Red Hat Enterprise Linux 7 kernel, then boot into it from the GRUB menu. Try reproducing the problem with the standard kernel. If the problem still occurs, report a bug against Red Hat Enterprise Linux 7.
- If the problem does not occur when using the standard kernel, then the bug is probably the result of changes introduced by the enhancements specific to Red Hat Enterprise Linux for Real Time, which Red Hat has applied on top of the baseline (3.10.0) kernel.
If you have determined that the bug is specific to Red Hat Enterprise Linux for Real Time, follow these instructions to enter a bug report:
- Create a Bugzilla account if you do not have one yet.
- Click on Enter A New Bug Report. Log in if necessary.
- Select the Red Hat classification.
- Select the Red Hat Enterprise Linux 7 product.
- If it is a kernel issue, enter kernel-rt as the component. Otherwise, enter the name of the affected user-space component.
- Continue to enter the bug information by giving a detailed problem description. When entering the problem description, be sure to state whether you were able to reproduce the problem on the standard Red Hat Enterprise Linux 7 kernel.
Appendix A. Revision History
Revision History | |
---|---|---
Revision 1-8 | Tue Sep 29 2020 | Jaroslav Klech
Revision 1-7 | Tue Mar 31 2020 | Jaroslav Klech
Revision 1-6 | Tue Aug 6 2019 | Jaroslav Klech
Revision 1-5 | Thu Oct 18 2018 | Jaroslav Klech
Revision 1-4 | Tue Mar 20 2018 | Marie Doleželová
Revision 1-3 | Tue Jul 25 2017 | Jana Heves
Revision 1-2 | Mon Nov 3 2016 | Maxim Svistunov
Revision 1-1 | Fri Nov 06 2015 | Tomáš Čapek
Revision 1-0 | Thu Feb 12 2015 | Radek Bíba