Reference Guide
Reference guide for Red Hat Enterprise Linux for Real Time
Abstract
Preface
Part I. Hardware
Chapter 1. Processor Cores
1.1. Caches
1.2. Interconnects
Chapter 2. Memory Allocation
2.1. Demand Paging

Figure 2.1. Red Hat Enterprise Linux for Real Time Virtual Memory System
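System-wide page fault activity is reported by the pgfault value in the /proc/vmstat file.
Per-process page fault counts are available in the /proc directory. For a particular process PID, use the cat command to view the /proc/PID/stat file. The relevant entries in this file are: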
- Field 2 - filename of the executable
- Field 10 - number of minor page faults
- Field 12 - number of major page faults
Example 2.1. Using the /proc/PID/stat File to Check for Page Faults
This example uses the /proc/PID/stat file to check for page faults in a running process.
Use the cat command and a pipe to the cut command to return only the second, tenth, and twelfth fields of the /proc/PID/stat file:
~]# cat /proc/3366/stat | cut -d' ' -f2,10,12
(bash) 5389 0
In this output, the process with PID 3366 is bash, and it has reported 5389 minor page faults and no major page faults.
Note
- Linux System Programming by Robert Love
2.2. Using mlock to Avoid Page I/O
The mlock and mlockall system calls tell the system to lock a specified memory range, and not to allow that memory to be paged out. This means that once the physical page has been allocated to the page table entry, references to that page will always be fast.
There are two groups of mlock system calls available. The mlock and munlock calls lock and unlock a specific range of addresses. The mlockall and munlockall calls lock or unlock the entire program space.
Use mlock carefully. If the application is large, or if it has a large data domain, the mlock calls can cause thrashing if the system cannot allocate memory for other tasks.
Note
Use mlock with care. Using it excessively can lead to an out of memory (OOM) error. Do not put an mlockall call at the start of your application. It is recommended that only the data and text of the realtime portion of the application be locked.
mlock will not guarantee that the program will experience no page I/O. It is used to ensure that the data will stay in memory, but cannot ensure that it will stay in the same page. Other functions such as move_pages and memory compactors can move data around despite the use of mlock.
Important
Unprivileged users must have the CAP_IPC_LOCK capability in order to be able to use mlockall or mlock on large buffers. See the capabilities(7) man page for details.
Memory locks do not stack: if pages are locked several times by calls to mlock or mlockall, they will be unlocked by a single call to munlock for the corresponding page, or by munlockall. Thus, the application must be aware of which pages it is unlocking in order to prevent this double-lock/single-unlock problem.
- Tracking the memory areas allocated and locked, and creating a wrapper function that, before unlocking a page, verifies how many users (allocations) that page has. This is the resource counting principle used in device drivers.
- Performing allocations considering the page size and alignment, in order to prevent a double-lock in the same page.
The best use of mlock depends on the application's needs and system resources. Although there is no single solution for all applications, the following code example can be used as a starting point for the implementation of a function that allocates and locks memory buffers.
Example 2.2. Using mlock in an Application
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

void *
alloc_workbuf(size_t size)
{
    void *ptr;
    int retval;

    /*
     * alloc memory aligned to a page, to prevent two mlock() in the
     * same page.
     */
    retval = posix_memalign(&ptr, (size_t) sysconf(_SC_PAGESIZE), size);

    /* return NULL on failure */
    if (retval)
        return NULL;

    /* lock this buffer into RAM */
    if (mlock(ptr, size)) {
        free(ptr);
        return NULL;
    }
    return ptr;
}

void
free_workbuf(void *ptr, size_t size)
{
    /* unlock the address range */
    munlock(ptr, size);

    /* free the memory */
    free(ptr);
}
The function alloc_workbuf dynamically allocates a memory buffer and locks it. The memory allocation is performed by posix_memalign in order to align the memory area to a page. If the size variable is smaller than a page size, regular malloc allocations will be able to use the remainder of the page. To use this method safely, no mlock calls can be made on regular malloc allocations. This prevents the double-lock/single-unlock problem. The function free_workbuf unlocks and frees the memory area.
In addition to the use of mlock and mlockall, it is possible to allocate and lock a memory area using mmap with the MAP_LOCKED flag. The following example is the implementation of the aforementioned code using mmap.
Example 2.3. Using mmap in an Application
#include <sys/mman.h>
#include <stdlib.h>

void *
alloc_workbuf(size_t size)
{
    void *ptr;

    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
    if (ptr == MAP_FAILED)
        return NULL;
    return ptr;
}

void
free_workbuf(void *ptr, size_t size)
{
    munmap(ptr, size);
}
Because mmap allocates memory on a page basis, there are no two locks in the same page, which helps to prevent the double-lock/single-unlock problem. On the other hand, if the size variable is not a multiple of the page size, the rest of the page is wasted. Furthermore, a call to munlockall unlocks the memory locked by mmap.
Another approach is to call mlockall prior to entering a time-sensitive region of the code, followed by munlockall at the end of the time-sensitive region. This can reduce paging while in the critical section. Similarly, mlock can be used on a data region that is relatively static or that will grow slowly but needs to be accessed without page I/O.
Note
- capabilities(7)
- mlock(2)
- mlock(3)
- mlockall(2)
- mmap(2)
- move_pages(2)
- posix_memalign(3)
- posix_memalign(3p)
Chapter 3. Hardware Interrupts
Example 3.1. Viewing Interrupts on Your System
Use the cat command to view /proc/interrupts:
~]$ cat /proc/interrupts
CPU0 CPU1
0: 13072311 0 IO-APIC-edge timer
1: 18351 0 IO-APIC-edge i8042
8: 190 0 IO-APIC-edge rtc0
9: 118508 5415 IO-APIC-fasteoi acpi
12: 747529 86120 IO-APIC-edge i8042
14: 1163648 0 IO-APIC-edge ata_piix
15: 0 0 IO-APIC-edge ata_piix
16: 12681226 126932 IO-APIC-fasteoi ahci, uhci_hcd:usb2, radeon, yenta, eth0
17: 3717841 0 IO-APIC-fasteoi uhci_hcd:usb3, HDA, iwl3945
18: 0 0 IO-APIC-fasteoi uhci_hcd:usb4
19: 577 68 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb5
NMI: 0 0 Non-maskable interrupts
LOC: 3755270 9388684 Local timer interrupts
RES: 1184857 2497600 Rescheduling interrupts
CAL: 12471 2914 function call interrupts
TLB: 14555 15567 TLB shootdowns
TRM: 0 0 Thermal event interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0
3.1. Level-Signaled Interrupts
3.2. Message-Signaled Interrupts
pci=nomsi on the kernel command line.
3.3. Non-Maskable Interrupts
3.4. System Management Interrupts
Note
hwlatdetect utility, which is available in the rt-tests package. This utility is designed to measure periods of time during which the CPU has been stolen by an SMI handling routine.
3.5. Advanced Programmable Interrupt Controller
Part II. Application Architecture
Chapter 4. Threads and Processes
- Process
  A UNIX®-style process is an operating system construct that contains:
  - Address mappings for virtual memory
  - An execution context (PC, stack, registers)
  - State/Accounting information
  Linux processes started as exactly this style of process. When the concept of more than one process running inside one address space was developed, Linux turned to a process structure that shares an address space with another process. This works well, as long as the process data structure is kept small. For the remainder of this document, the term process refers to an independent address space, potentially containing multiple threads.
- Thread
  Strictly, a thread is a schedulable entity that contains:
  - A program counter (PC)
  - A register context
  - A stack pointer
  Multiple threads can exist within a process.
- Use the fork and exec functions to create new processes
- Use the POSIX Threads (pthreads) API to create new threads within an already running process
Note
- fork(2)
- exec(2)
- Programming with POSIX Threads, David R. Butenhof, Addison-Wesley, ISBN 0-201-63392-2
- Advanced Programming in the UNIX Environment, 2nd Ed., W. Richard Stevens and Stephen A. Rago, Addison-Wesley, ISBN 0-201-43307-9
- “POSIX Threads Programming”, Blaise Barney, Lawrence Livermore National Laboratory, http://www.llnl.gov/computing/tutorials/pthreads/
Chapter 5. Priorities and Policies
- SCHED_OTHER or SCHED_NORMAL: The default policy
- SCHED_BATCH: Similar to SCHED_OTHER, but with a throughput orientation
- SCHED_IDLE: A lower priority than SCHED_OTHER
- SCHED_FIFO: A first in/first out realtime policy
- SCHED_RR: A round-robin realtime policy
SCHED_OTHER, SCHED_FIFO, and SCHED_RR.
SCHED_OTHER or SCHED_NORMAL is the default scheduling policy for Linux threads. It has a dynamic priority that is changed by the system based on the characteristics of the thread. The priority of SCHED_OTHER threads is also affected by their nice value. The nice value is a number between -20 (highest priority) and 19 (lowest priority). By default, SCHED_OTHER threads have a nice value of 0. Adjusting the nice value will change the way the thread is handled.
Threads with a SCHED_FIFO policy will run ahead of SCHED_OTHER tasks. Instead of using nice values, SCHED_FIFO uses a fixed priority between 1 (lowest) and 99 (highest). A SCHED_FIFO thread with a priority of 1 will always be scheduled ahead of any SCHED_OTHER thread.
The SCHED_RR policy is very similar to the SCHED_FIFO policy. In the SCHED_RR policy, threads of equal priority are scheduled in a round-robin fashion. Generally, SCHED_FIFO is preferred over SCHED_RR.
SCHED_FIFO and SCHED_RR threads will run until one of the following events occurs:
- The thread goes to sleep or begins waiting for an event
- A higher-priority realtime thread becomes ready to run
Chapter 6. Affinity
- Reserve one CPU core for all system processes and allow the application to run on the remainder of the cores.
- Allow an application thread and a given kernel thread (such as the network softirq or a driver thread) to run on the same CPU.
- Pair producer and consumer threads on each CPU.
Tuna tool, or through the use of shell scripts to modify the bitmask value. The taskset command can be used to change the affinity of a process, while modifying the /proc filesystem entry changes the affinity of an interrupt.
6.1. Using the taskset Command to Set Processor Affinity
The taskset command sets and checks affinity information for a given process. These tasks can also be achieved using the Tuna tool.
To check the affinity of a process, use the -p or --pid option and the PID of the process to be checked. The -c or --cpu-list option displays the information as a numerical list of cores, instead of as a bitmask.
~]# taskset -p -c 1000
pid 1000's current affinity list: 0,1
~]# taskset -p -c 1 1000
pid 1000's current affinity list: 0,1
pid 1000's new affinity list: 1
~]# taskset -p -c 0,1 1000
pid 1000's current affinity list: 1
pid 1000's new affinity list: 0,1
The taskset command can also be used to start a new process with a particular affinity. This command will run the /bin/my-app application on CPU 4:
~]# taskset -c 4 /bin/my-app
The scheduling policy and priority can be set at the same time. This command will run the /bin/my-app application on CPU 5, with a SCHED_FIFO policy and a priority of 78:
~]# taskset -c 5 chrt -f 78 /bin/my-app
6.2. Using the sched_setaffinity() System Call to Set Processor Affinity
In addition to the taskset command, processor affinity can also be set using the sched_setaffinity() system call. The following example uses the companion sched_getaffinity() call to read the current affinity mask:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>

int main(int argc, char **argv)
{
    int i, online = 0;
    long ncores = sysconf(_SC_NPROCESSORS_CONF);
    cpu_set_t *setp = CPU_ALLOC(ncores);
    size_t setsz = CPU_ALLOC_SIZE(ncores);

    if (setp == NULL) {
        perror("CPU_ALLOC failed");
        exit(EXIT_FAILURE);
    }
    CPU_ZERO_S(setsz, setp);

    if (sched_getaffinity(0, setsz, setp) == -1) {
        perror("sched_getaffinity(2) failed");
        exit(EXIT_FAILURE);
    }

    /* check every configured core, as the allowed CPUs may be sparse */
    for (i = 0; i < ncores; i++) {
        if (CPU_ISSET_S(i, setsz, setp))
            online++;
    }

    printf("%ld cores configured, %d cpus allowed in affinity mask\n", ncores, online);
    CPU_FREE(setp);
    return 0;
}
Note
- sched_setaffinity(2)
Chapter 7. Thread Synchronization
7.1. Mutexes
The term mutex is derived from the term mutual exclusion. A mutex is a POSIX threads construct, and is created using the pthread_mutex_init library call. A mutex serializes access to a section of code, so that only one thread of an application is running the code at any one time.
Underneath the mutex is the futex, or Fast User muTEX, which is an internal mechanism used to implement mutexes. Futexes use shared conventions between the kernel and the C library. This allows an uncontended mutex to be locked or freed without a context switch to kernel space.
7.2. Barriers
Barriers operate in a very different way to other thread synchronization methods. Instead of serializing access to code regions, barriers block all threads until a pre-determined number of them have accumulated. The barrier will then allow all threads to continue. Barriers are used in situations where a running application needs to be certain that all threads have completed their tasks before execution can continue.
7.3. Condvars
A condvar, or condition variable, is a POSIX thread construct that waits for a particular condition to be achieved before proceeding. In general, the condition being signaled pertains to the state of data that the thread shares with another thread. For example, a condvar can be used to signal that a data entry has been put into a processing queue and a thread waiting to process data from the queue can now proceed.
7.4. Other Types of Synchronization
Chapter 8. Sockets
8.1. Socket Options
TCP_NODELAY and TCP_CORK.
TCP_NODELAY
TCP is the most common transport protocol, which means it is often used to solve many different needs. As new application and hardware features are developed, and kernel architecture optimizations are made, TCP has had to introduce new heuristics to handle the changes effectively.
TCP_NODELAY is a socket option that can be used to turn off the delaying and coalescing of small buffers (Nagle's algorithm). It can be enabled through the setsockopt sockets API, with the following function:
int one = 1; setsockopt(descriptor, SOL_TCP, TCP_NODELAY, &one, sizeof(one));
TCP_NODELAY can also interact with other optimization heuristics to result in poor overall performance.
TCP_NODELAY enabled.
writev on a socket with TCP_NODELAY enabled.
TCP_CORK
Another TCP socket option that works in a similar way is TCP_CORK. When enabled, TCP will delay all packets until the application removes the cork, and allows the stored packets to be sent. This allows applications to build a packet in kernel space, which is useful when different libraries are being used to provide layer abstractions.
The TCP_CORK option can be enabled by using the following function:
int one = 1; setsockopt(descriptor, SOL_TCP, TCP_CORK, &one, sizeof(one));
Enabling TCP_CORK is often referred to as corking the socket. The cork is removed by disabling the option:
int zero = 0; setsockopt(descriptor, SOL_TCP, TCP_CORK, &zero, sizeof(zero));
Once the socket is uncorked, TCP will send the accumulated logical packet immediately, without waiting for further packets from the application.
Example 8.1. Using TCP_NODELAY and TCP_CORK
This example demonstrates the performance impact that TCP_NODELAY and TCP_CORK can have on an application.
~]$ ./tcp_nodelay_server 5001 10000
Run the client with the no_delay option to enable the TCP_NODELAY socket option, or with the cork option to enable TCP_CORK. In all cases it will send 15 packets, each of two bytes, and wait for a response from the server.
Neither TCP_NODELAY nor TCP_CORK is in use. This is a baseline measurement. TCP coalesces writes and has to wait to check if the application has more data than can optimally fit in the network packet:
~]$ ./tcp_nodelay_client localhost 5001 10000
10000 packets of 30 bytes sent in 400129.781250 ms: 0.749757 bytes/ms
TCP_NODELAY only. TCP is instructed not to coalesce small packets, but to send buffers immediately. This improves performance significantly, but creates a large number of network packets for each logical packet:
~]$ ./tcp_nodelay_client localhost 5001 10000 no_delay
10000 packets of 30 bytes sent in 1649.771240 ms: 181.843399 bytes/ms using TCP_NODELAY
TCP_CORK only. It halves the time required to send the same number of logical packets. This is because TCP coalesces full logical packets in its buffers, and sends fewer overall network packets:
~]$ ./tcp_nodelay_client localhost 5001 10000 cork
10000 packets of 30 bytes sent in 850.796448 ms: 352.610779 bytes/ms using TCP_CORK
TCP_CORK is the best technique to use. It allows the application to precisely convey the information that a packet is finished and must be sent without delay. When developing programs, if they need to send bulk data from a file, consider using TCP_CORK with sendfile.
Note
- sendfile(2)
- “TCP nagle sample applications”, which are example applications of both socket options, written in C.
Part III. Library Services
Chapter 11. Setting the Scheduler
11.1. Using chrt to Set the Scheduler
chrt is used to check and adjust scheduler policies and priorities. It can start new processes with the desired properties, or change the properties of a running process.
To check the attributes of a running process, use the --pid or -p option alone to specify the process ID (PID):
~]# chrt -p 468
pid 468's current scheduling policy: SCHED_FIFO
pid 468's current scheduling priority: 85
~]# chrt -p 476
pid 476's current scheduling policy: SCHED_OTHER
pid 476's current scheduling priority: 0
Table 11.1. Policy Options for the chrt Command
| Short option | Long option | Description |
|---|---|---|
| -f | --fifo | Set schedule to SCHED_FIFO |
| -o | --other | Set schedule to SCHED_OTHER |
| -r | --rr | Set schedule to SCHED_RR |
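The following command sets the process with PID 1000 to SCHED_FIFO, with a priority of 50:
~]# chrt -f -p 50 1000
The following command sets the same process to SCHED_OTHER, with a priority of 0:
~]# chrt -o -p 0 1000
The following command starts the /bin/my-app application with a policy of SCHED_FIFO and a priority of 36:
~]# chrt -f 36 /bin/my-app
Note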
- chrt(1)
11.2. Preemption
/proc/PID/status, where PID is the PID of the process. The following command checks the preemption of the process with PID 1000:
~]# grep voluntary /proc/1000/status
voluntary_ctxt_switches: 194529
nonvoluntary_ctxt_switches: 195338
11.3. Using Library Calls to Set Priority
- nice
- getpriority
- setpriority
Important
sched.h header file. Ensure you always check the return codes from functions. The appropriate man pages outline the various codes used.
11.3.1. sched_getscheduler
The sched_getscheduler() function retrieves the scheduler policy for a given PID:
#include <sched.h>
int policy;
policy = sched_getscheduler(pid_t pid);
The symbols SCHED_OTHER, SCHED_RR and SCHED_FIFO are also defined in sched.h. They can be used to check the defined policy or to set the policy:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>

int main(int argc, char *argv[])
{
    pid_t pid;
    int policy;

    /* a PID of 0 refers to the calling process */
    if (argc < 2)
        pid = 0;
    else
        pid = atoi(argv[1]);

    printf("Scheduler Policy for PID: %d -> ", pid);
    policy = sched_getscheduler(pid);
    switch (policy) {
    case SCHED_OTHER: printf("SCHED_OTHER\n"); break;
    case SCHED_RR:    printf("SCHED_RR\n"); break;
    case SCHED_FIFO:  printf("SCHED_FIFO\n"); break;
    default:          printf("Unknown...\n");
    }
    return 0;
}
11.3.2. sched_setscheduler
The scheduler policy and other parameters can be set using the sched_setscheduler() function. Currently, realtime policies have one parameter, sched_priority. This parameter is used to adjust the priority of the process.
The sched_setscheduler function requires three parameters, in the form: sched_setscheduler(pid_t pid, int policy, const struct sched_param *sp);
Note
The sched_setscheduler(2) man page lists all possible return values of sched_setscheduler, including the error codes.
If the pid is zero, the sched_setscheduler() function will act on the calling process.
The following code excerpt sets the scheduler policy of the current process to SCHED_FIFO and the priority to 50:
struct sched_param sp = { .sched_priority = 50 };
int ret;

ret = sched_setscheduler(0, SCHED_FIFO, &sp);
if (ret == -1) {
    perror("sched_setscheduler");
    return 1;
}
11.3.3. sched_getparam and sched_setparam
The sched_setparam() function is used to set the scheduling parameters of a particular process. This can then be verified using the sched_getparam() function.
Unlike the sched_getscheduler() function, which only returns the scheduling policy, the sched_getparam() function returns all scheduling parameters for the given process.
The following code excerpt reads the priority of the calling process and increments it by two:
struct sched_param sp;
int ret;

/* reads priority and increments it by 2 */
ret = sched_getparam(0, &sp);
sp.sched_priority += 2;

/* sets the new priority */
ret = sched_setparam(0, &sp);
11.3.4. sched_get_priority_min and sched_get_priority_max
The sched_get_priority_min and sched_get_priority_max functions are used to check the valid priority range for a given scheduler policy.
If the specified scheduler policy is not valid, the functions return -1 and errno will be set to EINVAL:
#include <stdio.h>
#include <unistd.h>
#include <sched.h>

int main(void)
{
    printf("Valid priority range for SCHED_OTHER: %d - %d\n",
           sched_get_priority_min(SCHED_OTHER),
           sched_get_priority_max(SCHED_OTHER));
    printf("Valid priority range for SCHED_FIFO: %d - %d\n",
           sched_get_priority_min(SCHED_FIFO),
           sched_get_priority_max(SCHED_FIFO));
    printf("Valid priority range for SCHED_RR: %d - %d\n",
           sched_get_priority_min(SCHED_RR),
           sched_get_priority_max(SCHED_RR));
    return 0;
}
Note
SCHED_FIFO and SCHED_RR can be any number within the range of 1 to 99. POSIX is not guaranteed to honor this range, however, and portable programs should use these calls.
11.3.5. sched_rr_get_interval
The SCHED_RR policy differs slightly from the SCHED_FIFO policy. SCHED_RR allocates concurrent processes that have the same priority in a round-robin rotation. In this way, each process is assigned a timeslice. The sched_rr_get_interval() function will report the timeslice that has been allocated to each process.
Although POSIX requires this function to work only with SCHED_RR processes, the sched_rr_get_interval() function is able to retrieve the timeslice length of any process on Linux.
The timeslice is returned as a timespec, that is, a number of seconds and nanoseconds. Here the structure holds a duration, not a time since the base time of 00:00:00 GMT, 1 January 1970:
struct timespec {
    time_t tv_sec;  /* seconds */
    long tv_nsec;   /* nanoseconds */
};
The sched_rr_get_interval function requires the PID of the process, and a struct timespec:
#include <stdio.h>
#include <sched.h>
#include <time.h>

int main(void)
{
    struct timespec ts;
    int ret;

    /* real apps must check return values */
    ret = sched_rr_get_interval(0, &ts);
    if (ret == -1) {
        perror("sched_rr_get_interval");
        return 1;
    }

    /* zero-pad the nanoseconds so the value reads as decimal seconds */
    printf("Timeslice: %ld.%09ld\n", (long) ts.tv_sec, ts.tv_nsec);
    return 0;
}
The following commands run this program, built as sched_03, with varying policies and priorities. Processes with a SCHED_FIFO policy will return a timeslice of 0 seconds and 0 nanoseconds, indicating that it is infinite:
~]$ chrt -o 0 ./sched_03
Timeslice: 0.038994072
~]$ chrt -r 10 ./sched_03
Timeslice: 0.099984800
~]$ chrt -f 10 ./sched_03
Timeslice: 0.000000000
Note
- nice(2)
- getpriority(2)
- setpriority(2)
Chapter 12. Creating Threads and Processes
Chapter 13. Mmap
The mmap system call allows a file (or parts of a file) to be mapped to memory. This allows the file content to be changed with a memory operation, avoiding system calls and input/output operations.
Note
- mmap(2)
- Linux System Programming by Robert Love
Chapter 14. System Calls
14.1. sched_yield
The sched_yield function was originally designed to cause a processor to select a process other than the running one. This type of request is prone to failure when issued from within a poorly-written application.
When the sched_yield() function is used within processes with realtime priorities, it can display unexpected behavior. The process that has called sched_yield gets moved to the tail of the queue of processes running at that priority. When this occurs in a situation where there are no other processes running at the same priority, the process that called sched_yield continues running. If the priority of that process is high, it can potentially create a busy loop, rendering the machine unusable.
In general, do not use sched_yield on realtime processes.
14.2. getrusage()
The getrusage() function is used to retrieve important information from a given process or its threads. It does not provide all the information available, but it reports information that would otherwise need to be cataloged from several different files in the /proc/ directory and would be hard to synchronize with specific actions or events in the application. This includes the number of voluntary and involuntary context switches, major and minor page faults, the amount of memory in use, and a few other values.
Note
Not all the fields contained in the structure filled with getrusage() results are set by the kernel. Some of them are kept for compatibility reasons only.
Chapter 15. Timestamping
15.1. Hardware Clocks
To list the available clock sources, view the /sys/devices/system/clocksource/clocksource0/available_clocksource file:
~]# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
To check the clock source currently in use, view the /sys/devices/system/clocksource/clocksource0/current_clocksource file:
~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
It is possible to select a different clock source from the list in the /sys/devices/system/clocksource/clocksource0/available_clocksource file. To do so, write the name of the clock source into the /sys/devices/system/clocksource/clocksource0/current_clocksource file. For example, the following command sets HPET as the clock source in use:
~]# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
Important
The idle=poll parameter forces the clock to avoid entering the idle state, and the processor.max_cstate=1 parameter prevents the clock from entering deeper C-states. Note, however, that in both cases there would be an increase in energy consumption, as the system would always run at top speed.
15.1.1. Reading Hardware Clock Sources
Example 15.1. Comparing the Cost of Reading Hardware Clock Sources
cat command. The time command is used to view the duration required to read the clock source 10 million times:
~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
~]# time ./clock_timing
real 0m0.601s
user 0m0.592s
sys 0m0.002s
~]# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
~]# time ./clock_timing
real 0m12.263s
user 0m12.197s
sys 0m0.001s
~]# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
~]# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
~]# time ./clock_timing
real 0m24.461s
user 0m0.504s
sys 0m23.776s
The time(1) man page provides detailed information on how to use the command and interpret its output. The example above uses the following categories:
- real: The total time spent beginning from program invocation until the process ends. real includes user and sys times, and will usually be larger than the sum of the latter two. If this process is interrupted by an application with higher priority, or by a system event such as a hardware interrupt (IRQ), this time spent waiting is also computed under real.
- user: The time the process spent in user space, performing tasks that did not require kernel intervention.
- sys: The time spent by the kernel while performing tasks required by the user process. These tasks include opening files, reading and writing to files or I/O ports, memory allocation, thread creation and network related activities.
15.2. POSIX Clocks
- CLOCK_REALTIME: represents the time in the real world, also referred to as 'wall time', meaning the time as read from the clock on the wall. This clock is used to timestamp events, and when interfacing with the user. It can be modified by a user with the right privileges. However, user modification should be used with caution as it can lead to erroneous data if the clock has its value changed between two readings.
- CLOCK_MONOTONIC: represents the time monotonically increased since the system boot. This clock cannot be set by any process, and is the preferred clock for calculating the time difference between events. The following examples in this section use CLOCK_MONOTONIC as the POSIX clock.
Note
- clock_gettime()
- Linux System Programming by Robert Love
The function used to read a given POSIX clock is clock_gettime(), which is declared in <time.h>. The clock_gettime() function takes two parameters: the POSIX clock ID and a timespec structure, which is filled with the current value of that clock. The following example shows a program that measures the cost of reading the clock:
Example 15.2. Using clock_gettime() to Measure the Cost of Reading POSIX Clocks
#include <time.h>

int main(void)
{
    int rc;
    long i;
    struct timespec ts;

    /* read the clock 10 million times */
    for (i = 0; i < 10000000; i++) {
        rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    }
    return 0;
}
The example above can be improved by adding checks to verify the return code of clock_gettime(), stored in the rc variable, before the content of the ts structure is trusted. The clock_gettime() manpage provides more information to help you write more reliable applications.
Important
Programs using the clock_gettime() function must be linked with the rt library by adding '-lrt' to the gcc command line:
~]$ gcc clock_timing.c -o clock_timing -lrt
15.2.1. CLOCK_MONOTONIC_COARSE and CLOCK_REALTIME_COARSE
Functions such as clock_gettime() and gettimeofday() have a counterpart in the kernel, in the form of a system call. When a user process calls clock_gettime(), the corresponding C library (glibc) routine calls the sys_clock_gettime() system call, which performs the requested operation and then returns the result to the user process.
To avoid the cost of the system call, support for the CLOCK_MONOTONIC_COARSE and CLOCK_REALTIME_COARSE POSIX clocks was created in the form of a VDSO library function. The _COARSE variants are faster to read and have a precision (also known as resolution) of one millisecond (ms).
15.2.2. Using clock_getres() to Compare Clock Resolution
Using the clock_getres() function you can check the resolution of a given POSIX clock. clock_getres() uses the same two parameters as clock_gettime(): the ID of the POSIX clock to be used, and a pointer to the timespec structure where the result is returned. The following program enables you to compare the precision between CLOCK_MONOTONIC and CLOCK_MONOTONIC_COARSE:
#include <stdio.h>
#include <time.h>

int main(void)
{
    int rc;
    struct timespec res;

    rc = clock_getres(CLOCK_MONOTONIC, &res);
    if (!rc)
        printf("CLOCK_MONOTONIC: %ldns\n", res.tv_nsec);
    rc = clock_getres(CLOCK_MONOTONIC_COARSE, &res);
    if (!rc)
        printf("CLOCK_MONOTONIC_COARSE: %ldns\n", res.tv_nsec);
    return 0;
}
Example 15.3. Sample Output of clock_getres
TSC:
~]# ./clock_resolution
CLOCK_MONOTONIC: 1ns
CLOCK_MONOTONIC_COARSE: 999848ns (about 1ms)
HPET:
~]# ./clock_resolution
CLOCK_MONOTONIC: 1ns
CLOCK_MONOTONIC_COARSE: 999848ns (about 1ms)
ACPI_PM:
~]# ./clock_resolution
CLOCK_MONOTONIC: 1ns
CLOCK_MONOTONIC_COARSE: 999848ns (about 1ms)
15.2.3. Using C Code to Compare Clock Resolution
The following example prints timestamps read from the CLOCK_MONOTONIC POSIX clock. All nine digits in the tv_nsec field of the timespec structure are meaningful as the clock has a nanosecond resolution. The example program, named clock_test.c, is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    int i;
    struct timespec ts;

    for (i = 0; i < 5; i++) {
        clock_gettime(CLOCK_MONOTONIC, &ts);
        /* zero-pad the nanoseconds to nine digits */
        printf("%ld.%09ld\n", (long) ts.tv_sec, ts.tv_nsec);
        usleep(200);
    }
    return 0;
}
Example 15.4. Sample Output of clock_test.c and clock_test_coarse.c
~]# gcc clock_test.c -o clock_test -lrt
~]# ./clock_test
218449.986980853
218449.987330908
218449.987590716
218449.987849549
218449.988108248
After saving the code above as clock_test_coarse.c and replacing CLOCK_MONOTONIC with CLOCK_MONOTONIC_COARSE, the result would look something like:
~]# ./clock_test_coarse
218550.844862154
218550.844862154
218550.844862154
218550.845862154
218550.845862154
The _COARSE clocks have a one millisecond precision, therefore only the first three digits of the tv_nsec field of the timespec structure are significant. The result above could be read as:
~]# ./clock_test_coarse
218550.844
218550.844
218550.844
218550.845
218550.845
The _COARSE variants of the POSIX clocks are particularly useful in cases where timestamping can be performed with millisecond precision. The benefits are more evident on systems which use hardware clocks with high costs for the reading operations, such as ACPI_PM.
15.2.4. Using the time Command to Compare Cost of Reading Clocks
Using the time command to read the clock source 10 million times in a row, you can compare the costs of reading CLOCK_MONOTONIC and CLOCK_MONOTONIC_COARSE representations of the hardware clocks available. The following example uses TSC, HPET and ACPI_PM hardware clocks. For more information on how to decipher the output of the time command see Section 15.1.1, “Reading Hardware Clock Sources”.
Example 15.5. Comparing the Cost of Reading POSIX Clocks
TSC:
~]# time ./clock_timing_monotonic
real 0m0.567s
user 0m0.559s
sys 0m0.002s
~]# time ./clock_timing_monotonic_coarse
real 0m0.120s
user 0m0.118s
sys 0m0.001s
HPET:
~]# time ./clock_timing_monotonic
real 0m12.257s
user 0m12.179s
sys 0m0.002s
~]# time ./clock_timing_monotonic_coarse
real 0m0.119s
user 0m0.118s
sys 0m0.000s
ACPI_PM:
~]# time ./clock_timing_monotonic
real 0m25.524s
user 0m0.451s
sys 0m24.932s
~]# time ./clock_timing_monotonic_coarse
real 0m0.119s
user 0m0.117s
sys 0m0.001s
The sys time (the time spent by the kernel to perform tasks required by the user process) is greatly reduced when the _COARSE clocks are used. This is particularly evident in the ACPI_PM clock timings, which indicates that _COARSE variants of POSIX clocks yield high performance gains on clocks with high reading costs.
Chapter 16. More Information
16.1. Reporting Bugs
Before you file a bug report, follow these steps to diagnose where the problem has been introduced. This will greatly assist in rectifying the problem.
- Check that you have the latest version of the Red Hat Enterprise Linux 7 kernel, then boot into it from the GRUB menu. Try reproducing the problem with the standard kernel. If the problem still occurs, report a bug against Red Hat Enterprise Linux 7.
- If the problem does not occur when using the standard kernel, then the bug is probably the result of changes introduced in the Red Hat Enterprise Linux for Real Time specific enhancements Red Hat has applied on top of the baseline (3.10.0) kernel.
If you have determined that the bug is specific to Red Hat Enterprise Linux for Real Time follow these instructions to enter a bug report:
- Create a Bugzilla account if you do not have it yet.
- Click on Enter A New Bug Report. Log in if necessary.
- Select the Red Hat classification.
- Select the Red Hat Enterprise Linux 7 product.
- If it is a kernel issue, enter kernel-rt as the component. Otherwise, enter the name of the affected user-space component.
- Continue to enter the bug information by giving a detailed problem description. When entering the problem description be sure to include details of whether you were able to reproduce the problem on the standard Red Hat Enterprise Linux 7 kernel.
Appendix A. Revision History
| Revision | Date |
|---|---|
| Revision 1-4 | Thu Oct 18 2018 |
| Revision 1-3 | Tue Jul 25 2017 |
| Revision 1-2 | Mon Nov 3 2016 |
| Revision 1-1 | Fri Nov 06 2015 |
| Revision 1-0 | Thu Feb 12 2015 |
