Chapter 1. Before You Start Tuning Your Red Hat Enterprise Linux for Real Time System
Red Hat Enterprise Linux for Real Time is designed to be used on well-tuned systems for applications with extremely high determinism requirements. Kernel system tuning offers the vast majority of the improvement in determinism. For example, in many workloads thorough system tuning improves consistency of results by around 90%. This is why we typically recommend that customers first perform the tuning of standard Red Hat Enterprise Linux described in Chapter 2, General System Tuning, before using Red Hat Enterprise Linux for Real Time.
Things to Remember While You Are Tuning Your Red Hat Enterprise Linux for Real Time Kernel
- Be Patient: Real-time tuning is an iterative process; you will almost never be able to tweak a few variables and know that the change is the best that can be achieved. Be prepared to spend days or weeks narrowing down the set of tunings that work best for your system. Additionally, always make long test runs. Changing some tuning parameters and then doing a five-minute test run is not a good validation of a set of tunes. Make the length of your test runs adjustable and run them for longer than a few minutes. Try to narrow down to a few different tuning sets with test runs of a few hours, then run those sets for many hours or days at a time, to try to catch corner cases of maximum latencies or resource exhaustion.
- Be Accurate: Build a measurement mechanism into your application, so that you can accurately gauge how a particular set of tuning changes affects the application's performance. Anecdotal evidence (for example, "The mouse moves more smoothly") is usually wrong and varies from person to person. Do hard measurements and record them for later analysis.
- Be Methodical: It is very tempting to make multiple changes to tuning variables between test runs, but doing so means that you do not have a way to narrow down which tune affected your test results. Keep the tuning changes between test runs as small as you can.
- Be Conservative: It is also tempting to make large changes when tuning, but it is almost always better to make incremental changes. You will find that working your way up from the lowest to highest priority values will yield better results in the long run.
- Be Smart: Use the tools you have available. The Tuna graphical tuning tool makes it easy to change processor affinities for threads and interrupts, change thread priorities, and isolate processors for application use. The taskset and chrt command line utilities allow you to do most of what Tuna does. If you run into performance problems, the perf tools can help locate latency issues.
- Be Flexible: Rather than hard-coding values into your application, use external tools to change policy, priority and affinity. This allows you to try many different combinations and simplifies your logic. Once you have found some settings that give good results, you can either add them to your application, or set up some startup logic to implement the settings when the application starts.
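As an illustration of the startup logic mentioned in the last item, the following is a minimal Python sketch that reads policy, priority and affinity from the environment instead of hard-coding them. The variable names APP_RT_POLICY, APP_RT_PRIORITY and APP_RT_CPUS are hypothetical and chosen only for this example; the os.sched_* calls are the standard Python wrappers available on Linux.

```python
import os

# Map of user-facing names to the Linux scheduling policy constants.
POLICIES = {"other": os.SCHED_OTHER, "fifo": os.SCHED_FIFO, "rr": os.SCHED_RR}


def parse_cpu_list(text):
    """Parse a CPU list such as "2-3,5" into a set of CPU numbers."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def load_settings(env):
    """Read tuning settings from a mapping (APP_RT_* names are hypothetical)."""
    policy = POLICIES[env.get("APP_RT_POLICY", "other").lower()]
    priority = int(env.get("APP_RT_PRIORITY", "0"))
    cpus = parse_cpu_list(env.get("APP_RT_CPUS", ""))
    return policy, priority, cpus


def apply_settings(policy, priority, cpus):
    """Apply the settings to the calling process (pid 0 means this process)."""
    if cpus:
        os.sched_setaffinity(0, cpus)
    # Switching to a real-time policy usually requires root or CAP_SYS_NICE.
    os.sched_setscheduler(0, policy, os.sched_param(priority))
```

An application could call apply_settings(*load_settings(os.environ)) once at startup, which keeps the values outside the code so that different combinations can be tried between test runs.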
Linux uses three main scheduling policies:
SCHED_OTHER (sometimes called SCHED_NORMAL)
- This is the default thread policy and has dynamic priority controlled by the kernel. The priority is changed based on thread activity. Threads with this policy are considered to have a real-time priority of 0 (zero).
SCHED_FIFO (First in, first out)
- A real-time policy with a priority range of 1 to 99, with 1 being the lowest and 99 the highest. SCHED_FIFO threads always have a higher priority than SCHED_OTHER threads (for example, a SCHED_FIFO thread with a priority of 1 will have a higher priority than any SCHED_OTHER thread). Any thread created as a SCHED_FIFO thread has a fixed priority and will run until it is blocked or preempted by a higher priority thread.
SCHED_RR (Round-Robin)
- SCHED_RR is a modification of SCHED_FIFO. Threads with the same priority have a quantum and are round-robin scheduled among all equal priority SCHED_RR threads. This policy is rarely used.
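The priority ranges described above can be checked on a running system. The following Python sketch queries the kernel for the minimum and maximum priority of each policy; the helper name priority_ranges is an assumption made for this example.

```python
import os


def priority_ranges():
    """Return {policy name: (min, max)} as reported by the running kernel."""
    policies = {
        "SCHED_OTHER": os.SCHED_OTHER,
        "SCHED_FIFO": os.SCHED_FIFO,
        "SCHED_RR": os.SCHED_RR,
    }
    return {
        name: (os.sched_get_priority_min(p), os.sched_get_priority_max(p))
        for name, p in policies.items()
    }


# Print one line per policy, for example to compare against the text above.
for name, (low, high) in priority_ranges().items():
    print(f"{name}: {low}..{high}")
```

On Linux the reported values should agree with the description above: a single priority of 0 for SCHED_OTHER and a range of 1 to 99 for the two real-time policies.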
1.1. Running Latency Tests and Interpreting Their Results
To verify that the potential hardware platform is suitable for real-time operations, you should run some latency and performance tests with the Real Time kernel. These tests can highlight BIOS or system tuning (including partitioning) issues that might be experienced under a load.
1.1.1. Preliminary Steps
Procedure 1.1. To successfully test your system and interpret the results:
- Check the vendor documentation for any tuning steps required for low latency operation. This step aims to reduce or remove any System Management Interrupts (SMIs) that would transition the system into System Management Mode (SMM). While a system is in SMM it is running firmware and not running operating system code, meaning any timers that expire while in SMM will have to wait until the system transitions back into normal operation. This can cause unexplained latencies since SMIs cannot be blocked by Linux and the only indication that we actually took an SMI may be found in vendor-specific performance counter registers.
Warning: Red Hat strongly recommends that you do not completely disable SMIs, as it can result in catastrophic hardware failure.
- Ensure that RHEL-RT and the rt-tests package are installed. This step verifies that you have tuned the system properly.
- Run the hwlatdetect program. hwlatdetect looks for hardware-firmware induced latencies by polling the clock-source and looking for unexplained gaps. Generally, you do not need to run any sort of load on the system while running hwlatdetect, since the program is looking for latencies introduced by hardware architecture or BIOS/EFI firmware. A typical output of hwlatdetect looks like this:
    # hwlatdetect --duration=60s
    hwlatdetect:  test duration 60 seconds
       detector: tracer
       parameters:
            Latency threshold: 10us
            Sample window:     1000000us
            Sample width:      500000us
            Non-sampling period:  500000us
            Output File:       None

    Starting test
    test finished
    Max Latency: Below threshold
    Samples recorded: 0
    Samples exceeding threshold: 0

The above result represents a system that was tuned to minimize system interruptions from firmware.
However, not all systems can be tuned to minimize system interruptions as shown below:
    # hwlatdetect --duration=10s
    hwlatdetect:  test duration 10 seconds
       detector: tracer
       parameters:
            Latency threshold: 10us
            Sample window:     1000000us
            Sample width:      500000us
            Non-sampling period:  500000us
            Output File:       None

    Starting test
    test finished
    Max Latency: 18us
    Samples recorded: 10
    Samples exceeding threshold: 10
    SMIs during run: 0
    ts: 1519674281.220664736, inner:17, outer:15
    ts: 1519674282.721666674, inner:18, outer:17
    ts: 1519674283.722667966, inner:16, outer:17
    ts: 1519674284.723669259, inner:17, outer:18
    ts: 1519674285.724670551, inner:16, outer:17
    ts: 1519674286.725671843, inner:17, outer:17
    ts: 1519674287.726673136, inner:17, outer:16
    ts: 1519674288.727674428, inner:16, outer:18
    ts: 1519674289.728675721, inner:17, outer:17
    ts: 1519674290.729677013, inner:18, outer:17

The above result shows that while doing consecutive reads of the system clocksource, there were 10 delays that showed up in the 15-18 us range.
hwlatdetect was using the tracer mechanism as the detector for unexplained latencies. Previous versions used a kernel module rather than the ftrace tracer.
The parameters report the latency threshold and how the detection was run. The default latency threshold was 10 microseconds (10 us), the sample window was 1 second, and the sample width was 0.5 seconds.
As a result, the tracer ran a detector thread that ran for one half of each second of the specified duration.
The detector thread runs a loop which does the following pseudocode:
    t1 = timestamp()
    loop:
        t0 = timestamp()
        if (t0 - t1) > threshold
            outer = (t0 - t1)
        t1 = timestamp()
        if (t1 - t0) > threshold
            inner = (t1 - t0)
        if inner or outer:
            print
        if t1 > duration:
            goto out
        goto loop
    out:

The inner loop comparison checks that t1 - t0 does not exceed the specified threshold (10 us default). The outer loop comparison checks the time between the bottom of the loop and the top, t0 - t1. The time between consecutive reads of the timestamp register should be dozens of nanoseconds (essentially a register read, a comparison and a conditional jump) so any other delay between consecutive reads is introduced by firmware or by the way the system components were connected.
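The structure of the detector loop can also be expressed as runnable code. The following Python sketch mirrors the pseudocode above, with the clock passed in as a parameter so that the loop can be exercised with synthetic timestamps. It is only an illustration of the inner and outer comparisons: the function name detect_gaps and its parameters are assumptions, and an interpreted user-space loop is far too slow and noisy to stand in for the in-kernel detector.

```python
import time


def detect_gaps(duration_ns, threshold_ns=10_000, clock=time.monotonic_ns):
    """Return (t0, inner, outer) tuples for gaps larger than threshold_ns."""
    samples = []
    start = clock()
    t1 = start
    while True:
        inner = outer = 0
        t0 = clock()
        # Outer gap: bottom of the previous iteration to the top of this one.
        if t0 - t1 > threshold_ns:
            outer = t0 - t1
        t1 = clock()
        # Inner gap: two back-to-back reads within the same iteration.
        if t1 - t0 > threshold_ns:
            inner = t1 - t0
        if inner or outer:
            samples.append((t0, inner, outer))
        if t1 - start > duration_ns:
            break
    return samples


# Synthetic clock: two large jumps are hidden among otherwise small steps.
ticks = iter([0, 1, 2, 3, 50, 51, 52, 100, 200])
print(detect_gaps(150, threshold_ns=10, clock=lambda: next(ticks)))
```

With the synthetic clock, the jump from 3 to 50 is reported as an inner gap, and the jump from 52 to 100 as an outer gap.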
Note: The values printed out by hwlatdetect for inner and outer are the best case maximum latency. The latency values are the deltas between consecutive reads of the current system clocksource (usually the Time Stamp Counter or TSC register, but potentially the ACPI power management clock) and any delays between consecutive reads, introduced by the hardware-firmware combination.
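When comparing several hardware or firmware configurations, it can help to reduce the sample lines of each hwlatdetect run to a few numbers. The following Python sketch parses lines in the "ts: ..., inner:N, outer:N" form shown in the output earlier and returns the sample count and the largest inner and outer values. The function name and the regular expression are assumptions based only on that sample output.

```python
import re

# Matches lines such as: ts: 1519674281.220664736, inner:17, outer:15
SAMPLE_RE = re.compile(r"ts:\s*(\d+\.\d+),\s*inner:(\d+),\s*outer:(\d+)")


def summarize_samples(output):
    """Return (count, max_inner_us, max_outer_us) for the sample lines."""
    samples = [(int(i), int(o)) for _, i, o in SAMPLE_RE.findall(output)]
    if not samples:
        return 0, 0, 0
    max_inner = max(inner for inner, _ in samples)
    max_outer = max(outer for _, outer in samples)
    return len(samples), max_inner, max_outer


report = """\
ts: 1519674281.220664736, inner:17, outer:15
ts: 1519674282.721666674, inner:18, outer:17
ts: 1519674283.722667966, inner:16, outer:17
ts: 1519674284.723669259, inner:17, outer:18
"""
print(summarize_samples(report))
```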
After finding the suitable hardware-firmware combination, the next step is to test the real-time performance of the system while under a load.
1.1.2. Testing the System Real-time Performance under Load
RHEL-RT provides the rteval utility to test the system real-time performance under load. rteval starts a heavy system load of SCHED_OTHER tasks and then measures real-time response on each online CPU. The loads are a parallel make of the Linux kernel tree in a loop and the hackbench synthetic benchmark.
The goal is to bring the system into a state where each core always has a job to schedule. The jobs perform various tasks, such as memory allocation/free, disk I/O, computational tasks, memory copies, and others.
Once the loads have started up, rteval then starts the cyclictest measurement program. This program starts a SCHED_FIFO real-time thread on each online core and then measures real-time scheduling response time. Each measurement thread takes a timestamp, sleeps for an interval, then takes another timestamp after waking up. The latency measured is t1 - (t0 + i), which is the difference between the actual wakeup time t1, and the theoretical wakeup time of the first timestamp t0 plus the sleep interval i.
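The latency arithmetic can be illustrated with a short Python sketch. It is not how cyclictest is implemented; it only applies the formula t1 - (t0 + i) to timestamps taken around a sleep, and the function names are assumptions made for this example. Running it as an ordinary SCHED_OTHER process from an interpreter will show much larger values than a real-time measurement thread would.

```python
import time


def wakeup_latency_ns(t0, t1, interval_ns):
    """Latency = actual wakeup time minus theoretical wakeup time (t0 + i)."""
    return t1 - (t0 + interval_ns)


def measure(count, interval_ns):
    """Take `count` sleep/wake measurements and return the latencies in ns."""
    latencies = []
    for _ in range(count):
        t0 = time.monotonic_ns()
        time.sleep(interval_ns / 1_000_000_000)
        t1 = time.monotonic_ns()
        latencies.append(wakeup_latency_ns(t0, t1, interval_ns))
    return latencies


# Five measurements with a 1 ms sleep interval.
for latency in measure(5, 1_000_000):
    print(f"{latency / 1000:.1f} us")
```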
The details for the rteval run are written to an XML file along with the boot log for the system. Then the rteval-<date>-N.tar.bz2 file is generated, where N is a counter for the Nth run on <date>. A report, generated from the XML file and similar to the one below, is printed to the screen:
    System:
    Statistics:
        Samples:           1440463955
        Mean:              4.40624790712us
        Median:            0.0us
        Mode:              4us
        Range:             54us
        Min:               2us
        Max:               56us
        Mean Absolute Dev: 1.0776661507us
        Std.dev:           1.81821060672us

    CPU core 0       Priority: 95
    Statistics:
        Samples:           36011847
        Mean:              5.46434910711us
        Median:            4us
        Mode:              4us
        Range:             38us
        Min:               2us
        Max:               40us
        Mean Absolute Dev: 2.13785341159us
        Std.dev:           3.50155558554us
The report above provides details on the hardware, length of the run, options used, and the timing results, both per-CPU and system-wide. You can regenerate the report from the generated file by running the following command:
    # rteval --summarize rteval-<date>-N.tar.bz2
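The statistics fields in the report can be reproduced from a list of raw latency samples, which is useful if your application records its own measurements. The following Python sketch computes the same set of fields with the standard library. The function name is an assumption, and so is the use of the population standard deviation: this document does not state which variant rteval reports.

```python
import statistics


def latency_stats(samples_us):
    """Compute report-style statistics for latency samples in microseconds."""
    mean = statistics.fmean(samples_us)
    low, high = min(samples_us), max(samples_us)
    return {
        "Samples": len(samples_us),
        "Mean": mean,
        "Median": statistics.median(samples_us),
        "Mode": statistics.mode(samples_us),
        "Range": high - low,
        "Min": low,
        "Max": high,
        # Average distance of each sample from the mean.
        "Mean Absolute Dev": sum(abs(x - mean) for x in samples_us) / len(samples_us),
        # Assumption: population standard deviation.
        "Std.dev": statistics.pstdev(samples_us),
    }


for field, value in latency_stats([2, 4, 4, 4, 6]).items():
    print(f"{field}: {value}")
```

For tuning work, the Max and Range values usually matter more than the Mean, because a single long latency can violate a deadline even when the average looks good.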