3.12. Real Time Throttling

Real Time Scheduling Issues

The two real-time scheduling policies in Red Hat Enterprise Linux for Real Time, SCHED_FIFO and SCHED_RR, share one main characteristic: a thread scheduled under either policy runs until it is preempted by a higher priority thread or until it "waits", either by sleeping or by performing I/O. In the case of SCHED_RR, a thread may also be preempted by the operating system so that another thread of equal SCHED_RR priority may run. In none of these cases do the POSIX specifications that define the policies make any provision for allowing lower priority threads to get CPU time.

This characteristic of real-time threads means that it is quite easy to write an application which monopolizes 100% of a given CPU. At first glance this sounds like it might be a good idea, but in reality it causes serious problems for the operating system. The OS is responsible for managing both system-wide and per-CPU resources, and must periodically examine the data structures describing these resources and perform housekeeping activities on them. If a core is monopolized by a SCHED_FIFO thread, the OS cannot perform the housekeeping tasks for that core, and eventually the entire system becomes unstable, potentially causing a crash.
On the Red Hat Enterprise Linux for Real Time kernel, interrupt handlers run as threads with a SCHED_FIFO priority (default: 50). A CPU-hog thread with a SCHED_FIFO or SCHED_RR priority higher than that of the interrupt handler threads can prevent the interrupt handlers from running, so that programs waiting for data signaled by those interrupts are starved and fail.
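
To see which tasks could starve the interrupt handler threads in this way, one option is to filter the process list by real-time priority. The following is a minimal sketch, not a supported tool: the helper name rt_above is hypothetical, the threshold of 50 is the default interrupt thread priority mentioned above, and the ps column names (cls, rtprio, comm) assume the procps version of ps. It lists processes rather than individual threads.

```shell
# rt_above (hypothetical helper): read "PID CLS RTPRIO COMMAND" lines on
# standard input and print only those whose real-time priority is a
# number strictly greater than the threshold given as the first argument.
rt_above() {
    awk -v min="$1" '$3 ~ /^[0-9]+$/ && $3 > min'
}

# List processes whose real-time priority is above the default
# interrupt thread priority of 50. Errors from ps are discarded so the
# pipeline simply prints nothing where these ps options are unsupported.
ps -eo pid,cls,rtprio,comm 2>/dev/null | rt_above 50
```

Non-realtime tasks show "-" in the RTPRIO column and are skipped, as is the header line.
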
Real Time Scheduler Throttling

Red Hat Enterprise Linux for Real Time comes with a safeguard mechanism that allows the system administrator to allocate bandwidth for use by real-time tasks. This safeguard mechanism is known as real-time scheduler throttling and is controlled by two parameters in the /proc file system:

/proc/sys/kernel/sched_rt_period_us
Defines the period in μs (microseconds) to be considered as 100% of CPU bandwidth. The default value is 1,000,000 μs (1 second). Changes to the value of the period must be very well thought out, as a period that is too long and a period that is too short are equally dangerous.
/proc/sys/kernel/sched_rt_runtime_us
The total bandwidth available to all real-time tasks. The default value is 950,000 μs (0.95 s) or, in other words, 95% of the CPU bandwidth. Setting the value to -1 means that real-time tasks may use up to 100% of CPU time. This is only adequate when the real-time tasks are well engineered and have no obvious flaws such as unbounded polling loops.
The default values for the real-time throttling mechanism define that 95% of the CPU time can be used by real-time tasks. The remaining 5% is devoted exclusively to non-realtime tasks (tasks running under SCHED_OTHER and similar scheduling policies). It is important to note that if a single real-time task occupies the whole 95% CPU time slot, the remaining real-time tasks on that CPU will not run.
The impact of the default values is two-fold: rogue real-time tasks cannot lock up the system by starving non-realtime tasks and, on the other hand, real-time tasks have at most 95% of CPU time available to them, which may affect their performance.
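
The relationship between the two parameters can be checked with a little shell arithmetic. The following is a minimal sketch: the function name rt_share is hypothetical, and the result is rounded down to a whole percent because the shell only does integer arithmetic.

```shell
# rt_share (hypothetical helper): print the share of each period that
# real-time tasks may use.
#   $1  period in microseconds  (sched_rt_period_us)
#   $2  runtime in microseconds (sched_rt_runtime_us), or -1 for no limit
rt_share() {
    period_us=$1
    runtime_us=$2
    if [ "$runtime_us" -eq -1 ]; then
        echo "unlimited"
    else
        # Integer division: the percentage is rounded down.
        echo "$(( runtime_us * 100 / period_us ))%"
    fi
}

# Report the live settings when both files are readable.
if [ -r /proc/sys/kernel/sched_rt_period_us ] &&
   [ -r /proc/sys/kernel/sched_rt_runtime_us ]; then
    rt_share "$(cat /proc/sys/kernel/sched_rt_period_us)" \
             "$(cat /proc/sys/kernel/sched_rt_runtime_us)"
fi
```

With the default values described above, rt_share 1000000 950000 prints 95%.
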

The RT_RUNTIME_GREED Feature

Although the real-time throttling mechanism serves its purpose of preventing real-time tasks from hanging the system, an advanced user may want to allow a real-time task to continue running as long as no non-realtime tasks are starving, so that the system does not go idle.
When enabled, this feature checks if non-realtime tasks are starving before throttling the real-time task. If the real-time task becomes throttled, it will be unthrottled as soon as the system goes idle, or when the next period starts, whichever comes first.
Enable RT_RUNTIME_GREED with the following command:
# echo RT_RUNTIME_GREED > /sys/kernel/debug/sched_features
To keep all CPUs with the same rt_runtime, disable the RT_RUNTIME_SHARE logic:
# echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
With these two options set, the user guarantees some runtime for non-realtime tasks on all CPUs, while keeping real-time tasks running as much as possible.
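
To confirm that the settings took effect, the sched_features file can be read back: it lists every scheduler feature on one line, with a NO_ prefix marking a feature that is disabled. The following is a minimal sketch of such a check; the function name feature_state is hypothetical, and reading the file requires root privileges and a mounted debugfs.

```shell
# feature_state (hypothetical helper): report whether a scheduler
# feature is enabled, disabled, or not listed at all.
#   $1  feature name without the NO_ prefix
#   $2  contents of the sched_features file (space-separated names)
feature_state() {
    # $2 is deliberately unquoted so that it splits into words.
    for f in $2; do
        case "$f" in
            "$1")    echo "enabled";  return 0 ;;
            "NO_$1") echo "disabled"; return 0 ;;
        esac
    done
    echo "unknown"
}

# Check the two features discussed above when debugfs is readable.
if [ -r /sys/kernel/debug/sched_features ]; then
    features=$(cat /sys/kernel/debug/sched_features)
    echo "RT_RUNTIME_GREED: $(feature_state RT_RUNTIME_GREED "$features")"
    echo "RT_RUNTIME_SHARE: $(feature_state RT_RUNTIME_SHARE "$features")"
fi
```

A result of "unknown" means the running kernel does not list the feature at all.
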
References

From the kernel documentation, which is available in the kernel-rt-doc package:

  • /usr/share/doc/kernel-rt-doc-3.10.0/Documentation/scheduler/sched-rt-group.txt