Chapter 35. Scheduling problems on the real-time kernel and solutions

Scheduling in the real-time kernel can have unintended consequences. This chapter describes the problems that can arise from scheduling policies, scheduler throttling, and thread starvation on the real-time kernel, as well as potential solutions.

35.1. Scheduling policies for the real-time kernel

The real-time scheduling policies share one main characteristic: a thread runs until a higher-priority thread preempts it or until the thread waits, either by sleeping or by performing I/O.

In the case of SCHED_RR, the operating system also interrupts a running thread so that another thread of equal SCHED_RR priority can run. In either case, the POSIX specifications that define the policies make no provision for allowing lower-priority threads to get any CPU time. This characteristic of real-time threads makes it easy to write an application that monopolizes 100% of a given CPU. However, this causes problems for the operating system. For example, the operating system is responsible for managing both system-wide and per-CPU resources, and it must periodically examine the data structures that describe these resources and perform housekeeping activities on them. If a SCHED_FIFO thread monopolizes a core, the operating system cannot perform its housekeeping tasks on that core. Eventually the entire system becomes unstable and can potentially crash.
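The behavior described above can be illustrated with a toy model. The following is a minimal sketch, not kernel code: the simulate function, the tick-based time model, and the rr_quantum value are assumptions made for illustration, every thread is treated as always runnable (a pure polling loop), and throttling is ignored.

```python
from collections import deque

def simulate(threads, ticks, rr_quantum=2):
    """Simulate one CPU under fixed-priority scheduling.

    threads: list of (name, policy, priority) tuples; all are always runnable.
    Returns a dict mapping each thread name to the number of ticks it ran.
    """
    ran = {name: 0 for name, _, _ in threads}
    top = max(prio for _, _, prio in threads)
    # Only threads at the highest priority are ever eligible to run.
    queue = deque(t for t in threads if t[2] == top)
    used = 0  # ticks consumed by the thread at the head of the queue
    for _ in range(ticks):
        name, policy, _ = queue[0]
        ran[name] += 1
        used += 1
        # SCHED_RR rotates among equal-priority peers when its quantum expires.
        # SCHED_FIFO keeps the CPU until it blocks, which never happens here.
        if policy == "SCHED_RR" and used >= rr_quantum:
            queue.rotate(-1)
            used = 0
    return ran

fifo_hog = simulate(
    [("hog", "SCHED_FIFO", 80), ("housekeeping", "SCHED_OTHER", 0)], ticks=100
)
rr_pair = simulate(
    [("a", "SCHED_RR", 80), ("b", "SCHED_RR", 80), ("low", "SCHED_RR", 10)], ticks=100
)
print(fifo_hog)  # {'hog': 100, 'housekeeping': 0}
print(rr_pair)   # {'a': 50, 'b': 50, 'low': 0}
```

In both runs the lower-priority thread receives no CPU time at all, which is the monopolization problem that the rest of this chapter addresses.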

On the RHEL for Real Time kernel, interrupt handlers run as threads with the SCHED_FIFO policy. Their default priority is 50. A CPU-hog thread with a SCHED_FIFO or SCHED_RR policy and a priority higher than that of the interrupt handler threads can prevent the interrupt handlers from running. As a result, programs waiting for data signaled by those interrupts starve and fail.
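As a rough illustration of this rule, the following sketch flags thread configurations that would outrank the interrupt handler threads. The function name and the sample thread list are hypothetical, and the value 50 is the default stated above; a real system might use different interrupt thread priorities.

```python
RT_POLICIES = {"SCHED_FIFO", "SCHED_RR"}
DEFAULT_IRQ_THREAD_PRIORITY = 50  # default interrupt thread priority from the text

def outranks_irq_threads(policy, priority, irq_priority=DEFAULT_IRQ_THREAD_PRIORITY):
    """Return True if a thread with this policy and priority can preempt
    interrupt handler threads running at irq_priority."""
    return policy in RT_POLICIES and priority > irq_priority

# Hypothetical application threads: (name, policy, priority)
app_threads = [
    ("poller", "SCHED_FIFO", 80),
    ("logger", "SCHED_RR", 20),
    ("ui", "SCHED_OTHER", 0),
]
risky = [name for name, policy, prio in app_threads
         if outranks_irq_threads(policy, prio)]
print(risky)  # ['poller']
```

Only threads that both outrank the interrupt handlers and never block are a problem in practice; this check covers the first condition only.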

35.2. Scheduler throttling in the real-time kernel

The real-time kernel includes a safeguard mechanism that allocates the CPU bandwidth available to real-time tasks. This safeguard mechanism is known as real-time scheduler throttling.

By default, the real-time throttling mechanism allows real-time tasks to use 95% of the CPU time. The remaining 5% is reserved for non real-time tasks, such as tasks running under SCHED_OTHER and similar scheduling policies. Note that if a single real-time task consumes the entire 95% share, the remaining real-time tasks on that CPU do not run. Only non real-time tasks use the remaining 5% of CPU time. The default values can have the following performance impacts:

  • The real-time tasks have at most 95% of CPU time available for them, which can affect their performance.
  • The real-time tasks cannot lock up the system by preventing non real-time tasks from running.

The real-time scheduler throttling is controlled by the following parameters in the /proc file system:

The /proc/sys/kernel/sched_rt_period_us parameter
Defines the period in microseconds (μs) that represents 100% of the CPU bandwidth. The default value is 1,000,000 μs, which is 1 second. Consider changes to the period value carefully, because a period that is either very high or very low can cause problems.
The /proc/sys/kernel/sched_rt_runtime_us parameter
Defines the total bandwidth available to all real-time tasks within each period. The default value is 950,000 μs (0.95 s), which is 95% of the CPU bandwidth. Setting the value to -1 allows the real-time tasks to use up to 100% of CPU time. This is only appropriate when the real-time tasks are well engineered and have no obvious flaws, such as unbounded polling loops.
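The relationship between the two parameters can be checked with a short calculation. The following is a minimal sketch: the function names rt_budget and read_rt_settings are assumptions made for this example, and the proc_dir argument exists only so that the reader can be pointed at a different directory for testing.

```python
from pathlib import Path

def rt_budget(period_us, runtime_us):
    """Return (real-time fraction, non real-time microseconds) per period."""
    if runtime_us == -1:
        # -1 disables throttling: real-time tasks may use the whole period.
        return 1.0, 0
    return runtime_us / period_us, period_us - runtime_us

def read_rt_settings(proc_dir="/proc/sys/kernel"):
    """Read the current period and runtime values, in microseconds."""
    base = Path(proc_dir)
    period = int((base / "sched_rt_period_us").read_text())
    runtime = int((base / "sched_rt_runtime_us").read_text())
    return period, runtime

print(rt_budget(1_000_000, 950_000))  # (0.95, 50000)
print(rt_budget(1_000_000, -1))       # (1.0, 0)
```

With the default values, non real-time tasks are left 50,000 μs of every 1,000,000 μs period. On a running system, rt_budget(*read_rt_settings()) would report the values currently in effect.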

35.3. Thread starvation in the real-time kernel

Thread starvation occurs when a thread is on a CPU run queue for longer than the starvation threshold and does not make progress. A common cause of thread starvation is a fixed-priority polling application, that is, one that runs under SCHED_FIFO or SCHED_RR, bound to a CPU. Because the polling application does not block for I/O, it can prevent other threads, such as kworkers, from running on that CPU.

An early attempt to reduce thread starvation is called real-time throttling. In real-time throttling, each CPU has a portion of the execution time dedicated to non real-time tasks. Throttling is enabled by default, with 95% of the CPU available to real-time tasks and 5% reserved for non real-time tasks. This works if a single real-time task causes the starvation, but it does not work if multiple real-time tasks are assigned to a CPU. You can work around the problem by using the following approaches:

The stalld mechanism

The stalld mechanism is an alternative to real-time throttling and avoids some of the throttling drawbacks. stalld is a daemon that periodically monitors the state of each thread in the system and looks for threads that have been on the run queue for a specified length of time without running. stalld temporarily changes such a thread to the SCHED_DEADLINE policy and allocates it a small slice of time on the specified CPU. The thread then runs, and when the time slice is used up, the thread returns to its original scheduling policy and stalld continues to monitor thread states.
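The detection step can be pictured with a simplified model. The following sketch is not stalld's implementation: the update_starvation function, the sample format, and the use of a context-switch counter as the sign of progress are all assumptions made for illustration, and the temporary SCHED_DEADLINE boost is not shown.

```python
def update_starvation(state, sample, now, threshold_s):
    """Track waiting threads and report the ones that look starved.

    state: dict tid -> (switch_count, first_seen_time); updated in place.
    sample: dict tid -> switch_count for threads waiting on the run queue now.
    Returns a sorted list of thread IDs that have waited at least threshold_s
    seconds without their switch count changing.
    """
    starved = []
    for tid, switches in sample.items():
        seen = state.get(tid)
        if seen is None or seen[0] != switches:
            # New on the run queue, or it ran since the last sample: restart timer.
            state[tid] = (switches, now)
        elif now - seen[1] >= threshold_s:
            starved.append(tid)
    # Forget threads that are no longer waiting on the run queue.
    for tid in list(state):
        if tid not in sample:
            del state[tid]
    return sorted(starved)

state = {}
first = update_starvation(state, {101: 7, 102: 3}, now=0, threshold_s=20)
second = update_starvation(state, {101: 7, 102: 9}, now=10, threshold_s=20)
third = update_starvation(state, {101: 7, 102: 9}, now=20, threshold_s=20)
print(first, second, third)  # [] [] [101]
```

In this model, thread 101 never runs between samples and is reported once it has waited for the full threshold, whereas thread 102 made progress at the second sample, so its timer restarts.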

Housekeeping CPUs are CPUs that run all daemons, shell processes, kernel threads, interrupt handlers, and all work that can be dispatched from an isolated CPU. When real-time throttling is disabled, stalld runs on the housekeeping CPUs and monitors the CPUs that run the main workload, such as a CPU assigned to a SCHED_FIFO busy loop. It detects stalled threads and boosts their priority as required, while keeping the added noise within a previously defined acceptable limit. stalld can be preferable if the real-time throttling mechanism causes unreasonable noise in the main workload.

With stalld, you can more precisely control the noise introduced by boosting starved threads. The shell script /usr/bin/throttlectl automatically disables real-time throttling when stalld runs. You can list the current throttling values by running the /usr/bin/throttlectl show command.

Disabling real-time throttling

The following parameters in the /proc filesystem control real-time throttling:

  • The /proc/sys/kernel/sched_rt_period_us parameter specifies the number of microseconds in a period and defaults to 1 million, which is 1 second.
  • The /proc/sys/kernel/sched_rt_runtime_us parameter specifies the number of microseconds in each period that real-time tasks can use before throttling occurs. It defaults to 950,000, or 95% of the available CPU cycles. You can disable throttling by writing a value of -1 to the sched_rt_runtime_us file, for example by using the echo -1 > /proc/sys/kernel/sched_rt_runtime_us command.
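The same change can be made from a program. The following is a minimal sketch, assuming the process has root privileges when it writes to the real /proc file; the set_rt_runtime helper and its path argument are hypothetical names introduced for this example.

```python
from pathlib import Path

RUNTIME_FILE = "/proc/sys/kernel/sched_rt_runtime_us"

def set_rt_runtime(value_us, path=RUNTIME_FILE):
    """Write a new real-time runtime value and return the previous one.

    Passing -1 disables real-time throttling. Writing to the real /proc
    file requires root privileges.
    """
    target = Path(path)
    previous = int(target.read_text())
    target.write_text(f"{value_us}\n")
    return previous

# Example usage (not executed here because it changes system behavior):
#   previous = set_rt_runtime(-1)   # disable throttling
#   ...run the workload...
#   set_rt_runtime(previous)        # restore the earlier value
```

Returning the previous value makes it straightforward to restore the original setting after the workload finishes.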