Avoiding RCU Stalls in the real-time kernel
Issue
RCU (read-copy update) is a kernel synchronization mechanism that increases the parallelism of a Linux system by allowing readers and writers to access shared data concurrently. Although RCU readers and writers can always access the shared data, a writer is not allowed to free the old version of dynamically allocated data that it has modified until the end of the grace period. The end of a grace period ensures that no readers are still accessing the old version of the dynamically allocated shared data, allowing writers to return the memory to the system safely. Hence, a drawback of RCU is that a long wait for the end of a grace period can cause the system to run out of memory.
To warn that a grace period is taking too long to end, RCU stall messages are printed to the kernel log, indicating that the wait for the end of the grace period has exceeded the defined timeout. By default, the timeout is 60 seconds.
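As a minimal sketch of how the timeout mentioned above could be checked on a running system, the following Python snippet reads it from sysfs. It assumes the kernel exposes the value at /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout; the function name read_stall_timeout is illustrative and not part of any Red Hat tool.

```python
from pathlib import Path

# Assumption: the running kernel exposes the stall timeout at this sysfs path.
STALL_TIMEOUT_PATH = "/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout"


def read_stall_timeout(path=STALL_TIMEOUT_PATH):
    """Return the RCU stall timeout in seconds, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not an integer: report "unknown".
        return None
```

Calling read_stall_timeout() with no arguments returns None on systems where the file is absent, so the caller can fall back to the documented default.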
Although an RCU stall can be a side effect of a kernel bug, this is not the typical case for real-time kernel users. In the vast majority of cases, real-time users face RCU stalls because the execution of RCU callbacks is delayed. The RCU callbacks are responsible for performing the RCU work needed to reach the end of a grace period.
To provide the best determinism, the real-time kernel is set to run all RCU callbacks in kernel threads, allowing users to configure the priority of the RCU callback threads according to their preference.
NOTE: RCU threads are identified by the rcu prefix in their names, for example rcu_preempt and rcuc/N, where N is the CPU on which the thread is allowed to run. Except for the rcuc/N threads, all other RCU threads can be configured to run on any CPU of the system.
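The naming convention in the note can be used to locate the RCU threads on a system. The sketch below scans a procfs tree for tasks whose name starts with rcu. It assumes that every task, kernel threads included, has a /proc/<pid>/comm file holding its name; the function name list_rcu_threads and the proc_root parameter are illustrative.

```python
from pathlib import Path


def list_rcu_threads(proc_root="/proc"):
    """Return sorted (pid, name) pairs for tasks whose name starts with 'rcu'."""
    threads = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-task entries such as /proc/self
        try:
            name = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the task exited or its comm file is not readable
        if name.startswith("rcu"):
            threads.append((int(entry.name), name))
    return sorted(threads)
```

On a live system, list_rcu_threads() would typically include entries such as rcu_preempt and one rcuc/N thread per CPU, which can then be inspected or reprioritized with the usual scheduling tools.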
The flexibility provided by the real-time kernel also allows a non-optimal setup that starves the RCU callbacks, delaying the end of a grace period. The starvation can be either direct, when the RCU threads themselves are not able to run, or indirect, when the RCU threads cannot make progress because threads that RCU depends on are unable to run. Very often, the starvation is a side effect of a high-priority real-time task that runs for more than 60 seconds on the same CPU, preventing the RCU callbacks from running within the defined timeout.
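The direct starvation case described above can be approximated with a simple priority comparison. The following sketch flags CPUs where some task has a higher real-time priority than that CPU's rcuc/N thread. The Task tuple, its fields, and the function name starved_rcuc_threads are hypothetical and chosen for illustration. The check is deliberately conservative: a higher-priority task only causes a stall if it actually monopolizes the CPU for longer than the timeout.

```python
from collections import namedtuple

# Hypothetical record describing a task: its name, the CPU it is pinned to,
# and its real-time priority (a higher number means a higher priority).
Task = namedtuple("Task", "name cpu rt_priority")


def starved_rcuc_threads(tasks):
    """Return (rcuc_name, blocker_name) pairs where a task on the same CPU
    has a higher real-time priority than that CPU's rcuc/N thread."""
    rcuc_by_cpu = {t.cpu: t for t in tasks if t.name.startswith("rcuc/")}
    risks = []
    for task in tasks:
        rcuc = rcuc_by_cpu.get(task.cpu)
        if rcuc is None or task is rcuc:
            continue
        if task.rt_priority > rcuc.rt_priority:
            risks.append((rcuc.name, task.name))
    return risks
```

For example, a task with priority 50 pinned to CPU 1 would be reported against an rcuc/1 thread running at priority 1, while a CPU whose rcuc/N thread outranks every other task on it would not appear in the result.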
Environment
Red Hat Enterprise Linux for Real Time (RHEL-RT)
Red Hat Enterprise MRG (MRG-RT)