Red Hat Training

A Red Hat training course is available for RHEL 8

Chapter 29. Tuning scheduling policy

In Red Hat Enterprise Linux, the smallest unit of process execution is called a thread. The system scheduler determines which processor runs a thread, and for how long the thread runs. However, because the scheduler’s primary concern is to keep the system busy, it may not schedule threads optimally for application performance.

For example, say an application on a NUMA system is running on Node A when a processor on Node B becomes available. To keep the processor on Node B busy, the scheduler moves one of the application’s threads to Node B. However, the application thread still requires access to memory on Node A. But, this memory will take longer to access because the thread is now running on Node B and Node A memory is no longer local to the thread. Thus, it may take longer for the thread to finish running on Node B than it would have taken to wait for a processor on Node A to become available, and then to execute the thread on the original node with local memory access.

29.1. Categories of scheduling policies

Performance sensitive applications often benefit from the designer or administrator determining where threads are run. The Linux scheduler implements a number of scheduling policies which determine where and for how long a thread runs.

The following are the two major categories of scheduling policies:

Normal policies
Normal threads are used for tasks of normal priority.
Realtime policies

Realtime policies are used for time-sensitive tasks that must complete without interruptions. Realtime threads are not subject to time slicing. This means the thread runs until they block, exit, voluntarily yield, or are preempted by a higher priority thread.

The lowest priority realtime thread is scheduled before any thread with a normal policy. For more information, see Static priority scheduling with SCHED_FIFO and Round robin priority scheduling with SCHED_RR.

Additional resources

  • sched(7), sched_setaffinity(2), sched_getaffinity(2), sched_setscheduler(2), and sched_getscheduler(2) man pages

29.2. Static priority scheduling with SCHED_FIFO

The SCHED_FIFO, also called static priority scheduling, is a realtime policy that defines a fixed priority for each thread. This policy allows administrators to improve event response time and reduce latency. It is recommended to not execute this policy for an extended period of time for time sensitive tasks.

When SCHED_FIFO is in use, the scheduler scans the list of all the SCHED_FIFO threads in order of priority and schedules the highest priority thread that is ready to run. The priority level of a SCHED_FIFO thread can be any integer from 1 to 99, where 99 is treated as the highest priority. Red Hat recommends starting with a lower number and increasing priority only when you identify latency issues.

Warning

Because realtime threads are not subject to time slicing, Red Hat does not recommend setting a priority as 99. This keeps your process at the same priority level as migration and watchdog threads; if your thread goes into a computational loop and these threads are blocked, they will not be able to run. Systems with a single processor will eventually hang in this situation.

Administrators can limit SCHED_FIFO bandwidth to prevent realtime application programmers from initiating realtime tasks that monopolize the processor.

The following are some of the parameters used in this policy:

/proc/sys/kernel/sched_rt_period_us
This parameter defines the time period, in microseconds, that is considered to be one hundred percent of the processor bandwidth. The default value is 1000000 μs, or 1 second.
/proc/sys/kernel/sched_rt_runtime_us
This parameter defines the time period, in microseconds, that is devoted to running real-time threads. The default value is 950000 μs, or 0.95 seconds.

29.3. Round robin priority scheduling with SCHED_RR

The SCHED_RR is a round-robin variant of the SCHED_FIFO. This policy is useful when multiple threads need to run at the same priority level.

Like SCHED_FIFO, SCHED_RR is a realtime policy that defines a fixed priority for each thread. The scheduler scans the list of all SCHED_RR threads in order of priority and schedules the highest priority thread that is ready to run. However, unlike SCHED_FIFO, threads that have the same priority are scheduled in a round-robin style within a certain time slice.

You can set the value of this time slice in milliseconds with the sched_rr_timeslice_ms kernel parameter in the /proc/sys/kernel/sched_rr_timeslice_ms file. The lowest value is 1 millisecond.

29.4. Normal scheduling with SCHED_OTHER

The SCHED_OTHER is the default scheduling policy in Red Hat Enterprise Linux 8. This policy uses the Completely Fair Scheduler (CFS) to allow fair processor access to all threads scheduled with this policy. This policy is most useful when there are a large number of threads or when data throughput is a priority, as it allows more efficient scheduling of threads over time.

When this policy is in use, the scheduler creates a dynamic priority list based partly on the niceness value of each process thread. Administrators can change the niceness value of a process, but cannot change the scheduler’s dynamic priority list directly.

29.5. Setting scheduler policies

Check and adjust scheduler policies and priorities by using the chrt command line tool. It can start new processes with the desired properties, or change the properties of a running process. It can also be used for setting the policy at runtime.

Procedure

  1. View the process ID (PID) of the active processes:

    # ps

    Use the --pid or -p option with the ps command to view the details of the particular PID.

  2. Check the scheduling policy, PID, and priority of a particular process:

    # chrt -p 468
    pid 468's current scheduling policy: SCHED_FIFO
    pid 468's current scheduling priority: 85
    
    # chrt -p 476
    pid 476's current scheduling policy: SCHED_OTHER
    pid 476's current scheduling priority: 0

    Here, 468 and 476 are PID of a process.

  3. Set the scheduling policy of a process:

    1. For example, to set the process with PID 1000 to SCHED_FIFO, with a priority of 50:

      # chrt -f -p 50 1000
    2. For example, to set the process with PID 1000 to SCHED_OTHER, with a priority of 0:

      # chrt -o -p 0 1000
    3. For example, to set the process with PID 1000 to SCHED_RR, with a priority of 10:

      # chrt -r -p 10 1000
    4. To start a new application with a particular policy and priority, specify the name of the application:

      # chrt -f 36 /bin/my-app

29.6. Policy options for the chrt command

Using the chrt command, you can view and set the scheduling policy of a process.

The following table describes the appropriate policy options, which can be used to set the scheduling policy of a process.

Table 29.1. Policy Options for the chrt Command

Short optionLong optionDescription

-f

--fifo

Set schedule to SCHED_FIFO

-o

--other

Set schedule to SCHED_OTHER

-r

--rr

Set schedule to SCHED_RR

29.7. Changing the priority of services during the boot process

Using the systemd service, it is possible to set up real-time priorities for services launched during the boot process. The unit configuration directives are used to change the priority of a service during the boot process.

The boot process priority change is done by using the following directives in the service section:

CPUSchedulingPolicy=
Sets the CPU scheduling policy for executed processes. It is used to set other, fifo, and rr policies.
CPUSchedulingPriority=
Sets the CPU scheduling priority for executed processes. The available priority range depends on the selected CPU scheduling policy. For real-time scheduling policies, an integer between 1 (lowest priority) and 99 (highest priority) can be used.

The following procedure describes how to change the priority of a service, during the boot process, using the mcelog service.

Prerequisites

  1. Install the tuned package:

    # yum install tuned
  2. Enable and start the tuned service:

    # systemctl enable --now tuned

Procedure

  1. View the scheduling priorities of running threads:

    # tuna --show_threads
                          thread       ctxt_switches
        pid SCHED_ rtpri affinity voluntary nonvoluntary             cmd
      1      OTHER     0     0xff      3181          292         systemd
      2      OTHER     0     0xff       254            0        kthreadd
      3      OTHER     0     0xff         2            0          rcu_gp
      4      OTHER     0     0xff         2            0      rcu_par_gp
      6      OTHER     0        0         9            0 kworker/0:0H-kblockd
      7      OTHER     0     0xff      1301            1 kworker/u16:0-events_unbound
      8      OTHER     0     0xff         2            0    mm_percpu_wq
      9      OTHER     0        0       266            0     ksoftirqd/0
    [...]
  2. Create a supplementary mcelog service configuration directory file and insert the policy name and priority in this file:

    # cat <<-EOF > /etc/systemd/system/mcelog.system.d/priority.conf
    
    >
    [SERVICE]
    CPUSchedulingPolicy=_fifo_
    CPUSchedulingPriority=_20_
    EOF
  3. Reload the systemd scripts configuration:

    # systemctl daemon-reload
  4. Restart the mcelog service:

    # systemctl restart mcelog

Verification steps

  • Display the mcelog priority set by systemd issue:

    # tuna -t mcelog -P
    thread       ctxt_switches
      pid SCHED_ rtpri affinity voluntary nonvoluntary             cmd
    826     FIFO    20  0,1,2,3        13            0          mcelog

Additional resources

29.8. Priority map

Priorities are defined in groups, with some groups dedicated to certain kernel functions. For real-time scheduling policies, an integer between 1 (lowest priority) and 99 (highest priority) can be used.

The following table describes the priority range, which can be used while setting the scheduling policy of a process.

Table 29.2. Description of the priority range

PriorityThreadsDescription

1

Low priority kernel threads

This priority is usually reserved for the tasks that need to be just above SCHED_OTHER.

2 - 49

Available for use

The range used for typical application priorities.

50

Default hard-IRQ value

 

51 - 98

High priority threads

Use this range for threads that execute periodically and must have quick response times. Do not use this range for CPU-bound threads as you will starve interrupts.

99

Watchdogs and migration

System threads that must run at the highest priority.

29.9. cpu-partitioning profile

The cpu-partitioning profile is used to isolate CPUs from system level interruptions. Once you have isolated these CPUs, you can allocate them for specific applications. This is very useful in low-latency environments or in environments where you wish to extract the maximum performance from your hardware.

This profile also lets you designate housekeeping CPUs. A housekeeping CPU is used to run all services, daemons, shell processes, and kernel threads.

You can configure the cpu-partitioning profile in the /etc/tuned/cpu-partitioning-variables.conf file using the following configuration options:

isolated_cores=cpu-list
Lists CPUs to isolate. The list of isolated CPUs is comma-separated or you can specify a range using a dash, such as 3-5. This option is mandatory. Any CPU missing from this list is automatically considered a housekeeping CPU.
no_balance_cores=cpu-list
Lists CPUs which are not considered by the kernel during system wide process load-balancing. This option is optional. This is usually the same list as isolated_cores.

Additional resources

  • tuned-profiles-cpu-partitioning(7) man page