6.3. Configuration Suggestions
6.3.1. Configuring Kernel Tick Time
To enable dynamic tickless behavior on specific cores, specify those cores on the kernel command line with the nohz_full parameter. On a 16 core system, specifying nohz_full=1-15 enables dynamic tickless behavior on cores 1 through 15, moving all timekeeping to the only unspecified core (core 0). This behavior can be enabled either temporarily at boot time, or persistently in the /etc/default/grub file. For persistent behavior, run the grub2-mkconfig -o /boot/grub2/grub.cfg command to save your configuration.
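For example, after appending nohz_full to the GRUB_CMDLINE_LINUX line in /etc/default/grub, the configuration might look like the following sketch (the other options shown are placeholders and will differ on your system):
GRUB_CMDLINE_LINUX="rhgb quiet nohz_full=1-15"
# grub2-mkconfig -o /boot/grub2/grub.cfg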
- When the system boots, you must manually move rcu threads to the non-latency-sensitive core, in this case core 0.
# for i in `pgrep rcu[^c]` ; do taskset -pc 0 $i ; done
- Use the isolcpus parameter on the kernel command line to isolate certain cores from user-space tasks.
- Optionally, set CPU affinity for the kernel's write-back bdi-flush threads to the housekeeping core:
echo 1 > /sys/bus/workqueue/devices/writeback/cpumask
Verify that the dynamic tickless configuration is working by measuring timer ticks on one of the tickless cores. In the following command, stress is a program that spins on the CPU for 1 second; if stress is not available on your system, a simple busy loop such as while :; do d=1; done can substitute.
# perf stat -C 1 -e irq_vectors:local_timer_entry taskset -c 1 stress -t 1 -c 1
The default kernel timer configuration shows about 1000 ticks on a busy CPU:
# perf stat -C 1 -e irq_vectors:local_timer_entry taskset -c 1 stress -t 1 -c 1
1000 irq_vectors:local_timer_entry
With the dynamic tickless kernel configured, you should see 1 tick instead:
# perf stat -C 1 -e irq_vectors:local_timer_entry taskset -c 1 stress -t 1 -c 1
1 irq_vectors:local_timer_entry
6.3.2. Setting Hardware Performance Policy (x86_energy_perf_policy)
The x86_energy_perf_policy tool allows administrators to define the relative importance of performance and energy efficiency. By default, it operates on all processors in performance mode. It requires processor support, which is indicated by the presence of CPUID.06H.ECX.bit3, and must be run with root privileges. For more information, see the man page:
$ man x86_energy_perf_policy
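For example, the following sketch reads the current policy and then sets all processors to performance mode (assuming a supported processor; run as root):
# x86_energy_perf_policy -r
# x86_energy_perf_policy performance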
6.3.3. Setting Process Affinity with taskset
The taskset tool allows you to retrieve and set the processor affinity of a running process, or to launch a process with a specified processor affinity.
Important
taskset does not guarantee local memory allocation. If you require the additional performance benefits of local memory allocation, Red Hat recommends using numactl instead of taskset.
For more information about taskset, see the man page:
$ man taskset
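For example, the following sketch launches an application pinned to CPUs 0 and 1, then changes the affinity of a running process (the application name and PID are placeholders):
# taskset -c 0,1 application
# taskset -pc 0-3 1234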
6.3.4. Managing NUMA Affinity with numactl
Use numactl to run a process with a specified scheduling or memory placement policy, or to set the processor affinity and memory affinity of a process. For more information, see the man page:
$ man numactl
Note
The numactl package also includes the libnuma library. This library offers a simple programming interface to the NUMA policy supported by the kernel, and can be used for more fine-grained tuning than the numactl application. For more information, see the man page:
$ man numa
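For example, the following sketch displays the system's NUMA topology and then runs an application with its CPUs and memory bound to node 0 (the application name is a placeholder):
# numactl --hardware
# numactl --cpunodebind=0 --membind=0 application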
6.3.5. Automatic NUMA Affinity Management with numad
numad is an automatic NUMA affinity management daemon. It monitors NUMA topology and resource usage within a system in order to dynamically improve NUMA resource allocation and management. For more information, see the man page:
$ man numad
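For example, numad can be run as a system service (a sketch; the daemon is provided by the numad package):
# systemctl enable numad
# systemctl start numad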
6.3.6. Tuning Scheduling Policy
6.3.6.1. Scheduling Policies
6.3.6.1.1. Static Priority Scheduling with SCHED_FIFO
SCHED_FIFO (also called static priority scheduling) is a realtime policy that defines a fixed priority for each thread. This policy allows administrators to improve event response time and reduce latency, and is recommended for time sensitive tasks that do not run for an extended period of time.
When SCHED_FIFO is in use, the scheduler scans the list of all SCHED_FIFO threads in priority order and schedules the highest priority thread that is ready to run. The priority level of a SCHED_FIFO thread can be any integer from 1 to 99, with 99 treated as the highest priority. Red Hat recommends starting at a low number and increasing priority only when you identify latency issues.
Warning
Because SCHED_FIFO threads are not subject to time slicing, a realtime thread that never blocks can monopolize a processor. Administrators can limit SCHED_FIFO bandwidth to prevent realtime application programmers from initiating realtime tasks that monopolize the processor. The following kernel parameters control this limit:
- /proc/sys/kernel/sched_rt_period_us
This parameter defines the time period in microseconds that is considered to be one hundred percent of processor bandwidth. The default value is 1000000 μs, or 1 second.
- /proc/sys/kernel/sched_rt_runtime_us
This parameter defines the time period in microseconds that is devoted to running realtime threads. The default value is 950000 μs, or 0.95 seconds.
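For example, the chrt utility can start a thread under SCHED_FIFO and inspect a running thread's policy and priority (a sketch; the application name, priority, and PID are placeholders):
# chrt -f 10 application
# chrt -p 1234
Following the recommendation above, start with a low priority such as 10 and raise it only when latency measurements justify it.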
6.3.6.1.2. Round Robin Priority Scheduling with SCHED_RR
SCHED_RR is a round-robin variant of SCHED_FIFO. This policy is useful when multiple threads need to run at the same priority level.
Like SCHED_FIFO, SCHED_RR is a realtime policy that defines a fixed priority for each thread. The scheduler scans the list of all SCHED_RR threads in priority order and schedules the highest priority thread that is ready to run. However, unlike SCHED_FIFO, threads that have the same priority are scheduled round-robin style within a certain time slice.
You can set the value of this time slice with the sched_rr_timeslice_ms kernel parameter (/proc/sys/kernel/sched_rr_timeslice_ms). The lowest value is 1 millisecond.
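For example, a sketch that starts a thread under SCHED_RR with chrt and inspects the current time slice (the application name and priority are placeholders):
# chrt -r 5 application
# cat /proc/sys/kernel/sched_rr_timeslice_ms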
6.3.6.1.3. Normal Scheduling with SCHED_OTHER
SCHED_OTHER is the default scheduling policy in Red Hat Enterprise Linux 7. This policy uses the Completely Fair Scheduler (CFS) to allow fair processor access to all threads scheduled with this policy. This policy is most useful when there are a large number of threads or data throughput is a priority, as it allows more efficient scheduling of threads over time.
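SCHED_OTHER threads do not use realtime priorities, but their relative weight under CFS can be adjusted with nice values. For example (a sketch; the application name and PID are placeholders):
$ nice -n 10 application
# renice -n -5 -p 1234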
6.3.6.2. Isolating CPUs
You can isolate one or more CPUs from the scheduler with the isolcpus boot parameter. This prevents the scheduler from scheduling any user-space threads on these CPUs.
Once CPUs are isolated, you must manually assign processes to them, for example with the CPU affinity system calls or the numactl command. To isolate the third CPU and the sixth through eighth CPUs on your system, add the following to the kernel command line:
isolcpus=2,5-7
The Tuna tool can also isolate a CPU, at any time rather than only at boot time. However, this method of isolation is subtly different from the isolcpus parameter, and does not currently achieve the performance gains associated with isolcpus. See Section 6.3.8, “Configuring CPU, Thread, and Interrupt Affinity with Tuna” for more details about this tool, as shown in the sketch that follows.
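For example, once isolcpus=2,5-7 is in effect, work can be placed on the isolated CPUs explicitly (a sketch; the application name is a placeholder):
# taskset -c 2,5-7 application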
6.3.7. Setting Interrupt Affinity on AMD64 and Intel 64
Interrupt requests have an associated affinity property, smp_affinity, which defines the processors that will handle the interrupt request. To improve application performance, assign interrupt affinity and process affinity to the same processor, or processors on the same core. This allows the specified interrupt and application threads to share cache lines.
Important
Procedure 6.1. Balancing Interrupts Automatically
- If your BIOS exports its NUMA topology, the irqbalance service can automatically serve interrupt requests on the node that is local to the hardware requesting service.
For details on configuring irqbalance, see Section A.1, “irqbalance”. A minimal sketch for enabling the service follows this procedure.
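For example, a sketch that enables and starts the irqbalance service (provided by the irqbalance package):
# systemctl enable irqbalance
# systemctl start irqbalance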
Procedure 6.2. Balancing Interrupts Manually
- Check which devices correspond to the interrupt requests that you want to configure.
Starting with Red Hat Enterprise Linux 7.5, the system configures the optimal interrupt affinity for certain devices and their drivers automatically. You can no longer configure their affinity manually. This applies to the following devices:
  - Devices using the be2iscsi driver
  - NVMe PCI devices
- Find the hardware specification for your platform. Check if the chipset on your system supports distributing interrupts.
  - If it does, you can configure interrupt delivery as described in the following steps. Additionally, check which algorithm your chipset uses to balance interrupts. Some BIOSes have options to configure interrupt delivery.
  - If it does not, your chipset will always route all interrupts to a single, static CPU. You cannot configure which CPU is used.
- Check which Advanced Programmable Interrupt Controller (APIC) mode is in use on your system.
Only non-physical flat mode (flat) supports distributing interrupts to multiple CPUs. This mode is available only for systems that have up to 8 CPUs.
$ journalctl --dmesg | grep APIC
In the command output:
  - If your system uses a mode other than flat, you can see a line similar to Setting APIC routing to physical flat.
  - If you can see no such message, your system uses flat mode.
If your system uses x2apic mode, you can disable it by adding the nox2apic option to the kernel command line in the bootloader configuration.
option to the kernel command line in the bootloader configuration. - Calculate the
smp_affinity
mask.Thesmp_affinity
value is stored as a hexadecimal bit mask representing all processors in the system. Each bit configures a different CPU. The least significant bit is CPU 0.The default value of the mask isf
, meaning that an interrupt request can be handled on any processor in the system. Setting this value to1
means that only processor 0 can handle the interrupt.Procedure 6.3. Calculating the Mask
- In binary, use the value
1
for CPUs that will handle the interrupts.For example, to handle interrupts by CPU 0 and CPU 7, use0000000010000001
as the binary code:Table 6.1. Binary Bits for CPUs
CPU 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Binary 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 - Convert the binary code to hexadecimal.For example, to convert the binary code using Python:
>>> hex(int('0000000010000001', 2))
'0x81'
On systems with more than 32 processors, you must delimit smp_affinity values for discrete 32 bit groups. For example, if you want only the first 32 processors of a 64 processor system to service an interrupt request, use 0xffffffff,00000000.
- Set the smp_affinity mask.
The interrupt affinity value for a particular interrupt request is stored in the associated /proc/irq/irq_number/smp_affinity file. Write the calculated mask to the associated file:
# echo mask > /proc/irq/irq_number/smp_affinity
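For example, a sketch that directs a hypothetical interrupt request 32 to CPUs 0 and 7 using the 0x81 mask calculated above, then verifies the result (note that the file takes the hexadecimal digits without a 0x prefix):
# echo 81 > /proc/irq/32/smp_affinity
# cat /proc/irq/32/smp_affinity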
Additional Resources
- On systems that support interrupt steering, modifying the smp_affinity property of an interrupt request sets up the hardware so that the decision to service an interrupt with a particular processor is made at the hardware level with no intervention from the kernel.
For more information about interrupt steering, see Chapter 9, Networking.