Chapter 17. Tuning CPU frequency to optimize energy consumption
This section describes how to optimize the power consumption of your system: after setting up the required CPUfreq governor, use the available
cpupower commands to set the CPU speed according to your requirements.
17.1. Supported cpupower tool commands
The cpupower tool is a collection of tools to examine and tune the power-saving features of processors.
The cpupower tool supports the following commands:
- idle-info: Displays the available idle states and other statistics for the CPU idle driver using the cpupower idle-info command. For more information, see CPU Idle States.
- idle-set: Enables or disables a specific CPU idle state using the cpupower idle-set command as root. Use -d to disable and -e to enable a specific CPU idle state.
- frequency-info: Displays the current cpufreq driver and available cpufreq governors using the cpupower frequency-info command. For more information, see CPUfreq drivers, Core CPUfreq Governors, and Intel P-state CPUfreq governors.
- frequency-set: Sets the CPU frequency and cpufreq governors using the cpupower frequency-set command as root. For more information, see Setting up CPUfreq governor.
- set: Sets processor power saving policies using the cpupower set command as root.
Using the --perf-bias option, you can enable software on supported Intel processors to determine the balance between optimum performance and saving power. Assigned values range from 0 to 15, where 0 is optimum performance and 15 is optimum power efficiency. By default, the --perf-bias option applies to all cores. To apply it only to individual cores, add the --cpu cpulist option.
- info: Displays processor power related and hardware configurations, which you have enabled using the cpupower set command. For example, if you assign the --perf-bias value as 5:
# cpupower set --perf-bias 5
# cpupower info
analyzing CPU 0:
perf-bias: 5
- monitor: Displays the idle statistics and CPU demands using the cpupower monitor command.
# cpupower monitor
    | Nehalem                   || Mperf              ||Idle_Stats
 CPU| C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || POLL | C1   | C1E  | C3   | C6   | C7s  | C8   | C9   | C10
   0|  1.95| 55.12|  0.00|  0.00||  4.21| 95.79|  3875||  0.00|  0.68|  2.07|  3.39| 88.77|  0.00|  0.00|  0.00|  0.00
[...]
Using the -l option, you can list all available monitors on your system, and using the -m option, you can display information related to specific monitors. For example, to monitor information related to the Mperf monitor, use the cpupower monitor -m Mperf command as root.
17.2. CPU Idle States
CPUs with the x86 architecture support various states, known as C-states, in which parts of the CPU are deactivated or run at lower performance settings.
With C-states, you can save power by partially deactivating CPUs that are not in use. You do not need to configure C-states, unlike P-states, which require a governor and potentially some setup to avoid undesirable power or performance issues. C-states are numbered from C0 upwards, with higher numbers representing decreased CPU functionality and greater power saving. C-states of a given number are broadly similar across processors, although the exact feature set of each state may vary between processor families. C-states 0–3 are defined as follows:
- C0: In this state, the CPU is working and not idle at all.
- C1, Halt: In this state, the processor is not executing any instructions but is typically not in a lower power state. The CPU can continue processing with practically no delay. All processors offering C-states need to support this state. Pentium 4 processors support an enhanced C1 state called C1E, which is a state with lower power consumption.
- C2, Stop-Clock: In this state, the clock is frozen for this processor, but it keeps the complete state of its registers and caches, so it can resume processing immediately after the clock is restarted. This is an optional state.
- C3, Sleep: In this state, the processor goes to sleep and does not need to keep its cache up to date. For this reason, waking up from this state takes considerably more time than from the C2 state. This is an optional state.
You can view the available idle states and other statistics for the CPUidle driver using the following command:
$ cpupower idle-info
CPUidle governor: menu
analyzing CPU 0:
Number of idle states: 9
Available idle states: POLL C1 C1E C3 C6 C7s C8 C9 C10
[...]
Intel CPUs with the "Nehalem" microarchitecture feature a C6 state, which can reduce the voltage supply of a CPU to zero, but typically reduces power consumption by between 80% and 90%. The kernel in Red Hat Enterprise Linux 8 includes optimizations for this C-state.
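The idle states reported by cpupower idle-info are also exposed by the kernel under sysfs. The following is a minimal sketch, assuming the standard cpuidle layout (/sys/devices/system/cpu/cpu0/cpuidle/stateN/ with the name, latency, and disable files). The function name list_idle_states is hypothetical, and the optional directory argument exists only so that the function can be pointed at a different CPU or at a test directory.

```shell
#!/bin/sh
# list_idle_states [CPUIDLE_DIR]
# Hypothetical helper: prints one line per idle state in the form
#   <index> <name> <exit latency in microseconds> <disable flag>
# CPUIDLE_DIR defaults to the cpuidle directory of CPU 0.
list_idle_states() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    for state in "$dir"/state[0-9]*; do
        # Skip the unexpanded glob when no idle states are exposed.
        [ -r "$state/name" ] || continue
        printf '%s %s %s %s\n' \
            "${state##*/state}" \
            "$(cat "$state/name")" \
            "$(cat "$state/latency" 2>/dev/null)" \
            "$(cat "$state/disable" 2>/dev/null)"
    done
}
```

Calling list_idle_states without arguments lists the states of CPU 0. States are printed in glob order, so on a system with ten or more states, state10 appears before state2.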
17.3. Overview of CPUfreq
One of the most effective ways to reduce power consumption and heat output on your system is CPUfreq, which is supported by x86 and ARM64 architectures in Red Hat Enterprise Linux 8. CPUfreq, also referred to as CPU speed scaling, is the infrastructure in the Linux kernel that enables it to scale the CPU frequency in order to save power.
CPU scaling can be done automatically depending on the system load, in response to Advanced Configuration and Power Interface (ACPI) events, or manually by user-space programs, and it allows the clock speed of the processor to be adjusted on the fly. This enables the system to run at a reduced clock speed to save power. The rules for shifting frequencies, whether to a faster or slower clock speed and when to shift frequencies, are defined by the CPUfreq governor.
You can view the cpufreq information using the cpupower frequency-info command as root.
17.3.1. CPUfreq drivers
Using the cpupower frequency-info --driver command as root, you can view the current CPUfreq driver.
The following two CPUfreq drivers are available:
- ACPI CPUfreq: The Advanced Configuration and Power Interface (ACPI) CPUfreq driver is a kernel driver that controls the frequency of a particular CPU through ACPI, which ensures the communication between the kernel and the hardware.
- Intel P-state: In Red Hat Enterprise Linux 8, the Intel P-state driver is supported. The driver provides an interface for controlling the P-state selection on processors based on the Intel Xeon E series architecture or newer architectures.
Currently, Intel P-state is used by default for supported CPUs. You can switch to using ACPI CPUfreq by adding the intel_pstate=disable parameter to the kernel command line.
Intel P-state implements the setpolicy() callback. The driver decides what P-state to use based on the policy requested from the cpufreq core. If the processor is capable of selecting its next P-state internally, the driver offloads this responsibility to the processor. If not, the driver implements algorithms to select the next P-state.
Intel P-state provides its own sysfs files to control the P-state selection. These files are located in the /sys/devices/system/cpu/intel_pstate/ directory. Any changes made to the files apply to all CPUs.
This directory contains the following files that are used for setting P-state parameters:
- max_perf_pct limits the maximum P-state requested by the driver, expressed as a percentage of available performance. The available P-state performance can be reduced by the no_turbo setting.
- min_perf_pct limits the minimum P-state requested by the driver, expressed as a percentage of the maximum no-turbo performance level.
- no_turbo limits the driver to selecting P-states below the turbo frequency range.
- turbo_pct displays the percentage of the total performance supported by hardware that is in the turbo range. This number is independent of whether turbo has been disabled or not.
- num_pstates displays the number of P-states that are supported by hardware. This number is independent of whether turbo has been disabled or not.
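The files described above can be inspected with a short script. The following is a minimal sketch, assuming that the Intel P-state driver is loaded and exposes the /sys/devices/system/cpu/intel_pstate/ directory. The function name show_pstate_limits and the optional directory argument are hypothetical and serve only the illustration. Files that the running kernel does not expose are reported as unavailable rather than treated as an error.

```shell
#!/bin/sh
# show_pstate_limits [PSTATE_DIR]
# Hypothetical helper: prints "<file>=<value>" for each Intel P-state tunable,
# or "<file>=unavailable" if the file is missing or not readable.
show_pstate_limits() {
    dir=${1:-/sys/devices/system/cpu/intel_pstate}
    for f in max_perf_pct min_perf_pct no_turbo turbo_pct num_pstates; do
        if [ -r "$dir/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s=unavailable\n' "$f"
        fi
    done
}
```

The writable files can be changed as root in the same way as other sysfs tunables, for example by writing 1 to no_turbo to keep the driver below the turbo range.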
17.3.2. Core CPUfreq governors
A CPUfreq governor defines the power characteristics of the system CPU, which in turn affects the CPU performance. Each governor has its own unique behavior, purpose, and suitability in terms of workload. Using the cpupower frequency-info --governors command as root, you can view the available CPUfreq governors.
Red Hat Enterprise Linux 8 includes multiple core CPUfreq governors:
- performance: It forces the CPU to use the highest possible clock frequency. This frequency is statically set and does not change. As such, this governor offers no power saving benefit. It is suitable only for periods of heavy workload, when the CPU is rarely or never idle.
- powersave: It forces the CPU to use the lowest possible clock frequency. This frequency is statically set and does not change. This governor offers maximum power savings, but at the cost of the lowest CPU performance. The term "powersave" can be deceiving, though, because in principle a slow CPU on full load consumes more power than a fast CPU that is not loaded. As such, while it may be advisable to set the CPU to use the powersave governor during times of expected low activity, any unexpected high loads during that time can cause the system to actually consume more power. The powersave governor is more of a speed limiter for the CPU than a power saver. It is most useful in systems and environments where overheating can be a problem.
- ondemand: It is a dynamic governor that lets the CPU reach the maximum clock frequency when the system load is high and the minimum clock frequency when the system is idle. While this allows the system to adjust power consumption to the system load, it does so at the expense of latency between frequency switching. As such, latency can offset any performance or power saving benefits offered by the ondemand governor if the system switches between idle and heavy workloads too often. For most systems, the ondemand governor can provide the best compromise between heat emission, power consumption, performance, and manageability. When the system is only busy at specific times of the day, the ondemand governor automatically switches between maximum and minimum frequency depending on the load without any further intervention.
- userspace: It allows user-space programs, or any process running as root, to set the frequency. Of all the governors, userspace is the most customizable and, depending on how it is configured, it can offer the best balance between performance and consumption for your system.
- conservative: Similar to the ondemand governor, the conservative governor also adjusts the clock frequency according to usage. However, the conservative governor switches between frequencies more gradually. This means that the conservative governor adjusts to a clock frequency that it considers best for the load, rather than simply choosing between maximum and minimum. While this can possibly provide significant savings in power consumption, it does so at an even greater latency than the ondemand governor.
You can enable a governor using cron jobs. This allows you to automatically set specific governors during specific times of the day. As such, you can specify a low-frequency governor during idle times, for example, after work hours, and return to a higher-frequency governor during hours of heavy workload.
For instructions on how to enable a specific governor, see Setting up CPUfreq governor.
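The cron-based scheduling described above can be sketched as a small script that picks a governor from the current hour. This is only an illustration under stated assumptions: the working hours (08:00 to 17:59), the function name governor_for_hour, and the script path in the commented cron entry are hypothetical, and the performance and powersave governors must be available with your CPUfreq driver.

```shell
#!/bin/sh
# governor_for_hour HOUR
# Hypothetical helper: prints "performance" for the assumed working hours
# (08 to 17 inclusive) and "powersave" for all other hours.
governor_for_hour() {
    hour=$1
    if [ "$hour" -ge 8 ] && [ "$hour" -lt 18 ]; then
        echo performance
    else
        echo powersave
    fi
}

# Intended use as root, for example from an hourly cron job:
#   cpupower frequency-set --governor "$(governor_for_hour "$(date +%H)")"
#
# Example /etc/cron.d entry (the script path is an assumption):
#   0 * * * * root /usr/local/sbin/set-governor.sh
```

Keeping the decision in a separate function makes the schedule easy to adjust without touching the cron entry itself.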
17.3.3. Intel P-state CPUfreq governors
By default, the Intel P-state driver operates in active mode with or without hardware-managed P-states (HWP), depending on whether the CPU supports HWP.
Using the cpupower frequency-info --governors command as root, you can view the available CPUfreq governors.
The functionality of the performance and powersave Intel P-state CPUfreq governors is different compared to core CPUfreq governors of the same names.
The Intel P-state driver can operate in the following three different modes:
Active mode with hardware-managed P-states
When active mode with HWP is used, the Intel P-state driver instructs the CPU to perform the P-state selection. The driver can provide frequency hints. However, the final selection depends on CPU internal logic. In active mode with HWP, the Intel P-state driver provides two P-state selection algorithms:
- performance: With the performance governor, the driver instructs internal CPU logic to be performance-oriented. The range of allowed P-states is restricted to the upper boundary of the range that the driver is allowed to use.
- powersave: With the powersave governor, the driver instructs internal CPU logic to be powersave-oriented.
Active mode without hardware-managed P-states
When active mode without HWP is used, the Intel P-state driver provides two P-state selection algorithms:
- performance: With the performance governor, the driver chooses the maximum P-state it is allowed to use.
- powersave: With the powersave governor, the driver chooses P-states proportional to the current CPU utilization. The behavior is similar to the ondemand CPUfreq core governor.
Passive mode
When passive mode is used, the Intel P-state driver functions the same as the traditional CPUfreq scaling driver. All available generic CPUfreq core governors can be used.
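To see which of these modes is in effect, you can combine two pieces of information that the kernel exposes. The following is a minimal sketch, assuming that the running kernel provides the /sys/devices/system/cpu/intel_pstate/status file (which reports active, passive, or off) and that HWP-capable CPUs list the hwp flag in /proc/cpuinfo. The function name describe_pstate_mode and both optional file arguments are hypothetical. The flag only shows that the CPU advertises HWP, not that the driver actually uses it.

```shell
#!/bin/sh
# describe_pstate_mode [STATUS_FILE] [CPUINFO_FILE]
# Hypothetical helper: prints the Intel P-state operation mode and, in active
# mode, whether the CPU advertises the hwp flag.
describe_pstate_mode() {
    status_file=${1:-/sys/devices/system/cpu/intel_pstate/status}
    cpuinfo=${2:-/proc/cpuinfo}
    if [ ! -r "$status_file" ]; then
        echo "unknown (no intel_pstate status file)"
        return 1
    fi
    mode=$(cat "$status_file")
    if [ "$mode" = "active" ]; then
        # -w matches "hwp" as a whole word, so flags such as hwp_notify
        # do not count as a match.
        if grep '^flags' "$cpuinfo" | grep -qw hwp; then
            echo "active (CPU advertises HWP)"
        else
            echo "active (no HWP flag)"
        fi
    else
        echo "$mode"
    fi
}
```

Running describe_pstate_mode without arguments inspects the local system.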
17.3.4. Setting up CPUfreq governor
All CPUfreq drivers are built in as part of the kernel-tools package, and selected automatically. To set up CPUfreq, you need to select a governor.
To use cpupower, install the kernel-tools package:
# yum install kernel-tools
View which governors are available for use for a specific CPU:
# cpupower frequency-info --governors
analyzing CPU 0:
available cpufreq governors: performance powersave
Enable one of the governors on all CPUs:
# cpupower frequency-set --governor performance
Replace the performance governor with the cpufreq governor name as per your requirement.
To enable a governor only on specific cores, use -c with a range or comma-separated list of CPU numbers. For example, to enable the userspace governor for CPUs 1-3 and 5, use:
# cpupower -c 1-3,5 frequency-set --governor userspace
If the kernel-tools package is not installed, the CPUfreq settings can be viewed in the /sys/devices/system/cpu/cpuid/cpufreq/ directory, where cpuid is the CPU name, for example cpu0. Settings and values can be changed by writing to these tunables. For example, to set the minimum clock speed of cpu0 to 360 MHz, use:
# echo 360000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
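The scaling_min_freq tunable takes its value in kHz, which is why 360 MHz is written as 360000. The following is a minimal sketch of a wrapper that performs this conversion. The function name set_min_freq_mhz and the optional directory argument are hypothetical, and the function assumes a whole number of MHz. Writing to the real sysfs file requires root.

```shell
#!/bin/sh
# set_min_freq_mhz MHZ [CPUFREQ_DIR]
# Hypothetical helper: converts MHZ to kHz and writes the result to the
# scaling_min_freq tunable. CPUFREQ_DIR defaults to the directory of cpu0.
set_min_freq_mhz() {
    mhz=$1
    cpufreq_dir=${2:-/sys/devices/system/cpu/cpu0/cpufreq}
    khz=$((mhz * 1000))
    echo "$khz" > "$cpufreq_dir/scaling_min_freq"
}
```

For example, set_min_freq_mhz 360 is equivalent to the echo command shown above. Reading scaling_min_freq back afterwards shows the value that the kernel actually accepted.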
Verify that the governor is enabled:
# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: Cannot determine or is not supported.
  hardware limits: 400 MHz - 4.20 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 400 MHz and 4.20 GHz.
    The governor "performance" may decide which speed to use within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.88 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
The current policy displays the recently enabled cpufreq governor. In this case, it is performance.
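The same verification can be done for all CPUs at once by reading the scaling_governor file of each CPU. The following is a minimal sketch, assuming the standard /sys/devices/system/cpu/cpuN/cpufreq/ layout. The function name check_governor and the optional root directory argument are hypothetical.

```shell
#!/bin/sh
# check_governor EXPECTED [CPU_ROOT]
# Hypothetical helper: prints "<cpu>: <governor>" for every CPU whose active
# governor differs from EXPECTED. Returns 0 if all CPUs match, 1 otherwise.
check_governor() {
    expected=$1
    root=${2:-/sys/devices/system/cpu}
    status=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded glob and CPUs without a cpufreq directory.
        [ -r "$f" ] || continue
        actual=$(cat "$f")
        if [ "$actual" != "$expected" ]; then
            cpu=${f#"$root"/}
            echo "${cpu%%/*}: $actual"
            status=1
        fi
    done
    return $status
}
```

For example, check_governor performance prints nothing and succeeds when every CPU uses the performance governor.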