Red Hat Training

A Red Hat training course is available for RHEL 8

Chapter 17. Tuning CPU frequency to optimize energy consumption

This section describes how to optimize the power consumption of your system by using the available cpupower commands to set CPU speed on a system as per your requirements after setting up the required CPUfreq governor.

17.1. Supported cpupower tool commands

The cpupower tool is a collection of tools to examine and tune power saving related features of processors.

The cpupower tool supports the following commands:

idle-info
Displays the available idle states and other statistics for the CPU idle driver using the cpupower idle-info command. For more information, see CPU Idle States.
idle-set
Enables or disables specific CPU idle state using the cpupower idle-set command as root. Use -d to disable and -e to enable a specific CPU idle state.
frequency-info
Displays the current cpufreq driver and available cpufreq governors using the cpupower frequency-info command. For more information, see CPUfreq drivers, Core CPUfreq Governors, and Intel P-state CPUfreq governors.
frequency-set
Sets the cpufreq and governors using the cpupower frequency-set command as root. For more information, see Setting up CPUfreq governor.
set

Sets processor power saving policies using the cpupower set command as root.

Using the --perf-bias option, you can enable software on supported Intel processors to determine the balance between optimum performance and saving power. Assigned values range from 0 to 15, where 0 is optimum performance and 15 is optimum power efficiency. By default, the --perf-bias option applies to all cores. To apply it only to individual cores, add the --cpu cpulist option.

info

Displays processor power related and hardware configurations, which you have enabled using the cpupower set command. For example, if you assign the --perf-bias value as 5:

# cpupower set --perf-bias 5
# cpupower info
analyzing CPU 0:
perf-bias: 5
monitor

Displays the idle statistics and CPU demands using the cpupower monitor command.

# cpupower monitor
 | Nehalem       || Mperf    ||Idle_Stats
 CPU| C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || POLL | C1   | C1E  | C3   | C6   | C7s  | C8   | C9   | C10
   0|  1.95| 55.12|  0.00|  0.00||  4.21| 95.79|  3875||  0.00|  0.68|  2.07|  3.39| 88.77|  0.00|  0.00|  0.00| 0.00
[...]

Using the -l option, you can list all available monitors on your system and the -m option to display information related to specific monitors. For example, to monitor information related to the Mperf monitor, use the cpupower monitor -m Mperf command as root.

Additional resources

  • cpupower(1), cpupower-idle-info(1), cpupower-idle-set(1), cpupower-frequency-set(1), cpupower-frequency-info(1), cpupower-set(1), cpupower-info(1), and cpupower-monitor(1) man pages

17.2. CPU Idle States

CPUs with the x86 architecture support various states, such as, few parts of the CPU are deactivated or using lower performance settings, known as C-states.

With this state, you can save power by partially deactivating CPUs that are not in use. There is no need to configure the C-state, unlike P-states that require a governor and potentially some set up to avoid undesirable power or performance issues. C-states are numbered from C0 upwards, with higher numbers representing decreased CPU functionality and greater power saving. C-states of a given number are broadly similar across processors, although the exact details of the specific feature sets of the state may vary between processor families. C-states 0–3 are defined as follows:

C0
In this state, the CPU is working and not idle at all.
C1, Halt
In this state, the processor is not executing any instructions but is typically not in a lower power state. The CPU can continue processing with practically no delay. All processors offering C-states need to support this state. Pentium 4 processors support an enhanced C1 state called C1E that actually is a state for lower power consumption.
C2, Stop-Clock
In this state, the clock is frozen for this processor but it keeps the complete state for its registers and caches, so after starting the clock again it can immediately start processing again. This is an optional state.
C3, Sleep
In this state, the processor goes to sleep and does not need to keep its cache up to date. Due to this reason, waking up from this state needs considerably more time than from the C2 state. This is an optional state.

You can view the available idle states and other statistics for the CPUidle driver using the following command:

$ cpupower idle-info
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 9
Available idle states: POLL C1 C1E C3 C6 C7s C8 C9 C10
[...]

Intel CPUs with the "Nehalem" microarchitecture features a C6 state, which can reduce the voltage supply of a CPU to zero, but typically reduces power consumption by between 80% and 90%. The kernel in Red Hat Enterprise Linux 8 includes optimizations for this new C-state.

Additional resources

  • cpupower(1) and cpupower-idle(1) man pages

17.3. Overview of CPUfreq

One of the most effective ways to reduce power consumption and heat output on your system is CPUfreq, which is supported by x86 and ARM64 architectures in Red Hat Enterprise Linux 8. CPUfreq, also referred to as CPU speed scaling, is the infrastructure in the Linux kernel that enables it to scale the CPU frequency in order to save power.

CPU scaling can be done automatically depending on the system load, in response to Advanced Configuration and Power Interface (ACPI) events, or manually by user-space programs, and it allows the clock speed of the processor to be adjusted on the fly. This enables the system to run at a reduced clock speed to save power. The rules for shifting frequencies, whether to a faster or slower clock speed and when to shift frequencies, are defined by the CPUfreq governor.

You can view the cpufreq information using the cpupower frequency-info command as root.

17.3.1. CPUfreq drivers

Using the cpupower frequency-info --driver command as root, you can view the current CPUfreq driver.

The following are the two available drivers for CPUfreq that can be used:

ACPI CPUfreq
Advanced Configuration and Power Interface (ACPI) CPUfreq driver is a kernel driver that controls the frequency of a particular CPU through ACPI, which ensures the communication between the kernel and the hardware.
Intel P-state

In Red Hat Enterprise Linux 8, Intel P-state driver is supported. The driver provides an interface for controlling the P-state selection on processors based on the Intel Xeon E series architecture or newer architectures.

Currently, Intel P-state is used by default for supported CPUs. You can switch to using ACPI CPUfreq by adding the intel_pstate=disable command to the kernel command line.

Intel P-state implements the setpolicy() callback. The driver decides what P-state to use based on the policy requested from the cpufreq core. If the processor is capable of selecting its next P-state internally, the driver offloads this responsibility to the processor. If not, the driver implements algorithms to select the next P-state.

Intel P-state provides its own sysfs files to control the P-state selection. These files are located in the /sys/devices/system/cpu/intel_pstate/ directory. Any changes made to the files are applicable to all CPUs.

This directory contains the following files that are used for setting P-state parameters:

  • max_perf_pct limits the maximum P-state requested by the driver expressed in a percentage of available performance. The available P-state performance can be reduced by the no_turbo setting.
  • min_perf_pct limits the minimum P-state requested by the driver, expressed in a percentage of the maximum no-turbo performance level.
  • no_turbo limits the driver to selecting P-state below the turbo frequency range.
  • turbo_pct displays the percentage of the total performance supported by hardware that is in the turbo range. This number is independent of whether turbo has been disabled or not.
  • num_pstates displays the number of P-states that are supported by hardware. This number is independent of whether turbo has been disabled or not.

Additional resources

  • cpupower-frequency-info(1) man page

17.3.2. Core CPUfreq governors

A CPUfreq governor defines the power characteristics of the system CPU, which in turn affects the CPU performance. Each governor has its own unique behavior, purpose, and suitability in terms of workload. Using the cpupower frequency-info --governor command as root, you can view the available CPUfreq governors.

Red Hat Enterprise Linux 8 includes multiple core CPUfreq governors:

cpufreq_performance
It forces the CPU to use the highest possible clock frequency. This frequency is statically set and does not change. As such, this particular governor offers no power saving benefit. It is only suitable for hours of a heavy workload, and only during times wherein the CPU is rarely or never idle.
cpufreq_powersave
It forces the CPU to use the lowest possible clock frequency. This frequency is statically set and does not change. This governor offers maximum power savings, but at the cost of the lowest CPU performance. The term "powersave" can sometimes be deceiving though, since in principle a slow CPU on full load consumes more power than a fast CPU that is not loaded. As such, while it may be advisable to set the CPU to use the powersave governor during times of expected low activity, any unexpected high loads during that time can cause the system to actually consume more power. The Powersave governor is more of a speed limiter for the CPU than a power saver. It is most useful in systems and environments where overheating can be a problem.
cpufreq_ondemand
It is a dynamic governor, using which you can enable the CPU to achieve maximum clock frequency when the system load is high, and also minimum clock frequency when the system is idle. While this allows the system to adjust power consumption accordingly with respect to system load, it does so at the expense of latency between frequency switching. As such, latency can offset any performance or power saving benefits offered by the ondemand governor if the system switches between idle and heavy workloads too often. For most systems, the ondemand governor can provide the best compromise between heat emission, power consumption, performance, and manageability. When the system is only busy at specific times of the day, the ondemand governor automatically switches between maximum and minimum frequency depending on the load without any further intervention.
cpufreq_userspace
It allows user-space programs, or any process running as root, to set the frequency. Of all the governors, userspace is the most customizable and depending on how it is configured, it can offer the best balance between performance and consumption for your system.
cpufreq_conservative
Similar to the ondemand governor, the conservative governor also adjusts the clock frequency according to usage. However, the conservative governor switches between frequencies more gradually. This means that the conservative governor adjusts to a clock frequency that it considers best for the load, rather than simply choosing between maximum and minimum. While this can possibly provide significant savings in power consumption, it does so at an ever greater latency than the ondemand governor.
Note

You can enable a governor using cron jobs. This allows you to automatically set specific governors during specific times of the day. As such, you can specify a low-frequency governor during idle times, for example, after work hours, and return to a higher-frequency governor during hours of heavy workload.

For instructions on how to enable a specific governor, see Setting up CPUfreq governor.

17.3.3. Intel P-state CPUfreq governors

By default, the Intel P-state driver operates in active mode with or without Hardware p-state (HWP) depending on whether the CPU supports HWP.

Using the cpupower frequency-info --governor command as root, you can view the available CPUfreq governors.

Note

The functionality of performance and powersave Intel P-state CPUfreq governors is different compared to core CPUfreq governors of the same names.

The Intel P-state driver can operate in the following three different modes:

Active mode with hardware-managed P-states

When active mode with HWP is used, the Intel P-state driver instructs the CPU to perform the P-state selection. The driver can provide frequency hints. However, the final selection depends on CPU internal logic. In active mode with HWP, the Intel P-state driver provides two P-state selection algorithms:

  • performance: With the performance governor, the driver instructs internal CPU logic to be performance-oriented. The range of allowed P-states is restricted to the upper boundary of the range that the driver is allowed to use.
  • powersave: With the powersave governor, the driver instructs internal CPU logic to be powersave-oriented.
Active mode without hardware-managed P-states

When active mode without HWP is used, the Intel P-state driver provides two P-state selection algorithms:

  • performance: With the performance governor, the driver chooses the maximum P-state it is allowed to use.
  • powersave: With the powersave governor, the driver chooses P-states proportional to the current CPU utilization. The behavior is similar to the ondemand CPUfreq core governor.
Passive mode
When the passive mode is used, the Intel P-state driver functions the same as the traditional CPUfreq scaling driver. All available generic CPUFreq core governors can be used.

17.3.4. Setting up CPUfreq governor

All CPUfreq drivers are built in as part of the kernel-tools package, and selected automatically. To set up CPUfreq, you need to select a governor.

Prerequisites

  • To use cpupower, install the kernel-tools package:

    # yum install kernel-tools

Procedure

  1. View which governors are available for use for a specific CPU:

    # cpupower frequency-info --governors
    analyzing CPU 0:
      available cpufreq governors: performance powersave
  2. Enable one of the governors on all CPUs:

    # cpupower frequency-set --governor performance

    Replace the performance governor with the cpufreq governor name as per your requirement.

    To only enable a governor on specific cores, use -c with a range or comma-separated list of CPU numbers. For example, to enable the userspace governor for CPUs 1-3 and 5, use:

    # cpupower -c 1-3,5 frequency-set --governor cpufreq_userspace
Note

If the kernel-tools package is not installed, the CPUfreq settings can be viewed in the /sys/devices/system/cpu/cpuid/cpufreq/ directory. Settings and values can be changed by writing to these tunables. For example, to set the minimum clock speed of cpu0 to 360 MHz, use:

# echo 360000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq

Verification

  • Verify that the governor is enabled:

    # cpupower frequency-info
    analyzing CPU 0:
      driver: intel_pstate
      CPUs which run at the same hardware frequency: 0
      CPUs which need to have their frequency coordinated by software: 0
      maximum transition latency:  Cannot determine or is not supported.
      hardware limits: 400 MHz - 4.20 GHz
      available cpufreq governors: performance powersave
      current policy: frequency should be within 400 MHz and 4.20 GHz.
            The governor "performance" may decide which speed to use within this range.
      current CPU frequency: Unable to call hardware
      current CPU frequency: 3.88 GHz (asserted by call to kernel)
      boost state support:
        Supported: yes
        Active: yes

    The current policy displays the recently enabled cpufreq governor. In this case, it is performance.

Additional resources

  • cpupower-frequency-info(1) and cpupower-frequency-set(1) man pages