3.2. CPUfreq

CPUfreq is one of the most effective ways to reduce power consumption and heat output on your system. CPUfreq, also referred to as CPU speed scaling, is the infrastructure in the Linux kernel that enables it to scale the CPU frequency in order to save power. CPU scaling can be done automatically depending on the system load, in response to ACPI events, or manually by user-space programs, and it allows the clock speed of the processor to be adjusted on the fly. This enables the system to run at a reduced clock speed to save power. The rules for shifting frequencies, whether to a faster or slower clock speed, and when to shift frequencies, are defined by the CPUfreq governor.

3.2.1. CPUfreq drivers

Two drivers for CPUfreq can be used: the ACPI CPUfreq driver and the Intel P-state driver.

ACPI CPUfreq

The ACPI CPUfreq driver is a kernel driver that controls the frequency of a particular CPU through ACPI, which handles the communication between the kernel and the hardware.

Intel P-state

The Intel P-state driver is supported in Red Hat Enterprise Linux 7. The driver provides an interface for controlling the P-state selection on processors based on the Intel Xeon E series architecture or newer architectures. Intel P-state implements the setpolicy() callback. The driver decides what P-state to use based on the policy requested from the cpufreq core. If the processor is capable of selecting its next P-state internally, the driver offloads this responsibility to the processor. If not, the driver implements algorithms to select the next P-state.
Intel P-state provides its own sysfs files to control the P-state selection. These files are located in the /sys/devices/system/cpu/intel_pstate/ directory. Any changes made to the files are applicable to all CPUs. This directory contains five files that are used for setting P-state parameters:
  • max_perf_pct: Limits the maximum P-state requested by the driver, expressed in a percentage of available performance. The available P-state performance can be reduced by the no_turbo setting (see below).
  • min_perf_pct: Limits the minimum P-state requested by the driver, expressed in a percentage of the maximum (no-turbo) performance level.
  • no_turbo: Limits the driver to selecting P-states below the turbo frequency range.
  • turbo_pct: Displays the percentage of the total performance supported by hardware that is in the turbo range. This number is independent of whether turbo has been disabled or not.
  • num_pstates: Displays the number of P-states that are supported by hardware. This number is independent of whether turbo has been disabled or not.
Currently, Intel P-state is used by default for supported CPUs. Users can switch to using ACPI CPUfreq by adding the following to the kernel command line:
intel_pstate=disable
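The tunables listed above can be inspected with a few lines of shell. The following is a minimal sketch, not a supported tool: the function name show_pstate_settings is hypothetical, and the optional directory argument exists only so that the function can be tried against a copy of the files.

```shell
#!/bin/sh
# Hypothetical helper: print the intel_pstate tunables described above.
# The optional argument overrides the sysfs directory (an assumption made
# for illustration and testing; by default the real directory is used).
show_pstate_settings() {
    pstate_dir=${1:-/sys/devices/system/cpu/intel_pstate}
    if [ ! -d "$pstate_dir" ]; then
        echo "intel_pstate directory not found: $pstate_dir" >&2
        return 1
    fi
    for name in max_perf_pct min_perf_pct no_turbo turbo_pct num_pstates; do
        # Skip files that this kernel does not expose.
        if [ -r "$pstate_dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$pstate_dir/$name")"
        fi
    done
}

# Example: show_pstate_settings
# To cap the requested P-state at 80 percent (as root):
#   echo 80 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
```

Because changes to these files apply to all CPUs, it is worth printing the current values before writing a new one.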

3.2.2. CPUfreq Governors

A CPUfreq governor defines the power characteristics of the system CPU, which in turn affects CPU performance. Each governor has its own unique behavior, purpose, and suitability in terms of workload. This section describes how to choose and configure a CPUfreq governor, the characteristics of each governor, and what kind of workload each governor is suitable for.

Warning

Red Hat Enterprise Linux 7 includes multiple core CPUfreq governors. The Intel P-state driver by default operates in the active mode, in which only two CPUfreq governors are available: performance and powersave. Note that the functionality of performance and powersave Intel P-state CPUfreq governors is different compared to core CPUfreq governors of the same names.

3.2.2.1. Core CPUfreq Governors

Different types of CPUfreq governors available in Red Hat Enterprise Linux 7 are listed here:
cpufreq_performance

The Performance governor forces the CPU to use the highest possible clock frequency. This frequency will be statically set, and will not change. As such, this particular governor offers no power saving benefit. It is only suitable for hours of heavy workload, and even then only during times wherein the CPU is rarely (or never) idle.

cpufreq_powersave

By contrast, the Powersave governor forces the CPU to use the lowest possible clock frequency. This frequency will be statically set, and will not change. As such, this particular governor offers maximum power savings, but at the cost of the lowest CPU performance.

The term "powersave" can sometimes be deceiving, though, since (in principle) a slow CPU on full load consumes more power than a fast CPU that is not loaded. As such, while it may be advisable to set the CPU to use the Powersave governor during times of expected low activity, any unexpected high loads during that time can cause the system to actually consume more power.
The Powersave governor is, in simple terms, more of a "speed limiter" for the CPU than a "power saver". It is most useful in systems and environments where overheating can be a problem.
cpufreq_ondemand

The Ondemand governor is a dynamic governor that allows the CPU to reach maximum clock frequency when system load is high and to drop to minimum clock frequency when the system is idle. While this allows the system to adjust power consumption to the system load, it does so at the expense of latency when switching between frequencies. As such, latency can offset any performance or power saving benefits offered by the Ondemand governor if the system switches between idle and heavy workloads too often.

For most systems, the Ondemand governor can provide the best compromise between heat emission, power consumption, performance, and manageability. When the system is only busy at specific times of the day, the Ondemand governor will automatically switch between maximum and minimum frequency depending on the load without any further intervention.
cpufreq_userspace

The Userspace governor allows user-space programs, or any process running as root, to set the frequency. Of all the governors, Userspace is the most customizable; and depending on how it is configured, it can offer the best balance between performance and consumption for your system.

cpufreq_conservative

Like the Ondemand governor, the Conservative governor adjusts the clock frequency according to usage. However, while the Ondemand governor does so in a more aggressive manner (that is, from maximum to minimum and back), the Conservative governor switches between frequencies more gradually.

This means that the Conservative governor will adjust to a clock frequency that it deems fitting for the load, rather than simply choosing between maximum and minimum. While this can possibly provide significant savings in power consumption, it does so at an even greater latency than the Ondemand governor.

Note

You can enable a governor using cron jobs. This allows you to automatically set specific governors during specific times of the day. As such, you can specify a low-frequency governor during idle times (for example after work hours) and return to a higher-frequency governor during hours of heavy workload.
For instructions on how to enable a specific governor, see Section 3.2.3, “CPUfreq Setup”.
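As an illustration of the cron approach, the sketch below picks a governor from the hour of the day. The function name governor_for_hour, the 08:00 to 18:00 working window, the choice of ondemand and powersave, and the script path in the crontab comment are all assumptions; substitute governors that are actually available on your system.

```shell
#!/bin/sh
# Hypothetical helper for a cron job: choose a governor from the hour of day.
# The working window (08:00-18:00) and both governor names are assumptions.
governor_for_hour() {
    hour=${1#0}        # normalize "date +%H" output such as "08" to "8"
    hour=${hour:-0}    # a bare "0" becomes empty after stripping; restore it
    if [ "$hour" -ge 8 ] && [ "$hour" -lt 18 ]; then
        echo ondemand
    else
        echo powersave
    fi
}

# Example root crontab entry, run at the top of every hour
# (the script path is hypothetical):
#   0 * * * * /usr/local/sbin/set-governor.sh
# where the script ends with:
#   cpupower frequency-set --governor "$(governor_for_hour "$(date +%H)")"
```

Keeping the decision in a small function makes the schedule easy to check without touching the CPU settings.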

3.2.2.2. Intel P-state CPUfreq Governors

The Intel P-state driver can operate in three different modes:
  • Active mode with hardware-managed P-states (HWP)
  • Active mode without hardware-managed P-states (HWP)
  • Passive mode
By default, the Intel P-state driver operates in active mode with or without HWP depending on whether the CPU supports HWP.
Active mode with hardware-managed P-states
When active mode with HWP is used, the Intel P-state driver instructs the CPU to perform the P-state selection. The driver can provide frequency hints. However, the final selection depends on CPU internal logic.
In active mode with HWP, the Intel P-state driver provides two P-state selection algorithms:
  • Performance
  • Powersave
With the Performance governor, the driver instructs internal CPU logic to be performance-oriented. The range of allowed P-states is restricted to the upper boundary of the range that the driver is allowed to use.
With the Powersave governor, the driver instructs internal CPU logic to be powersave-oriented.
Active mode without hardware-managed P-states
When active mode without HWP is used, the Intel P-state driver provides two P-state selection algorithms:
  • Performance
  • Powersave
With the Performance governor, the driver chooses the maximum P-state it is allowed to use.
With the Powersave governor, the driver chooses P-states proportional to the current CPU utilization. The behavior is similar to the Ondemand CPUfreq core governor.
Passive mode
When passive mode is used, the Intel P-state driver functions the same as a traditional CPUfreq scaling driver. All available generic CPUfreq core governors can be used.
For more details about Intel P-state governors, see the intel_pstate CPU Performance Scaling Driver documentation in the Linux kernel documentation.
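To see which driver and mode are in effect, the scaling_driver file of a CPU can be read. The sketch below assumes the driver names used by the upstream kernel (intel_pstate in active mode, intel_cpufreq in passive mode, acpi-cpufreq for the ACPI driver); verify them on your kernel before relying on the mapping. The function name describe_scaling_driver is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map the scaling_driver name to the driver and mode
# described above. The name-to-mode mapping follows upstream kernel
# conventions and is an assumption for any particular RHEL 7 kernel.
describe_scaling_driver() {
    driver_file=${1:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver}
    if [ ! -r "$driver_file" ]; then
        echo "no CPUfreq driver information found"
        return 0
    fi
    driver_name=$(cat "$driver_file")
    case "$driver_name" in
        intel_pstate)  echo "Intel P-state, active mode" ;;
        intel_cpufreq) echo "Intel P-state, passive mode" ;;
        acpi-cpufreq)  echo "ACPI CPUfreq" ;;
        *)             echo "other driver: $driver_name" ;;
    esac
}

# Example: describe_scaling_driver
```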

3.2.3. CPUfreq Setup

All CPUfreq drivers are built into the kernel and selected automatically, so to set up CPUfreq you just need to select a governor. The cpupower utility used in this section is provided by the kernel-tools package.
You can view which governors are available for use for a specific CPU using:
~]# cpupower frequency-info --governors
You can then enable one of these governors on all CPUs using:
~]# cpupower frequency-set --governor [governor]
To only enable a governor on specific cores, use -c with a range or comma-separated list of CPU numbers. For example, to enable the Userspace governor for CPUs 1-3 and 5, the command would be:
~]# cpupower -c 1-3,5 frequency-set --governor userspace
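In a script, it can be useful to confirm that a governor is offered before trying to enable it. The following sketch reads the scaling_available_governors file in sysfs instead of parsing cpupower output. The function name governor_available is hypothetical, and the optional second argument exists only to allow testing against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named governor is listed as available.
# Usage: governor_available NAME [LIST_FILE]
governor_available() {
    wanted=$1
    list_file=${2:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors}
    [ -r "$list_file" ] || return 1
    # The file holds a space-separated list of governor names.
    for gov in $(cat "$list_file"); do
        if [ "$gov" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Example (as root):
#   governor_available ondemand && cpupower frequency-set --governor ondemand
```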

3.2.4. Tuning CPUfreq Policy and Speed

Once you have chosen an appropriate CPUfreq governor, you can view CPU speed and policy information with the cpupower frequency-info command and further tune the speed of each CPU with options for cpupower frequency-set.
For cpupower frequency-info, the following options are available:
  • --freq: Shows the current speed of the CPU according to the CPUfreq core, in kHz.
  • --hwfreq: Shows the current speed of the CPU according to the hardware, in kHz (only available as root).
  • --driver: Shows what CPUfreq driver is used to set the frequency on this CPU.
  • --governors: Shows the CPUfreq governors available in this kernel. If you wish to use a CPUfreq governor that is not listed in the output, see Section 3.2.3, “CPUfreq Setup” for instructions on how to do so.
  • --affected-cpus: Lists CPUs that require frequency coordination software.
  • --policy: Shows the range of the current CPUfreq policy, in kHz, and the currently active governor.
  • --hwlimits: Lists available frequencies for the CPU, in kHz.
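The policy information is also exposed in sysfs, so a short script can summarize it without cpupower. The sketch below reads the scaling_governor, scaling_min_freq, and scaling_max_freq files and converts the kHz values to MHz for readability. The function names are hypothetical, and the directory argument is an assumption added so the function can be tried on sample files.

```shell
#!/bin/sh
# Convert a frequency in kHz (the unit used by CPUfreq) to whole MHz.
khz_to_mhz() {
    echo $(( $1 / 1000 ))
}

# Hypothetical helper: print the active governor and the policy range.
# Usage: cpu_freq_summary [POLICY_DIR]
cpu_freq_summary() {
    policy_dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}
    printf 'governor: %s\n' "$(cat "$policy_dir/scaling_governor")"
    printf 'policy: %s-%s MHz\n' \
        "$(khz_to_mhz "$(cat "$policy_dir/scaling_min_freq")")" \
        "$(khz_to_mhz "$(cat "$policy_dir/scaling_max_freq")")"
}

# Example: cpu_freq_summary
```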
For cpupower frequency-set, the following options are available:
  • --min <freq> and --max <freq>: Set the policy limits of the CPU, in kHz.

    Important

    When setting policy limits, you should set --max before --min.
  • --freq <freq>: Set a specific clock speed for the CPU, in kHz. You can only set a speed within the policy limits of the CPU (as per --min and --max).
  • --governor <gov>: Set a new CPUfreq governor.
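The ordering advice for policy limits can be wrapped in a small function so that it is applied consistently. This is a sketch under stated assumptions: the function name set_policy_limits and the CPUPOWER override variable are hypothetical, and the override exists only to allow a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Set CPUPOWER="echo cpupower" for a dry run that only prints the commands.
# (The CPUPOWER variable is an illustration aid, not a cpupower feature.)
CPUPOWER=${CPUPOWER:-cpupower}

# Hypothetical helper: apply policy limits given in kHz.
# Usage: set_policy_limits MIN_KHZ MAX_KHZ
set_policy_limits() {
    min_khz=$1
    max_khz=$2
    if [ "$min_khz" -gt "$max_khz" ]; then
        echo "minimum ($min_khz) exceeds maximum ($max_khz)" >&2
        return 1
    fi
    # Follow the ordering advice above: --max first, then --min.
    # $CPUPOWER is left unquoted on purpose so a dry-run prefix can split.
    $CPUPOWER frequency-set --max "$max_khz" &&
        $CPUPOWER frequency-set --min "$min_khz"
}

# Example (as root): set_policy_limits 1200000 2400000
```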

Note

If you do not have the kernel-tools package installed, CPUfreq settings can be viewed in the tunables found in /sys/devices/system/cpu/[cpuid]/cpufreq/. Settings and values can be changed by writing to these tunables. The values are expressed in kHz. For example, to set the minimum clock speed of cpu0 to 360 MHz (360000 kHz), use:
~]# echo 360000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
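Because the sysfs tunables take values in kHz, a small wrapper that accepts MHz can help avoid unit mistakes. The sketch below is illustrative only: the function name set_min_freq_mhz is hypothetical, and the optional base directory argument is an assumption added so the function can be exercised on a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: set the minimum clock speed of one CPU, given in MHz.
# Usage: set_min_freq_mhz CPU_NUMBER MHZ [BASE_DIR]
set_min_freq_mhz() {
    cpu=$1
    mhz=$2
    base=${3:-/sys/devices/system/cpu}
    target="$base/cpu$cpu/cpufreq/scaling_min_freq"
    if [ ! -w "$target" ]; then
        echo "cannot write $target (are you root?)" >&2
        return 1
    fi
    # The tunable expects kHz, so 360 MHz is written as 360000.
    echo $(( mhz * 1000 )) > "$target"
}

# Example (as root): set_min_freq_mhz 0 360
```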