Chapter 15. Importance of power management

Reducing the overall power consumption of computer systems helps to save costs. Effectively optimizing the energy consumption of each system component includes studying the different tasks that your system performs and configuring each component so that its performance is appropriate for that job. Lowering the power consumption of a specific component, or of the system as a whole, leads to less heat but also to lower performance.

Proper power management results in:

  • heat reduction for servers and computing centers
  • reduced secondary costs, including cooling, space, cables, generators, and uninterruptible power supplies (UPS)
  • extended battery life for laptops
  • lower carbon dioxide output
  • meeting government regulations or legal requirements regarding Green IT, for example, Energy Star
  • meeting company guidelines for new systems

This section provides information about power management of your Red Hat Enterprise Linux systems.

15.1. Power management basics

Effective power management is built on the following principles:

An idle CPU should only wake up when needed

Since Red Hat Enterprise Linux 6, the kernel runs tickless, which means the previous periodic timer interrupts have been replaced with on-demand interrupts. Therefore, idle CPUs are allowed to remain idle until a new task is queued for processing, and CPUs that have entered lower power states can remain in these states longer. However, benefits from this feature can be offset if your system has applications that create unnecessary timer events. Polling events, such as checks for volume changes or mouse movement, are examples of such events.

Red Hat Enterprise Linux includes tools that you can use to identify and audit applications on the basis of their CPU usage. For more information, see Audit and analysis overview and Tools for auditing.

Unused hardware and devices should be disabled completely

This is especially true for devices that have moving parts, such as hard disks. In addition, some applications might leave an unused but enabled device "open". When this occurs, the kernel assumes that the device is in use, which can prevent the device from entering a power-saving state.

Low activity should translate to low wattage

In many cases, however, this depends on modern hardware and a correct BIOS configuration, or UEFI configuration on modern systems, including non-x86 architectures. Make sure that you are using the latest official firmware for your systems and that the power management features are enabled in the power management or device configuration sections of the BIOS. Some features to look for include:

  • Collaborative Processor Performance Controls (CPPC) support for ARM64
  • PowerNV support for IBM Power Systems
  • SpeedStep
  • PowerNow!
  • Cool’n’Quiet
  • ACPI (C-state)
  • Smart

    If your hardware has support for these features and they are enabled in the BIOS, Red Hat Enterprise Linux uses them by default.
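
One way to check whether the kernel has picked up a frequency-scaling feature is to read the cpufreq entries in sysfs. The following is a minimal sketch under stated assumptions, not an official procedure: the function name show_cpufreq and its optional root-directory argument are illustrative only, and the scaling_driver and scaling_governor files are absent when no cpufreq driver is loaded, which is common in virtual machines.

```shell
#!/bin/sh
# show_cpufreq [SYSFS_ROOT]
# Print the cpufreq driver and governor for every CPU that exposes them.
# SYSFS_ROOT defaults to /sys; the argument exists only so that the function
# can be pointed at a copy of the tree (hypothetical helper, not a RHEL tool).
show_cpufreq() {
    root="${1:-/sys}"
    found=0
    for dir in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq; do
        # The glob stays unexpanded when nothing matches; skip that case.
        [ -r "$dir/scaling_driver" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        driver=$(cat "$dir/scaling_driver")
        governor=$(cat "$dir/scaling_governor" 2>/dev/null || echo unknown)
        echo "$cpu driver=$driver governor=$governor"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no cpufreq driver active"
    fi
}

show_cpufreq
```

If the function reports that no driver is active on physical hardware, the firmware settings described above are a reasonable first place to look.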

Different forms of CPU states and their effects

Modern CPUs together with Advanced Configuration and Power Interface (ACPI) provide different power states. The three different states are:

  • Sleep (C-states)
  • Frequency and voltage (P-states)
  • Heat output (T-states or thermal states)

    A CPU in the lowest sleep state consumes the least power, but it also takes considerably more time to wake up from that state when needed. In very rare cases, this can cause the CPU to wake up immediately every time it has just gone to sleep. The CPU is then effectively permanently busy and loses some of the power savings that another state would have provided.
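
The trade-off between power saving and wake-up time can be inspected through the cpuidle entries in sysfs, where each idle state reports its exit latency and how often it was entered. The following is a minimal sketch: the function name list_cstates and its optional directory argument are assumptions made for illustration, and the cpuidle directory is missing on systems without a cpuidle driver.

```shell
#!/bin/sh
# list_cstates [CPU_DIR]
# Print one line per idle state: name, exit latency, and entry count.
# CPU_DIR defaults to the sysfs directory of CPU 0 (hypothetical helper).
list_cstates() {
    cpu_dir="${1:-/sys/devices/system/cpu/cpu0}"
    for state in "$cpu_dir"/cpuidle/state[0-9]*; do
        # Skip the unexpanded glob when no idle states are exposed.
        [ -r "$state/name" ] || continue
        name=$(cat "$state/name")
        latency=$(cat "$state/latency")   # exit latency in microseconds
        usage=$(cat "$state/usage")       # number of times the state was entered
        echo "$name latency_us=$latency entered=$usage"
    done
}

list_cstates
```

States with a higher latency value take longer to leave, which is the wake-up cost that the paragraph above describes.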

A turned off machine uses the least amount of power

One of the best ways to save power is to turn off systems. For example, your company can develop a corporate culture focused on "green IT" awareness with a guideline to turn off machines during lunch breaks or when going home. You might also consolidate several physical servers into one bigger server and virtualize them by using the virtualization technology that is shipped with Red Hat Enterprise Linux.

15.2. Audit and analysis overview

The detailed manual audit, analysis, and tuning of a single system is usually the exception, because the time and cost of doing so typically outweigh the benefits gained from these last pieces of system tuning.

However, performing these tasks once for a large number of nearly identical systems where you can reuse the same settings for all systems can be very useful. For example, consider the deployment of thousands of desktop systems, or an HPC cluster where the machines are nearly identical. Another reason to do auditing and analysis is to provide a basis for comparison against which you can identify regressions or changes in system behavior in the future. The results of this analysis can be very helpful in cases where hardware, BIOS, or software updates happen regularly and you want to avoid any surprises with regard to power consumption. Generally, a thorough audit and analysis gives you a much better idea of what is really happening on a particular system.

Auditing and analyzing a system with regard to power consumption is relatively hard, even with the most modern systems available. Most systems do not provide the necessary means to measure power use via software. Exceptions exist though:

  • The iLO management console of Hewlett Packard server systems has a power management module that you can access through the web.
  • IBM provides a similar solution in their BladeCenter power management module.
  • On some Dell systems, the IT Assistant offers power monitoring capabilities as well.

Other vendors are likely to offer similar capabilities for their server platforms, but, as these examples show, no single solution is supported by all vendors. Direct measurements of power consumption are often only necessary to maximize savings as far as possible.

15.3. Tools for auditing

Red Hat Enterprise Linux 8 offers tools that you can use to perform system auditing and analysis. Most of them can serve as supplementary sources of information if you want to verify what you have already discovered or if you need more in-depth information about certain parts.

Many of these tools are also used for performance tuning. They include:

PowerTOP

It identifies specific components of the kernel and user-space applications that frequently wake up the CPU. Use the powertop command as root to start the PowerTOP tool, and powertop --calibrate to calibrate the power estimation engine. For more information about PowerTOP, see Managing power consumption with PowerTOP.

Diskdevstat and netdevstat

They are SystemTap tools that collect detailed information about the disk activity and network activity of all applications running on a system. Using the statistics collected by these tools, you can identify applications that waste power with many small I/O operations rather than fewer, larger operations. To install the diskdevstat and netdevstat tools, run the yum install tuned-utils-systemtap kernel-debuginfo command as root.

To view the detailed information about the disk and network activity, use:

# diskdevstat

PID   UID   DEV   WRITE_CNT   WRITE_MIN   WRITE_MAX   WRITE_AVG   READ_CNT   READ_MIN   READ_MAX   READ_AVG   COMMAND

3575  1000  dm-2   59          0.000      0.365        0.006        5         0.000        0.000      0.000      mozStorage #5
3575  1000  dm-2    7          0.000      0.000        0.000        0         0.000        0.000      0.000      localStorage DB
[...]


# netdevstat

PID   UID   DEV       XMIT_CNT   XMIT_MIN   XMIT_MAX   XMIT_AVG   RECV_CNT   RECV_MIN   RECV_MAX   RECV_AVG   COMMAND
3572  991  enp0s31f6    40       0.000      0.882       0.108        0         0.000       0.000       0.000     openvpn
3575  1000 enp0s31f6    27       0.000      1.363       0.160        0         0.000       0.000       0.000     Socket Thread
[...]

With these commands, you can specify three parameters: update_interval, total_duration, and display_histogram.
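
To find the applications that issue the most I/O operations, the captured output can be ranked by operation count. The following is a minimal sketch that assumes the column layout shown above, with the write or transmit count in column 4, the read or receive count in column 8, and the command name from column 12 onward. The function name rank_io_ops is hypothetical and is not part of tuned-utils-systemtap.

```shell
#!/bin/sh
# rank_io_ops: read diskdevstat or netdevstat output on stdin and print
# "<total operations> <command>" sorted from busiest to quietest.
# Hypothetical helper; it relies on the column layout shown in this section.
rank_io_ops() {
    awk '
        # Data rows start with a numeric PID; headers and "[...]" are skipped.
        $1 ~ /^[0-9]+$/ && NF >= 12 {
            cmd = $12
            # Command names can contain spaces, so rejoin the trailing fields.
            for (i = 13; i <= NF; i++) cmd = cmd " " $i
            print $4 + $8, cmd
        }
    ' | sort -rn
}

# Demo with the diskdevstat sample rows from this section.
rank_io_ops <<'EOF'
PID   UID   DEV   WRITE_CNT   WRITE_MIN   WRITE_MAX   WRITE_AVG   READ_CNT   READ_MIN   READ_MAX   READ_AVG   COMMAND
3575  1000  dm-2   59          0.000      0.365        0.006        5         0.000        0.000      0.000      mozStorage #5
3575  1000  dm-2    7          0.000      0.000        0.000        0         0.000        0.000      0.000      localStorage DB
EOF
```

With the sample rows, the sketch prints 64 mozStorage #5 followed by 7 localStorage DB. A process with a high count but small average sizes is a candidate for batching its I/O.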

TuneD

It is a profile-based system tuning tool that uses the udev device manager to monitor connected devices, and enables both static and dynamic tuning of system settings. You can use the tuned-adm recommend command to determine which profile Red Hat recommends as the most suitable for a particular product. For more information about TuneD, see Getting started with TuneD and Customizing TuneD profiles. Using the powertop2tuned utility, you can create custom TuneD profiles from PowerTOP suggestions. For information about the powertop2tuned utility, see Optimizing power consumption.

Virtual memory statistics (vmstat)

It is provided by the procps-ng package. Using this tool, you can view the detailed information about processes, memory, paging, block I/O, traps, and CPU activity.

To view this information, use:

$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b  swpd  free    buff   cache   si   so  bi   bo   in  cs  us  sy id  wa  st
1  0   0   5805576 380856 4852848   0    0  119  73  814  640  2   2 96   0   0

Using the vmstat -a command, you can display active and inactive memory. For more information about other vmstat options, see the vmstat man page.
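
For power analysis, the in (interrupts per second) and cs (context switches per second) columns give a rough view of how often the system is woken up. The following is a minimal sketch that averages both columns over several samples. The function name avg_wakeups is hypothetical, and the columns are located by their header names rather than by position. Note that the first line that vmstat reports contains averages since boot, so it can skew a short run.

```shell
#!/bin/sh
# avg_wakeups: read vmstat output on stdin and average the "in"
# (interrupts/s) and "cs" (context switches/s) columns.
# Hypothetical helper for illustration; not part of procps-ng.
avg_wakeups() {
    awk '
        # Column header row: locate "in" and "cs" by name, not position.
        $1 == "r" {
            for (i = 1; i <= NF; i++) {
                if ($i == "in") in_col = i
                if ($i == "cs") cs_col = i
            }
            next
        }
        # Data rows start with a number and only count after a header was seen.
        in_col && $1 ~ /^[0-9]+$/ {
            in_sum += $in_col; cs_sum += $cs_col; n++
        }
        END {
            if (n) printf "samples=%d avg_in=%.0f avg_cs=%.0f\n", n, in_sum / n, cs_sum / n
        }
    '
}

# Demo with the sample output from this section.
avg_wakeups <<'EOF'
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b  swpd  free    buff   cache   si   so  bi   bo   in  cs  us  sy id  wa  st
1  0   0   5805576 380856 4852848   0    0  119  73  814  640  2   2 96   0   0
EOF
```

On a live system, the function can be fed directly, for example with vmstat 1 5 | avg_wakeups, which takes five samples at one-second intervals.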

iostat

It is provided by the sysstat package. This tool is similar to vmstat, but only for monitoring I/O on block devices. It also provides more verbose output and statistics.

To monitor the system I/O, use:

$ iostat
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.05    0.46    1.55    0.26    0.00   95.67

Device     tps     kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
nvme0n1    53.54     899.48     616.99      3445229     2363196
dm-0       42.84     753.72     238.71      2886921      914296
dm-1        0.03       0.60       0.00         2292           0
dm-2       24.15     143.12     379.80       548193     1454712

blktrace

It provides detailed information about how time is spent in the I/O subsystem.

To view this information in human readable format, use:

# blktrace -d /dev/dm-0 -o - | blkparse -i -

253,0   1    1   0.000000000  17694  Q   W 76423384 + 8 [kworker/u16:1]
253,0   2    1   0.001926913     0   C   W 76423384 + 8 [0]
[...]

Here, the first column, 253,0, is the device major and minor tuple. The second column, 1, identifies the CPU. It is followed by columns for the sequence number, the timestamp, and the PID of the process issuing the I/O request.

The sixth column, Q, shows the event type. The seventh column, W, indicates a write operation. The eighth column, 76423384, is the block number, and + 8 is the number of requested blocks.

The last field, [kworker/u16:1], is the process name.

By default, the blktrace command runs forever until the process is explicitly killed. Use the -w option to specify the run-time duration.
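
A time-limited trace can be condensed into a per-process count of queued requests. The following is a minimal sketch that assumes the blkparse line format described above, with the event type in the sixth column and the process name in brackets as the last field. The function name count_queued is hypothetical, and process names that contain spaces are not handled.

```shell
#!/bin/sh
# count_queued: read blkparse output on stdin and print
# "<queued requests> <process name>", busiest process first.
# Hypothetical helper; assumes the default blkparse line format.
count_queued() {
    awk '
        # Only Q (queued) events are counted, one per request.
        $6 == "Q" {
            name = $NF
            sub(/^\[/, "", name)   # strip the leading bracket
            sub(/\]$/, "", name)   # strip the trailing bracket
            count[name]++
        }
        END { for (n in count) print count[n], n }
    ' | sort -rn
}

# Demo with the two sample lines from this section.
count_queued <<'EOF'
253,0   1    1   0.000000000  17694  Q   W 76423384 + 8 [kworker/u16:1]
253,0   2    1   0.001926913     0   C   W 76423384 + 8 [0]
EOF

# On a live system, run as root, a 30-second trace could be summarized with:
#   blktrace -d /dev/dm-0 -w 30 -o - | blkparse -i - | count_queued
```

The demo prints 1 kworker/u16:1, because only the first sample line is a Q event.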

turbostat

It is provided by the kernel-tools package. It reports on processor topology, frequency, idle power-state statistics, temperature, and power usage on x86-64 processors.

To view this summary, use:

# turbostat

CPUID(0): GenuineIntel 0x16 CPUID levels; 0x80000008 xlevels; family:model:stepping 0x6:8e:a (6:142:10)
CPUID(1): SSE3 MONITOR SMX EIST TM2 TSC MSR ACPI-TM HT TM
CPUID(6): APERF, TURBO, DTS, PTM, HWP, HWPnotify, HWPwindow, HWPepp, No-HWPpkg, EPB
[...]

By default, turbostat prints a summary of counter results for the entire system, followed by counter results every 5 seconds. To specify a different period between counter results, use the -i option. For example, run turbostat -i 10 to print results every 10 seconds instead.

Turbostat is also useful for identifying servers that are inefficient in terms of power usage or idle time. It also helps to identify the rate of system management interrupts (SMIs) occurring on the system. It can also be used to verify the effects of power management tuning.

cpupower

It is a collection of tools to examine and tune the power-saving features of processors. Use the cpupower command with the frequency-info, frequency-set, idle-info, idle-set, set, info, and monitor options to display and set processor-related values.

For example, to view available cpufreq governors, use:

$ cpupower frequency-info --governors
analyzing CPU 0:
  available cpufreq governors: performance powersave

For more information about cpupower, see Viewing CPU related information.
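
Before switching governors in a script, it can help to confirm that the target governor is actually offered by the driver. The following is a minimal sketch that parses the output format shown above. The function name has_governor is hypothetical and is not part of the cpupower tool set.

```shell
#!/bin/sh
# has_governor NAME
# Read "cpupower frequency-info --governors" output on stdin and succeed
# (exit status 0) only if NAME is listed. Hypothetical helper.
has_governor() {
    awk -v want="$1" '
        /available cpufreq governors:/ {
            # Drop everything up to the colon so that only names remain.
            sub(/.*available cpufreq governors:/, "")
            for (i = 1; i <= NF; i++) if ($i == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Demo with the sample output from this section.
sample='analyzing CPU 0:
  available cpufreq governors: performance powersave'

if printf '%s\n' "$sample" | has_governor powersave; then
    echo "powersave is available"
else
    echo "powersave is not available"
fi
```

On a live system, the check can be fed from the real command, for example cpupower frequency-info --governors | has_governor powersave, before running cpupower frequency-set -g powersave as root.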

GNOME Power Manager

It is a daemon that is installed as part of the GNOME desktop environment. GNOME Power Manager notifies you of changes in your system’s power status; for example, a change from battery to AC power. It also reports battery status, and warns you when battery power is low.

Additional resources

  • powertop(1), diskdevstat(8), netdevstat(8), tuned(8), vmstat(8), iostat(1), blktrace(8), blkparse(8), and turbostat(8) man pages
  • cpupower(1), cpupower-set(1), cpupower-info(1), cpupower-idle(1), cpupower-frequency-set(1), cpupower-frequency-info(1), and cpupower-monitor(1) man pages