Chapter 29. Stress testing real-time systems with stress-ng

The stress-ng tool measures the system’s capability to maintain a good level of efficiency under unfavorable conditions. The stress-ng tool is a stress workload generator to load and stress all kernel interfaces. It includes a wide range of stress mechanisms known as stressors. Stress testing makes a machine work hard and trip hardware issues such as thermal overruns and operating system bugs that occur when a system is being overworked.

There are over 270 different tests. These include CPU-specific tests that exercise floating point, integer, bit manipulation, and control flow operations, as well as virtual memory tests.

Note

Use the stress-ng tool with caution, because some of the tests can trigger the system’s thermal zone trip points on poorly designed hardware. This can degrade system performance and cause excessive system thrashing, which can be difficult to stop.

29.1. Testing CPU floating point units and processor data cache

A floating point unit is the functional part of the processor that performs floating point arithmetic operations. It handles calculations on floating point (decimal) numbers in hardware, which makes them faster.

Using the --matrix option, you can stress test the CPU floating point operations and processor data cache.

Prerequisites

  • You have root permissions on the system.

Procedure

  • To test the floating point on one CPU for 60 seconds, use the --matrix option:

    # stress-ng --matrix 1 -t 1m

  • To run the matrix stressor on every online CPU for 60 seconds, specify 0 instances with the --matrix option:

    # stress-ng --matrix 0 -t 1m

    To also report how the CPU time was spent, add the --times option:

    # stress-ng --matrix 0 -t 1m --times
    stress-ng: info:  [16783] dispatching hogs: 4 matrix
    stress-ng: info:  [16783] successful run completed in 60.00s (1 min, 0.00 secs)
    stress-ng: info:  [16783] for a 60.00s run time:
    stress-ng: info:  [16783] 240.00s available CPU time
    stress-ng: info:  [16783] 205.21s user time   ( 85.50%)
    stress-ng: info:  [16783] 0.32s system time (  0.13%)
    stress-ng: info:  [16783] 205.53s total time  ( 85.64%)
    stress-ng: info:  [16783] load average: 3.20 1.25 1.40

    When you specify 0 stressors, stress-ng queries the number of online CPUs and starts one instance on each, so you do not need to specify the CPU count.

    The total available CPU time is 4 x 60 seconds (240 seconds), of which 0.13% is spent in the kernel and 85.50% in user space. In total, stress-ng uses 85.64% of the available CPU time.

  • To test message passing between processes using a POSIX message queue, use the --mq option:

    # stress-ng --mq 0 -t 30s --times --perf

    The --mq option starts the specified number of processes, which force context switches by passing messages through a POSIX message queue. This stress test aims for low data cache misses.
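
The percentages in the --times report above follow from simple arithmetic on the available CPU time. The following is a minimal sketch that reproduces them with awk. The cpu_share helper is hypothetical and not part of stress-ng, and the numbers are copied from the sample output.

```shell
#!/bin/sh
# Hypothetical helper, not part of stress-ng.
# cpu_share CPUS RUN_SECS USED_SECS
# Prints USED_SECS as a percentage of the available CPU time (CPUS x RUN_SECS).
cpu_share() {
    awk -v cpus="$1" -v run="$2" -v used="$3" \
        'BEGIN { printf "%.2f\n", 100 * used / (cpus * run) }'
}

# Values from the sample run: 4 CPUs, 60 seconds.
cpu_share 4 60 205.21   # user time share, prints 85.50
cpu_share 4 60 0.32     # system time share, prints 0.13
```

The total share in the report is the sum of the user and system times divided by the same 240 seconds.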

29.2. Testing CPU with multiple stress mechanisms

The stress-ng tool runs multiple stress tests. In the default mode, it runs the specified stressor mechanisms in parallel.

Prerequisites

  • You have root privileges on the system.

Procedure

  • To run several stressors at the same time, specify each stressor and its number of instances on one command line:

    # stress-ng --cpu 2 --matrix 1 --mq 3 -t 5m

    In this example, stress-ng runs two instances of the cpu stressor, one instance of the matrix stressor, and three instances of the message queue stressor for five minutes.

  • To run all stress tests in parallel, use the --all option:

    # stress-ng --all 2

    In this example, stress-ng runs two instances of all stress tests in parallel.

  • To run each different stressor in a specific sequence, use the --seq option:

    # stress-ng --seq 4 -t 20

    In this example, stress-ng runs all the stressors one by one, each for 20 seconds, with four instances of each stressor. A -t value without a unit suffix is in seconds. To match the number of instances to the number of online CPUs, specify --seq 0.

  • To exclude specific stressors from a test run, use the -x option:

    # stress-ng --seq 1 -x numa,matrix,hdd

    In this example, stress-ng runs all stressors, one instance of each, excluding the numa, matrix, and hdd stressors.
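
The commands above pass durations to -t both with a unit suffix (5m) and without one (20). The following sketch shows how such values map to seconds, assuming the s, m, h, and d suffixes described in the stress-ng manual page and whole-number values only. The to_seconds helper is hypothetical and only illustrates the convention.

```shell
#!/bin/sh
# Hypothetical helper, not part of stress-ng.
# to_seconds VALUE
# Converts a whole-number duration with an optional s, m, h, or d suffix
# into seconds. A bare number is treated as seconds.
to_seconds() {
    value=$1
    case "$value" in
        *s) echo "${value%s}" ;;
        *m) echo $(( ${value%m} * 60 )) ;;
        *h) echo $(( ${value%h} * 3600 )) ;;
        *d) echo $(( ${value%d} * 86400 )) ;;
        *)  echo "$value" ;;
    esac
}

to_seconds 5m   # prints 300
to_seconds 20   # prints 20
```

To run each stressor for 20 minutes rather than 20 seconds, the duration would be written as 20m.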

29.3. Measuring CPU heat generation

To measure CPU heat generation, run stressors that generate high temperatures for a short time to test the system’s cooling reliability and stability under maximum heat generation. The --matrix-size option sets the size of the matrices that the matrix stressor works on, and the --tz option reports the thermal zone temperatures in degrees Celsius at the end of the run.

Prerequisites

  • You have root privileges on the system.

Procedure

  1. To test the CPU behavior at high temperatures for a specified time duration, run the following command:

    # stress-ng --matrix 0 --matrix-size 64 --tz -t 60
    
      stress-ng: info:  [18351] dispatching hogs: 4 matrix
      stress-ng: info:  [18351] successful run completed in 60.00s (1 min, 0.00 secs)
      stress-ng: info:  [18351] matrix:
      stress-ng: info:  [18351] x86_pkg_temp   88.00 °C
      stress-ng: info:  [18351] acpitz   87.00 °C

    In this example, stress-ng reports that the processor package thermal zone reached 88 degrees Celsius during the 60-second run.

  2. (Optional) To print a thermal zone temperature report at the end of a run with another stressor, add the --tz option:

    # stress-ng --cpu 0 --tz -t 60
    
      stress-ng: info:  [18065] dispatching hogs: 4 cpu
      stress-ng: info:  [18065] successful run completed in 60.07s (1 min, 0.07 secs)
      stress-ng: info:  [18065] cpu:
      stress-ng: info:  [18065] x86_pkg_temp   88.75 °C
      stress-ng: info:  [18065] acpitz   88.38 °C
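
When a run reports several thermal zones, it can help to extract the hottest one automatically. The following sketch parses lines in the format of the sample output above, where the last two fields are the temperature and the unit. The hottest_zone helper is hypothetical, and the output format of other stress-ng versions might differ.

```shell
#!/bin/sh
# Hypothetical helper, not part of stress-ng.
# hottest_zone
# Reads stress-ng --tz output on standard input and prints the thermal zone
# with the highest temperature. Assumes lines ending in "<zone> <temp> °C".
hottest_zone() {
    awk 'NF >= 3 && $NF ~ /C$/ && $(NF-1) ~ /^[0-9.]+$/ {
             temp = $(NF-1) + 0
             if (!seen || temp > max) { max = temp; zone = $(NF-2); seen = 1 }
         }
         END { if (seen) printf "%s %.2f\n", zone, max }'
}

# Sample lines copied from the report above.
hottest_zone <<'EOF'
stress-ng: info:  [18065] cpu:
stress-ng: info:  [18065] x86_pkg_temp   88.75 °C
stress-ng: info:  [18065] acpitz   88.38 °C
EOF
```

For the sample lines, the sketch prints x86_pkg_temp 88.75. On a live system, the stress-ng output would be piped into hottest_zone, possibly with standard error redirected to standard output.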

29.4. Measuring test outcomes with bogo operations

The stress-ng tool can measure the throughput of a stress test in bogo operations per second. The size of a bogo operation depends on the stressor being run. The test outcomes are not precise, but they provide a rough estimate of the performance.

Do not use this measurement as an accurate benchmark metric. These estimates help you understand how system performance changes between kernel versions or between compiler versions used to build stress-ng. Use the --metrics-brief option to display the total number of bogo operations and the matrix stressor performance on your machine.

Prerequisites

  • You have root privileges on the system.

Procedure

  • To measure test outcomes with bogo operations, use the --metrics-brief option:

    # stress-ng --matrix 0 -t 60s --metrics-brief
    
    stress-ng: info:  [17579] dispatching hogs: 4 matrix
    stress-ng: info:  [17579] successful run completed in 60.01s (1 min, 0.01 secs)
    stress-ng: info:  [17579] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
    stress-ng: info:  [17579]                           (secs)    (secs)    (secs)  (real time) (usr+sys time)
    stress-ng: info:  [17579] matrix           349322     60.00    203.23      0.19      5822.03        1717.25

    The --metrics-brief option displays the test outcomes, including the total number of bogo operations that the matrix stressor completed in 60 seconds and the corresponding rates per second.
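
The two rate columns in the report above can be derived from the other columns: the first divides the bogo operations by the real time, and the second divides them by the sum of the user and system times. The following sketch recomputes both from the sample values. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not part of stress-ng.
# bogo_rate OPS SECS: bogo operations per second of real time.
bogo_rate() {
    awk -v ops="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", ops / secs }'
}

# bogo_rate_cpu OPS USR_SECS SYS_SECS: bogo operations per second of CPU time.
bogo_rate_cpu() {
    awk -v ops="$1" -v usr="$2" -v sys="$3" \
        'BEGIN { printf "%.2f\n", ops / (usr + sys) }'
}

# Values copied from the matrix row of the sample report.
bogo_rate 349322 60.00            # prints 5822.03
bogo_rate_cpu 349322 203.23 0.19  # prints 1717.25
```

Because four instances run in parallel, the CPU time (about 203 seconds) is larger than the real time (60 seconds), which is why the second rate is lower than the first.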

29.5. Generating virtual memory pressure

When under memory pressure, the kernel starts writing pages out to swap. You can stress the virtual memory subsystem by using the --page-in option, which touches allocated pages that are not in core and forces them to be paged back in. This heavily exercises the virtual memory subsystem. The --page-in option applies to the bigheap, mmap, and virtual memory (vm) stressors.

Prerequisites

  • You have root privileges on the system.

Procedure

  • To stress test virtual memory, use the --page-in option:

    # stress-ng --vm 2 --vm-bytes 2G --mmap 2 --mmap-bytes 2G --page-in

    In this example, stress-ng generates memory pressure on a system with 4 GB of memory. The requested buffers, 2 x 2 GB for the vm stressor and 2 x 2 GB for the mmap stressor, add up to 8 GB, which is more than the installed memory. The --page-in option is enabled.
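
To choose buffer sizes that actually create memory pressure, the requested total has to exceed the installed memory. The following sketch repeats the arithmetic from the example, following the reading in the text that each instance allocates its own buffer. The buffer_total helper and the variable names are hypothetical, and the 4 GB system size is taken from the example rather than detected.

```shell
#!/bin/sh
# Hypothetical helper, not part of stress-ng.
# buffer_total INSTANCES GB_EACH: memory requested by one stressor type, in GB.
buffer_total() {
    echo $(( $1 * $2 ))
}

ram_gb=4                          # installed memory assumed in the example
vm_gb=$(buffer_total 2 2)         # --vm 2 --vm-bytes 2G
mmap_gb=$(buffer_total 2 2)       # --mmap 2 --mmap-bytes 2G
requested_gb=$(( vm_gb + mmap_gb ))

echo "requested ${requested_gb} GB on a ${ram_gb} GB system"
if [ "$requested_gb" -gt "$ram_gb" ]; then
    echo "request exceeds memory by $(( requested_gb - ram_gb )) GB, expect paging"
fi
```

With the values from the example, the request is 8 GB, which exceeds the 4 GB of memory by 4 GB.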

29.6. Testing large interrupt loads on a device

Running timers at high frequency can generate a large interrupt load. The --timer stressor with an appropriately selected timer frequency can force many interrupts per second.

Prerequisites

  • You have root permissions on the system.

Procedure

  • To generate an interrupt load, use the --timer option:

    # stress-ng --timer 32 --timer-freq 1000000

    In this example, stress-ng runs 32 instances of the timer stressor, each with a timer frequency of 1 MHz.
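
To see how many interrupts the system actually services while the timer stressor runs, you can sample the total interrupt counter before and after a fixed interval. The following sketch assumes a Linux system where the intr line of /proc/stat starts with the total number of interrupts serviced since boot. The intr_total and intr_rate helpers are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not part of stress-ng.
# intr_total [FILE]: total interrupts serviced since boot, read from the
# "intr" line of FILE (defaults to /proc/stat).
intr_total() {
    awk '$1 == "intr" { print $2; exit }' "${1:-/proc/stat}"
}

# intr_rate BEFORE AFTER SECS: average interrupts per second over the interval.
intr_rate() {
    echo $(( ($2 - $1) / $3 ))
}

if [ -r /proc/stat ] && [ -n "$(intr_total)" ]; then
    before=$(intr_total)
    sleep 1     # run the stress-ng timer test during this interval instead
    after=$(intr_total)
    echo "interrupts per second: $(intr_rate "$before" "$after" 1)"
fi
```

The measured rate is usually lower than the requested 32 x 1,000,000 timer events per second, because the achievable rate depends on the hardware and the kernel configuration.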

29.7. Generating major page faults in a program

With stress-ng, you can test and analyze the page fault rate by generating major page faults on pages that are not loaded in memory. On newer kernel versions, the userfaultfd mechanism notifies fault-handling threads about page faults in the virtual memory layout of a process.

Prerequisites

  • You have root permissions on the system.

Procedure

  • To generate major page faults on earlier kernel versions, use:

    # stress-ng --fault 0 --perf -t 1m

  • To generate major page faults on newer kernel versions, use:

    # stress-ng --userfaultfd 0 --perf -t 1m

29.8. Viewing CPU stress test mechanisms

The cpu stressor contains several methods to exercise a CPU. To print all of them, pass which as the value of the --cpu-method option.

If you do not specify a test method, the cpu stressor by default cycles through all the methods in a round-robin fashion and exercises the CPU with each method in turn.

Prerequisites

  • You have root permissions on the system.

Procedure

  1. To print all available CPU stress methods, pass which to the --cpu-method option:

    # stress-ng --cpu-method which
    
    cpu-method must be one of: all ackermann bitops callfunc cdouble cfloat clongdouble correlate crc16 decimal32 decimal64 decimal128 dither djb2a double euler explog fft fibonacci float fnv1a gamma gcd gray hamming hanoi hyperbolic idct int128 int64 int32

  2. To run one particular CPU stress method, name it with the --cpu-method option:

    # stress-ng --cpu 1 --cpu-method fft -t 1m
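
To exercise each CPU method in its own run, the list printed by --cpu-method which can be turned into one command per method. The following sketch only prints the commands instead of running them. It assumes the "cpu-method must be one of:" line format shown above, and the list_methods and plan_runs helpers are hypothetical. On a live system, stress-ng might print the list to standard error, so the output might need to be redirected before piping.

```shell
#!/bin/sh
# Hypothetical helpers, not part of stress-ng.
# list_methods: reads the "cpu-method must be one of:" line on standard input
# and prints one method per line, skipping the "all" pseudo-method.
list_methods() {
    sed -n 's/^cpu-method must be one of: *//p' |
        tr -s ' ' '\n' |
        grep -v -e '^all$' -e '^$'
}

# plan_runs: prints one stress-ng command per method read from standard input.
plan_runs() {
    while read -r method; do
        echo "stress-ng --cpu 1 --cpu-method $method -t 1m"
    done
}

# Shortened sample line for illustration.
printf '%s\n' 'cpu-method must be one of: all ackermann bitops' |
    list_methods | plan_runs
```

For the shortened sample line, the sketch prints one command for ackermann and one for bitops.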

29.9. Using the verify mode

The verify mode validates the results when a test is active. It sanity checks the memory contents from a test run and reports any unexpected failures.

Not all stressors have a verify mode. Enabling it reduces the bogo operation statistics because of the extra verification step that runs in this mode.

Prerequisites

  • You have root permissions on the system.

Procedure

  • To validate stress test results, use the --verify option:

    # stress-ng --vm 1 --vm-bytes 2G --verify -v

    In this example, stress-ng runs an exhaustive memory check on virtually mapped memory using the vm stressor with the --verify option, and the -v option prints verbose output. The verify mode sanity checks the read and write results on the memory.