CPU frequency issue with NVIDIA Grace processors
Environment
- Red Hat Enterprise Linux for ARM 64 9.4
- Red Hat Enterprise Linux for ARM 64 9.3
- HPE ProLiant Compute DL384 Gen12 (NVIDIA Grace (2 CPU/2 GPU) based system)
- Pegatron Corporation Pegatron SVR AS201-1N0 (NVIDIA Grace (1 CPU/1 GPU) based system)
- HPE Cray EX254n (EX Supercomputer)
Issue
- The minimum and maximum frequencies reported by the RHEL hardware certification test suite do not match the expected values.
- A single, randomly affected CPU core does not operate at the expected minimum frequency.
lscpu output
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 144
On-line CPU(s) list: 0-143
Vendor ID: ARM
BIOS Vendor ID: NVIDIA
Model name: Neoverse-V2
BIOS Model name: Grace A02
Model: 0
Thread(s) per core: 1
Core(s) per socket: 72
Socket(s): 2
Stepping: r0p0
Frequency boost: disabled
CPU(s) scaling MHz: 95%
CPU max MHz: 3492.0000
CPU min MHz: 81.0000
......
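To cross-check the lscpu values on an affected system, the limits can be read directly from the Linux cpufreq sysfs interface, which reports frequencies in kHz under /sys/devices/system/cpu/cpuN/cpufreq/. lscpu typically derives "CPU min MHz" and "CPU max MHz" from the cpuinfo_min_freq and cpuinfo_max_freq attributes. The Python sketch below is a hypothetical helper, not part of lscpu or rhcert; which attributes exist depends on the cpufreq driver in use, so missing or unreadable files are returned as None.

```python
from pathlib import Path

# Standard Linux cpufreq sysfs location; all values there are in kHz.
SYSFS_CPU = "/sys/devices/system/cpu"

# Hardware limits (cpuinfo_*), configured policy limits (scaling_min/max_*),
# and the kernel's view of the current frequency (scaling_cur_freq).
ATTRS = (
    "cpuinfo_min_freq",
    "cpuinfo_max_freq",
    "scaling_min_freq",
    "scaling_max_freq",
    "scaling_cur_freq",
)


def read_khz(cpu, attr, root=SYSFS_CPU):
    """Read one cpufreq attribute for a CPU; None if missing or unreadable."""
    path = Path(root) / f"cpu{cpu}" / "cpufreq" / attr
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def khz_to_mhz(khz):
    """Convert a kHz value to MHz, passing None through unchanged."""
    return None if khz is None else khz / 1000


def cpu_freqs_mhz(cpu, root=SYSFS_CPU):
    """Return {attribute: MHz or None} for a single CPU."""
    return {attr: khz_to_mhz(read_khz(cpu, attr, root)) for attr in ATTRS}


if __name__ == "__main__":
    # Example: inspect the cores that stood out in the rhcert runs.
    for cpu in (0, 11):
        print(cpu, cpu_freqs_mhz(cpu))
```

This only shows the limits the kernel advertises and its current-frequency reading; it does not measure the frequency a core actually sustains under load, which is what the rhcert runs exercise.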
CPU frequencies reported by the RHEL hardware certification test suite (rhcert):
Run 1:
           User Min               User Max               Performance
---------  ---------------------  ---------------------  ---------------------
expected:  81 MHz                 3.490 GHz              3.490 GHz
cpu 0      351 MHz (384.00 sec)   3.753 GHz (6.51 sec)   4.248 GHz (6.47 sec)
cpu 1      369 MHz (261.16 sec)   2.952 GHz (6.22 sec)   3.870 GHz (6.21 sec)
cpu 2      342 MHz (261.13 sec)   2.988 GHz (6.27 sec)   2.943 GHz (6.25 sec)
cpu 3      333 MHz (262.44 sec)   3.330 GHz (6.25 sec)   2.844 GHz (6.23 sec)
......
Run 2:
           User Min               User Max               Performance
---------  ---------------------  ---------------------  ---------------------
expected:  81 MHz                 3.490 GHz              3.490 GHz
cpu 10     360 MHz (263.07 sec)   3.978 GHz (6.27 sec)   2.853 GHz (6.22 sec)
cpu 11     342 MHz (379.17 sec)   5.832 GHz (6.52 sec)   3.141 GHz (6.47 sec)
cpu 12     333 MHz (262.50 sec)   3.465 GHz (6.23 sec)   3.213 GHz (6.25 sec)
......
Resolution
- lscpu reports a minimum frequency of 81 MHz and a maximum of 3492 MHz, and rhcert lists expected values of 81 MHz and 3.490 GHz, but the frequencies that rhcert measures do not match these values. For example, for cpu 0 in Run 1 it reports 351 MHz at User Min and 3.753 GHz at User Max. This is a reporting issue; the cores should still operate at the expected minimum and maximum frequencies.
- In each run, one random CPU core fails to operate at its expected minimum frequency. For example:
Run 1:
cpu 0      351 MHz (384.00 sec)   3.753 GHz (6.51 sec)   4.248 GHz (6.47 sec)
Run 2:
cpu 11     342 MHz (379.17 sec)   5.832 GHz (6.52 sec)   3.141 GHz (6.47 sec)
The other CPU cores completed the minimum-frequency task in about 260 seconds, whereas cpu 0 in Run 1 and cpu 11 in Run 2 took 384 and 379 seconds respectively. The longer run times indicate that these cores ran below the expected minimum frequency. However, this does not affect the overall functionality of the CPU. HPE, NVIDIA, and Red Hat are investigating the root cause of the minimum frequency issue and are working toward a resolution.
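The run-time comparison can be made explicit with a small calculation. Assuming the minimum-frequency workload is a fixed amount of work, its run time scales roughly as 1/frequency, so the ratio of the median run time to a core's run time estimates that core's speed relative to its peers. The Python sketch below applies this to the rows shown in the rhcert output (the elided rows are not included); the function name `find_slow_cores` and the 10% tolerance are illustrative assumptions, not part of rhcert.

```python
from statistics import median

# User Min run times in seconds, copied from the rhcert rows shown above.
# The rows elided with "......" are not included.
run_times = {
    "run1": {0: 384.00, 1: 261.16, 2: 261.13, 3: 262.44},
    "run2": {10: 263.07, 11: 379.17, 12: 262.50},
}


def find_slow_cores(times, tolerance=0.10):
    """Return {cpu: relative_speed} for cores slower than the median.

    A core is flagged when its run time exceeds the median by more than
    `tolerance`. Assuming run time ~ 1/frequency, median/time estimates
    the core's speed relative to its peers (1.0 = same speed).
    """
    ref = median(times.values())
    slow = {}
    for cpu, t in times.items():
        if t > ref * (1 + tolerance):
            slow[cpu] = round(ref / t, 2)
    return slow


for run, times in run_times.items():
    print(run, find_slow_cores(times))
# prints:
# run1 {0: 0.68}
# run2 {11: 0.69}
```

With these rows, cpu 0 and cpu 11 are flagged at roughly 0.68 and 0.69 of their peers' speed. The median is used as the reference so that a single slow core does not skew the baseline.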
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.