What boot-time kernel parameters are required for the HPE Compute Scale-up Server 3200 when configured with large amounts of memory and many CPUs?

Environment

  • HPE Compute Scale-up Server 3200
  • Red Hat Enterprise Linux (RHEL) 8.6-8.x
  • Red Hat Enterprise Linux (RHEL) 9.0-9.x

Resolution

In especially large certified configurations of the HPE Compute Scale-up Server 3200, running the releases listed in the Environment section above, several additional kernel parameters are required for the system to function properly. These parameters are:

Red Hat Enterprise Linux 8

  • crashkernel=2G,high - HPE recommends this setting to control the amount and location of memory reserved for the crash kernel on systems with more than 24TB of RAM.
  • modprobe.blacklist=skx_edac - The skx_edac driver has a scaling issue that prevents it from loading on large systems. It must be disabled in this configuration.
  • nmi_watchdog=0 - An NMI could be improperly reported due to the length of time it takes to perform certain system operations in very large systems. Disabling the NMI watchdog timer prevents these erroneous NMIs from being reported and possibly triggering a reboot.
  • udev.children-max=512 - Limit the number of udev child processes to prevent a livelock condition on large systems. This can occur if too many drivers attempt to load at the same time.
  • console=ttyS0,115200 - HPE recommends this setting to enable access to the serial console on this system.
  • pci=nobar - Do not assign address space to the BARs that weren't assigned by the BIOS.
  • uv_nmi.action=kdump - Handle system-wide NMI events generated by the global 'power nmi' command by triggering a kdump.
  • tsc=nowatchdog - Works around an Intel Sapphire Rapids holdoff issue that delays reads of the TSC. The delay causes false TSC failure messages and a failover to an alternative clocksource, which severely degrades application performance.
  • add_efi_memmap - The e820 table is limited to 128 entries, which is not enough for large systems. This option makes the kernel use the EFI memory map, which allows more than 128 entries (and includes the e820 entries).
  • initcall_blacklist=acpi_cpufreq_init,isst_if_mbox_init - A workaround for a known issue (acpi-cpufreq: Skip initialization if a cpufreq driver exists) that is expected to be fixed in a future errata release.
  • transparent_hugepage=never - Disable transparent hugepages to ensure consistent performance across all NUMA nodes.
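
One way to apply the RHEL 8 list is with grubby, the RHEL tool that edits boot entries. The following Python sketch only assembles the command so it can be reviewed before being run as root; the helper name grubby_command is hypothetical, and the sketch assumes the parameters should be added to every installed kernel (--update-kernel=ALL). It spells the crash kernel setting as crashkernel=2G,high, with the lowercase ",high" suffix that the kernel's crashkernel parser recognizes. A reboot is required before the new parameters take effect.

```python
import shlex

# RHEL 8 parameters from the list above for large HPE Compute Scale-up
# Server 3200 configurations. Note the lowercase ",high" suffix.
RHEL8_PARAMS = [
    "crashkernel=2G,high",
    "modprobe.blacklist=skx_edac",
    "nmi_watchdog=0",
    "udev.children-max=512",
    "console=ttyS0,115200",
    "pci=nobar",
    "uv_nmi.action=kdump",
    "tsc=nowatchdog",
    "add_efi_memmap",
    "initcall_blacklist=acpi_cpufreq_init,isst_if_mbox_init",
    "transparent_hugepage=never",
]


def grubby_command(params, kernel="ALL"):
    """Build (but do not run) a grubby argv that adds `params` to boot entries.

    Hypothetical helper for illustration; `kernel` is passed to --update-kernel.
    """
    return ["grubby", f"--update-kernel={kernel}", "--args=" + " ".join(params)]


if __name__ == "__main__":
    # Print a shell-quoted command line to review before running it as root.
    print(shlex.join(grubby_command(RHEL8_PARAMS)))
```

After rebooting, the active parameters can be checked with cat /proc/cmdline.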

Red Hat Enterprise Linux 9

  • crashkernel=4G - HPE recommends this setting to control the amount and location of memory reserved for the crash kernel on systems with more than 24TB of RAM.
  • nmi_watchdog=0 - An NMI could be improperly reported due to the length of time it takes to perform certain system operations in very large systems. Disabling the NMI watchdog timer prevents these erroneous NMIs from being reported and possibly triggering a reboot.
  • udev.children-max=512 - Limit the number of udev child processes to prevent a livelock condition on large systems. This can occur if too many drivers attempt to load at the same time.
  • console=ttyS0,115200 - HPE recommends this setting to enable access to the serial console on this system.
  • pci=nobar - Do not assign address space to the BARs that weren't assigned by the BIOS.
  • uv_nmi.action=kdump - Handle system-wide NMI events generated by the global 'power nmi' command by triggering a kdump.
  • tsc=nowatchdog - Works around an Intel Sapphire Rapids holdoff issue that delays reads of the TSC. The delay causes false TSC failure messages and a failover to an alternative clocksource, which severely degrades application performance.
  • add_efi_memmap - The e820 table is limited to 128 entries, which is not enough for large systems. This option makes the kernel use the EFI memory map, which allows more than 128 entries (and includes the e820 entries).
  • initcall_blacklist=acpi_cpufreq_init,isst_if_mbox_init - A workaround for a known issue (acpi-cpufreq: Skip initialization if a cpufreq driver exists) that is expected to be fixed in a future errata release.
  • transparent_hugepage=never - Disable transparent hugepages to ensure consistent performance across all NUMA nodes.
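
After rebooting, one way to confirm that the RHEL 9 parameters are active is to compare /proc/cmdline against the list above. The following Python sketch does that; parse_cmdline and missing_params are hypothetical helper names, and the parser deliberately ignores shell-style quoting because none of the listed parameters need it. A parameter that is present with a different value (for example, a different crashkernel= size) is reported the same way as one that is absent.

```python
from pathlib import Path

# RHEL 9 parameters from the list above; None marks a bare flag with no value.
RHEL9_PARAMS = {
    "crashkernel": "4G",
    "nmi_watchdog": "0",
    "udev.children-max": "512",
    "console": "ttyS0,115200",
    "pci": "nobar",
    "uv_nmi.action": "kdump",
    "tsc": "nowatchdog",
    "add_efi_memmap": None,
    "initcall_blacklist": "acpi_cpufreq_init,isst_if_mbox_init",
    "transparent_hugepage": "never",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into {key: [values]}.

    Keys can repeat (for example several console= entries), so every value
    is kept. Bare flags are stored as None. Quoting is not handled.
    """
    parsed = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        parsed.setdefault(key, []).append(value if sep else None)
    return parsed


def missing_params(cmdline, required=RHEL9_PARAMS):
    """Return required settings that are absent or set to a different value."""
    parsed = parse_cmdline(cmdline)
    missing = []
    for key, value in required.items():
        if value not in parsed.get(key, []):
            missing.append(key if value is None else f"{key}={value}")
    return missing


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        for param in missing_params(cmdline_path.read_text()):
            print("missing or different:", param)
```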

Comments