Red Hat Training

A Red Hat training course is available for RHEL 8

Chapter 11. Keeping kernel panic parameters disabled in virtualized environments

When configuring a Virtual Machine in RHEL 8, you should not enable the softlockup_panic and nmi_watchdog kernel parameters, because the Virtual Machine might suffer from a spurious soft lockup. And that should not require a kernel panic.

Find the reasons behind this advice in the following sections.

11.1. What is a soft lockup

A soft lockup is a situation usually caused by a bug, when a task is executing in kernel space on a CPU without rescheduling. The task also does not allow any other task to execute on that particular CPU. As a result, a warning is displayed to a user through the system console. This problem is also referred to as the soft lockup firing.

Additional resources

11.2. Parameters controlling kernel panic

The following kernel parameters can be set to control a system’s behavior when a soft lockup is detected.

softlockup_panic

Controls whether or not the kernel will panic when a soft lockup is detected.

TypeValueEffect

Integer

0

kernel does not panic on soft lockup

Integer

1

kernel panics on soft lockup

By default, on RHEL8 this value is 0.

The system needs to detect a hard lockup first to be able to panic. The detection is controlled by the nmi_watchdog parameter.

nmi_watchdog

Controls whether lockup detection mechanisms (watchdogs) are active or not. This parameter is of integer type.

ValueEffect

0

disables lockup detector

1

enables lockup detector

The hard lockup detector monitors each CPU for its ability to respond to interrupts.

watchdog_thresh

Controls frequency of watchdog hrtimer, NMI events, and soft/hard lockup thresholds.

Default thresholdSoft lockup threshold

10 seconds

2 * watchdog_thresh

Setting this parameter to zero disables lockup detection altogether.

11.3. Spurious soft lockups in virtualized environments

The soft lockup firing on physical hosts, as described in What is a soft lockup, usually represents a kernel or hardware bug. The same phenomenon happening on guest operating systems in virtualized environments may represent a false warning.

Heavy work-load on a host or high contention over some specific resource such as memory, usually causes a spurious soft lockup firing. This is because the host may schedule out the guest CPU for a period longer than 20 seconds. Then when the guest CPU is again scheduled to run on the host, it experiences a time jump which triggers due timers. The timers include also watchdog hrtimer, which can consequently report a soft lockup on the guest CPU.

Because a soft lockup in a virtualized environment may be spurious, you should not enable the kernel parameters that would cause a system panic when a soft lockup is reported on a guest CPU.

Important

To understand soft lockups in guests, it is essential to know that the host schedules the guest as a task, and the guest then schedules its own tasks.