Chapter 7. Keeping kernel panic parameters disabled in virtualized environments
When configuring a virtualized environment in Red Hat Enterprise Linux 8 (RHEL 8), you should not enable the softlockup_panic
and nmi_watchdog
kernel parameters, as the virtualized environment may trigger a spurious soft lockup that should not require a system panic.
The following sections explain the reasons behind this advice by summarizing:
- What causes a soft lockup.
- Describing the kernel parameters that control a system’s behavior on a soft lockup.
- Explaining how soft lockups may be triggered in a virtualized environment.
7.1. What is a soft lockup
A soft lockup is a situation usually caused by a bug, when a task is executing in kernel space on a CPU without rescheduling. The task also does not allow any other task to execute on that particular CPU. As a result, a warning is displayed to a user through the system console. This problem is also referred to as the soft lockup firing.
Additional resources
- For a technical reason behind a soft lockup, example log messages, and other details, see the following Knowledge Article.
7.2. Parameters controlling kernel panic
The following kernel parameters can be set to control a system’s behavior when a soft lockup is detected.
- softlockup_panic
Controls whether or not the kernel will panic when a soft lockup is detected.
Type Value Effect Integer
0
kernel does not panic on soft lockup
Integer
1
kernel panics on soft lockup
By default, on RHEL8 this value is 0.
In order to panic, the system needs to detect a hard lockup first. The detection is controlled by the
nmi_watchdog
parameter.- nmi_watchdog
Controls whether lockup detection mechanisms (
watchdogs
) are active or not. This parameter is of integer type.Value Effect 0
disables lockup detector
1
enables lockup detector
The hard lockup detector monitors each CPU for its ability to respond to interrupts.
- watchdog_thresh
Controls frequency of watchdog
hrtimer
, NMI events, and soft/hard lockup thresholds.Default threshold Soft lockup threshold 10 seconds
2 *
watchdog_thresh
Setting this parameter to zero disables lockup detection altogether.
Additional resources
-
For further information about
nmi_watchdog
andsoftlockup_panic
, see the Softlockup detector and hardlockup detector document. -
For more details about
watchdog_thresh
, see the Kernel sysctl document.
7.3. Spurious soft lockups in virtualized environments
The soft lockup firing on physical hosts, as described in What is a soft lockup, usually represents a kernel or hardware bug. The same phenomenon happening on guest operating systems in virtualized environments may represent a false warning.
Heavy work-load on a host or high contention over some specific resource such as memory, usually causes a spurious soft lockup firing. This is because the host may schedule out the guest CPU for a period longer than 20 seconds. Then when the guest CPU is again scheduled to run on the host, it experiences a time jump which triggers due timers. The timers include also watchdog hrtimer
, which can consequently report a soft lockup on the guest CPU.
Because a soft lockup in a virtualized environment may be spurious, you should not enable the kernel parameters that would cause a system panic when a soft lockup is reported on a guest CPU.
To understand soft lockups in guests, it is essential to know that the host schedules the guest as a task, and the guest then schedules its own tasks.
Additional resources
- For soft lockup definition and technicalities behind its functioning, see What is a soft lockup.
- To learn about components of RHEL 8 virtualized environments and their interaction, see RHEL 8 virtual machine components and their interaction.