Accessing the watchdog device /dev/watchdog0 via the kfod.bin utility triggers a system reset
Issue
- HPE ProLiant systems with Oracle Grid Infrastructure and Oracle AHF installed are experiencing unexpected restarts with the following messages:
[105351.966765] watchdog: watchdog0: watchdog did not stop!
[105351.966841] watchdog: watchdog0: watchdog did not stop!
[105373.209527] Kernel panic - not syncing: 06: An NMI occurred. Depending on your system the reason for the NMI is logged in any one of the following resources:
[105373.209531] 1. Integrated Management Log (IML)
[105373.209532] 2. OA Syslog
[105373.209533] 3. OA Forward Progress Log
[105373.209533] 4. iLO Event Log
[105373.209534] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 4.18.0-553.22.1.el8_10.x86_64 #1
[105373.209535] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 08/29/2024
[105373.209536] Call Trace:
[105373.209537] <NMI>
[105373.209538] dump_stack+0x41/0x60
[105373.209538] panic+0xe7/0x2ac
[105373.209539] ? __ghes_peek_estatus.isra.13+0x49/0xa0
[105373.209539] nmi_panic.cold.11+0xc/0xc
[105373.209539] hpwdt_pretimeout+0x7f/0xc6 [hpwdt]
[105373.209540] nmi_handle+0x63/0x110
[105373.209540] io_check_error+0x16/0x30
[105373.209540] default_do_nmi+0x97/0x110
[105373.209541] do_nmi+0x19c/0x210
[105373.209541] end_repeat_nmi+0x16/0x69
[105373.209541] RIP: 0010:intel_idle+0xa4/0x120
[105373.209542] Code: 40 dc 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 19 0f 1f 44 00 00 0f 00 2d c5 e4 67 00 c1 eb 18 b9 01 00 00 00 89 d8 0f 01 c9 <40> 84 f6 74 18 48 89 fa b9 48 00 00 00 89 f8 48 c1 ea 20 0f 30 65
[105373.209543] RSP: 0018:ffffffff88003e20 EFLAGS: 00000002
[105373.209544] RAX: 0000000000000020 RBX: 0000000000000020 RCX: 0000000000000001
[105373.209544] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe0057f401160
[105373.209545] RBP: 0000000000000004 R08: 0000000000000002 R09: 0000000000033000
[105373.209545] R10: 0012020d2c86f8a8 R11: ffff9be3ff432484 R12: 0000000000000004
[105373.209545] R13: ffffffff884c69a0 R14: 0000000000000004 R15: 0000000000000004
- Dell systems with Oracle Grid Infrastructure and Oracle AHF installed are experiencing abrupt restarts with the following messages:
[105351.966765] watchdog: watchdog0: watchdog did not stop!
[105351.966841] watchdog: watchdog0: watchdog did not stop!
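To check whether a saved kernel log contains the messages quoted above, a minimal sketch along the following lines may help. It assumes the log text was captured beforehand (for example from `dmesg` or `journalctl -k` output). The function name `scan_kernel_log` is hypothetical and not part of any Red Hat or Oracle tooling.

```python
import re

# Matches the watchdog core message quoted in this article, e.g.
# "watchdog: watchdog0: watchdog did not stop!"
WATCHDOG_NOT_STOPPED = re.compile(
    r"watchdog: (watchdog\d+): watchdog did not stop!"
)


def scan_kernel_log(text):
    """Scan kernel log text for the symptoms described in this article.

    Returns a tuple (devices, saw_hpwdt_nmi):
      devices       - watchdog device names that logged "did not stop!",
                      one entry per matching line
      saw_hpwdt_nmi - True if an hpwdt_pretimeout frame appears, as in
                      the HPE ProLiant panic trace
    """
    devices = []
    saw_hpwdt_nmi = False
    for line in text.splitlines():
        match = WATCHDOG_NOT_STOPPED.search(line)
        if match:
            devices.append(match.group(1))
        if "hpwdt_pretimeout" in line:
            saw_hpwdt_nmi = True
    return devices, saw_hpwdt_nmi
```

The function only parses text that has already been collected; it never opens /dev/watchdog0, so running it does not risk triggering the reset described here.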
Environment
- Red Hat Enterprise Linux 7
- Red Hat Enterprise Linux 8
- Red Hat Enterprise Linux 9
- HPE ProLiant systems with the hpwdt module loaded
- Dell PowerEdge systems with the iTCO_wdt module loaded
- Other systems with the iTCO_wdt module loaded
- Oracle Autonomous Health Framework (AHF)
- Kernel Filesystem Oracle Disk (kfod.bin) utility
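
To confirm whether a system matches the environment above, the loaded kernel modules can be compared against the two watchdog drivers named in the list. The sketch below assumes the text comes from /proc/modules, where the module name is the first whitespace-separated field on each line. The function name `loaded_watchdog_modules` is hypothetical.

```python
# Watchdog driver modules named in the Environment list of this article.
WATCHDOG_MODULES = ("hpwdt", "iTCO_wdt")


def loaded_watchdog_modules(proc_modules_text):
    """Return which of the listed watchdog modules appear in the given text.

    proc_modules_text is expected to be the content of /proc/modules,
    where each non-empty line starts with a module name.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in WATCHDOG_MODULES if name in loaded]
```

On a live system this could be called as `loaded_watchdog_modules(open("/proc/modules").read())`. Reading /proc/modules is a read-only check and does not access the watchdog device itself.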