What does the `kernel: WARNING: at net/sched/sch_generic.c dev_watchdog()` error indicate?
Environment
- Red Hat Enterprise Linux 9
- Red Hat Enterprise Linux 8
- Red Hat Enterprise Linux 7
- Red Hat Enterprise Linux 6
- Red Hat Enterprise Linux 5
- Network interfaces using the following drivers, and possibly others:
  - bnx2
  - bnx2x
  - e1000
  - e1000e
  - igb
  - ixgbe
  - netxen_nic
  - mlx4_core
  - niu
  - r8169
  - sky2
  - tg3
  - enic
  - ena
Issue
- System loses network connectivity, and network driver backtraces similar to the following appear in /var/log/messages:
WARNING: at net/sched/sch_generic.c:... dev_watchdog+0x.../0x...() (Not tainted)
Hardware name: ...
NETDEV WATCHDOG: ethX (<drivername>): transmit queue N timed out
Modules linked in: ...
Pid: ..., comm: ... Not tainted 2.6.32-....el6.x86_64 #1
Call Trace:
<IRQ> [<ffffffff........>] ? warn_slowpath_common+...
[<ffffffff........>] ? warn_slowpath_fmt+...
[<ffffffff........>] ? dev_watchdog+...
[<ffffffff........>] ? run_timer_softirq+...
...
[<ffffffff........>] ? __do_softirq+...
...
[<ffffffff........>] ? call_softirq+...
[<ffffffff........>] ? do_softirq+...
[<ffffffff........>] ? irq_exit+...
...
<EOI> [<ffffffff........>] ? ...
...
Resolution
The NETDEV WATCHDOG message is the kernel's way of saying "This network device has not been transmitting data for a few seconds, even though it has data to transmit."
The watchdog message does not indicate why the device stopped transmitting. It may be due to a hardware error or a software (kernel/driver/BIOS/firmware) bug.
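Although the message does not explain the cause, it does name the interface, the driver, and the transmit queue, which helps when tracking how often each device is affected. The following is a minimal sketch, assuming Python 3 and the message format shown in the excerpt above; the function names and the sample log lines are hypothetical and not part of any Red Hat tool.

```python
import re
from collections import Counter

# Matches the message format shown in the excerpt above:
#   NETDEV WATCHDOG: ethX (<drivername>): transmit queue N timed out
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_events(lines):
    """Return an (interface, driver, queue) tuple for every watchdog line."""
    events = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            events.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return events


def summarize(events):
    """Count how many timeouts were logged per (interface, driver, queue)."""
    return Counter(events)


if __name__ == "__main__":
    # Hypothetical sample lines; on a real system, iterate over the lines of
    # /var/log/messages or the saved dmesg output instead.
    sample = [
        "Jan  1 00:00:01 host kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out",
        "Jan  1 00:00:02 host kernel: e1000e 0000:00:19.0: eth0: Reset adapter",
        "Jan  1 00:05:09 host kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out",
    ]
    for (iface, driver, queue), count in summarize(find_watchdog_events(sample)).items():
        print("%s (%s) queue %d: %d timeout(s)" % (iface, driver, queue, count))
```

The tally only shows which device and queue timed out and how often; finding the root cause still requires the full logs.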
Red Hat Engineering has private bugs open for each individual driver where this issue has been seen.
Because a NETDEV WATCHDOG hang is a symptom of an issue rather than the issue itself, the root cause of each hang must be investigated on an individual basis.
Please open a case with Red Hat Global Support Services, supplying a full sosreport and as much of the following information as possible:
- Full dmesg (not just an excerpt with the NETDEV WATCHDOG message and the stack trace, for the reasons explained above).
- Information about the affected hardware (sosreport should be fine).
- Did the network interface recover automatically shortly afterwards? Or can connectivity be restored by running ifdown followed by ifup? Or can connectivity be restored by rmmod followed by modprobe of the driver? Or is a reboot the only way to make the device work again?
- How often does the issue occur?
- Does the occurrence of the issue seem to correlate with specific workloads? Is there a way to reproduce it, or at least to make it more likely to happen?
- Do any of these kernel boot parameters help?
  - pcie_aspm=off (ASPM has been known to cause problems in the past.)
  - intremap=off (Interrupt remapping has been known to cause lost interrupts in conjunction with irqbalance on some platforms, e.g. bug 887006)
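When testing the boot parameters above, it helps to confirm that the running kernel was actually booted with them. The following is a minimal sketch, assuming Python 3 and that the kernel command line is readable from /proc/cmdline; the function names are hypothetical.

```python
# Parameters named in the list above.
SUGGESTED_PARAMS = ("pcie_aspm=off", "intremap=off")


def check_boot_params(cmdline, wanted=SUGGESTED_PARAMS):
    """Map each wanted parameter to True if it appears on the command line."""
    tokens = cmdline.split()
    return {param: param in tokens for param in wanted}


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    for param, present in check_boot_params(read_cmdline()).items():
        print("%s: %s" % (param, "set" if present else "not set"))
```

A parameter added to the boot loader configuration only takes effect after a reboot, so a "not set" result immediately after editing the configuration is expected.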
Some additional troubleshooting steps which may help prevent the issue from re-occurring:
- Update the kernel package to the latest version, which will supply the latest available driver with fixes for known issues.
- Update the system BIOS and network interface firmware.
- Ensure irqbalance is running.
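One way to confirm the last point without relying on a particular init system is to look for an irqbalance process under /proc. The following is a minimal sketch, assuming Python 3 and a readable /proc file system; the function names are hypothetical, and the service tooling of the installed release (for example its service status command) remains the usual way to check.

```python
import os


def process_names(proc_root="/proc"):
    """Collect the command names of all processes listed under proc_root."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are process IDs
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                stat = handle.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The stat file has the form "pid (comm) state ..."; the command name
        # sits between the first "(" and the last ")".
        start, end = stat.find("("), stat.rfind(")")
        if start != -1 and end > start:
            names.add(stat[start + 1:end])
    return names


def is_running(name, proc_root="/proc"):
    """Return True if a process with the given command name exists."""
    return name in process_names(proc_root)


if __name__ == "__main__":
    state = "running" if is_running("irqbalance") else "not running"
    print("irqbalance is %s" % state)
```

This only shows whether the daemon is present at the moment of the check; it does not show whether the service is enabled at boot.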