Cannot Reproduce "NETDEV WATCHDOG: (enic): transmit queue 0 timed out"

We are running RHEL 7.9 on Cisco UCS B-Series blade servers and are currently experiencing the issue described in the following KB: https://access.redhat.com/solutions/4126461

The ENIC driver on these blades is out of date, so I am fairly confident that updating the driver and following the guidance in the KB will solve our issues.

The problem is that I need a reliable, consistent way to demonstrate that this error occurs. Our process would be to trigger the error, apply any fixes, and then attempt to trigger the error again. But I have not been able to trigger it consistently. I've tried a number of things, including iperf3, custom Python scripts that send random bytes from an affected blade to an arbitrary client, and transferring files back and forth from our NAS server. I've been able to trigger the error at most once per day, and it's unpredictable when it happens. I just leave my scripts running, and when I come back the NIC has reset. Even on our production systems, with our software running, the error occurs at most once per week, and has not yet occurred on the same blade twice in a row.
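
Since the load scripts run unattended, one way to record exactly when the error fires is to scan the kernel log for the watchdog message. The following is only a minimal sketch: the function name `find_watchdog_events` is hypothetical, and the pattern is based on the message in the title, with the interface name before "(enic)" assumed to be optional because log formats vary.

```python
import re

# Pattern based on the message in the post title. The interface name
# before "(enic)" is an assumption and is treated as optional.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?:(?P<iface>\S+) )?\((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_events(lines, driver="enic"):
    """Return (line_number, iface, queue) for each matching timeout line.

    `lines` is any iterable of log lines, for example the contents of a
    saved kernel log capture. `iface` is None when the line has no
    interface name in front of the driver.
    """
    events = []
    for number, line in enumerate(lines, start=1):
        match = WATCHDOG_RE.search(line)
        if match and match.group("driver") == driver:
            events.append(
                (number, match.group("iface"), int(match.group("queue")))
            )
    return events
```

Feeding it the lines of a saved `dmesg` or `journalctl -k` capture taken after each test run would give a list of timeouts to compare before and after the driver update.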

Does anyone have any ideas for triggering this error consistently?

Responses