Why did interfaces using 'enic' drivers get disabled with errors and lose connectivity?
Issue
- Cisco UCS servers using the Red Hat genuine enic driver lost network connectivity. /var/log/messages showed the following errors at the time of the issue:
Jun 13 02:07:57 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd 20 timed out
Jun 13 02:07:57 HOSTNAME kernel: enic 0000:05:00.0: eth0: vNIC soft reset failed, err -110
Jun 13 02:07:57 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd 36 timed out
Jun 13 02:07:57 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd 21 timed out
Jun 13 02:07:57 HOSTNAME kernel: enic 0000:05:00.0: eth0: Failed to alloc notify buffer, aborting.
Jun 13 02:07:57 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd 4 timed out
Jun 13 02:07:57 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd 4 timed out
Jun 13 02:08:46 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd 4 timed out
[...]
Jun 13 02:16:46 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd2 4: wq is full. fetch index: 0, posted index: 31
Jun 13 02:17:46 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd2 4: wq is full. fetch index: 0, posted index: 31
Jun 13 02:18:46 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd2 4: wq is full. fetch index: 0, posted index: 31
Jun 13 02:19:46 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd2 4: wq is full. fetch index: 0, posted index: 31
- UCS-Manager may also report the following LOM (LAN-on-Motherboard) errors in the SEL log:
315 | 09/06/2018 22:05:41 | CIMC | Platform alert LED_LOM_FAULT #0xa3 | LED is on | Asserted
316 | 09/06/2018 22:05:41 | CIMC | Platform alert LED_LOM_FAULT #0xa3 | Degraded | Asserted
- The server may eventually become unresponsive or crash.
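To check whether a host is logging the same pattern, the kernel messages quoted above can be tallied per interface. The following is a minimal sketch, not a supported tool: the regular expression is inferred only from the log lines shown in this article, and the category names (devcmd_timeout, devcmd2_wq_full, soft_reset_failed) are labels invented here for illustration.

```python
import re
from collections import Counter

# Matches kernel lines shaped like the ones quoted in the Issue section:
#   "<timestamp> <host> kernel: enic <pci-addr>: <iface>: <message>"
# The pattern is an assumption derived from those samples only.
ENIC_LINE = re.compile(
    r"kernel: enic (?P<pci>[0-9a-fA-F:.]+): (?P<iface>\S+): (?P<msg>.*)$"
)


def classify(msg):
    """Map an enic message to an illustrative (hypothetical) category name."""
    if re.match(r"devcmd \d+ timed out", msg):
        return "devcmd_timeout"
    if "wq is full" in msg:
        return "devcmd2_wq_full"
    if "soft reset failed" in msg:
        return "soft_reset_failed"
    return "other"


def summarize_enic_errors(lines):
    """Count enic messages per (interface, category) over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ENIC_LINE.search(line.rstrip("\n"))
        if match:
            counts[(match.group("iface"), classify(match.group("msg")))] += 1
    return counts


# Sample lines copied from the log excerpt in this article.
SAMPLE = [
    "Jun 13 02:07:57 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd 20 timed out",
    "Jun 13 02:07:57 HOSTNAME kernel: enic 0000:05:00.0: eth0: vNIC soft reset failed, err -110",
    "Jun 13 02:07:57 HOSTNAME kernel: enic 0000:05:00.0: eth0: Failed to alloc notify buffer, aborting.",
    "Jun 13 02:16:46 HOSTNAME kernel: enic 0000:05:00.0: eth0: devcmd2 4: wq is full. fetch index: 0, posted index: 31",
]

for (iface, category), n in sorted(summarize_enic_errors(SAMPLE).items()):
    print(iface, category, n)
```

On a live system the same function can be fed an open file object for /var/log/messages instead of SAMPLE. A non-zero count only indicates that similar messages were logged; it does not by itself confirm the cause.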
Environment
- Red Hat Enterprise Linux 7.5
- Cisco enic driver version 2.3.0.31
- Cisco UCS-B M200, UCS Server Firmware 3.1
- Cisco VIC-1340 eNIC
- NIC firmware version 4.0(8c)
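To compare a host against the driver version listed above, the output of `ethtool -i <interface>` can be parsed into key/value pairs. This is a sketch under stated assumptions: the interface name eth0 comes from the log excerpt, the sample text is an illustrative reconstruction built from the values in this article (not captured output), and the helper names are hypothetical. Matching the listed driver version does not by itself mean the host is affected.

```python
import subprocess


def parse_ethtool_info(text):
    """Parse 'key: value' lines, as printed by `ethtool -i`, into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def read_driver_info(iface):
    """Run `ethtool -i <iface>` and return the parsed fields (requires ethtool)."""
    result = subprocess.run(
        ["ethtool", "-i", iface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_info(result.stdout)


def matches_listed_driver(info, driver="enic", version="2.3.0.31"):
    """True if the driver name and version equal the ones in the Environment list."""
    return info.get("driver") == driver and info.get("version") == version


# Illustrative text only, assembled from values quoted in this article.
SAMPLE_OUTPUT = """driver: enic
version: 2.3.0.31
bus-info: 0000:05:00.0
"""

sample_info = parse_ethtool_info(SAMPLE_OUTPUT)
print(sample_info["driver"], sample_info["version"], matches_listed_driver(sample_info))
```

On a real host, `matches_listed_driver(read_driver_info("eth0"))` performs the same comparison against live output.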