sfc Solarflare NIC memory parity error followed by timeout error; bonding does not fail over
Issue
- A Solarflare NIC using the sfc module reports a memory parity error followed by queue flush timeout errors, and the bonding module does not automatically fail over:
Mar 28 09:00:35 localhost kernel: sfc 0000:90:00.1: ERR: eth9 SYSTEM ERROR 00000000:00000001:00000001:00000000 status 00000000:00000000:00000f3d:ffffe100: disabling bus mastering
Mar 28 09:00:35 localhost kernel: sfc 0000:90:00.1: ERR: eth9 SYSTEM ERROR: memory parity error 00000000:00000000:00000000:00001000
Mar 28 09:00:35 localhost kernel: sfc 0000:90:00.1: ERR: eth9 SYSTEM ERROR - reset scheduled
Mar 28 09:00:35 localhost kernel: sfc 0000:90:00.1: INFO: eth9 resetting (ALL)
Mar 28 09:00:40 localhost kernel: sfc 0000:90:00.1: ERR: eth9 tx queue 0 flush command timed out
Mar 28 09:00:40 localhost kernel: sfc 0000:90:00.1: ERR: eth9 tx queue 1 flush command timed out
Mar 28 09:00:40 localhost kernel: sfc 0000:90:00.1: ERR: eth9 rx queue 0 flush command timed out
Mar 28 09:00:40 localhost kernel: sfc 0000:90:00.1: ERR: eth9 rx queue 1 flush command timed out
Mar 28 09:00:40 localhost kernel: sfc 0000:90:00.1: ERR: eth9 rx queue 2 flush command timed out
Mar 28 09:00:40 localhost kernel: sfc 0000:90:00.1: ERR: eth9 rx queue 3 flush command timed out
Mar 28 09:00:40 localhost kernel: sfc 0000:90:00.1: ERR: eth9 failed to flush queues
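To check whether a system log contains the same sequence of messages, they can be grouped per interface. The following is a minimal sketch, not a supported tool: the function name `summarize_sfc_errors` and the regular expression are illustrative assumptions based only on the message format shown in the excerpt above, and other sfc driver versions may word these messages differently.

```python
import re

# Assumed message layout, taken from the log excerpt above:
#   "... kernel: sfc <pci-address>: <LEVEL>: <iface> <message>"
SFC_LINE = re.compile(
    r"kernel: sfc (?P<pci>[0-9a-f:.]+): (?P<level>ERR|INFO): "
    r"(?P<iface>\S+) (?P<msg>.*)$"
)


def summarize_sfc_errors(lines):
    """Group sfc kernel messages by interface (hypothetical helper).

    Returns a dict mapping interface name to a dict with:
      parity         - True if a memory parity error was logged
      flush_timeouts - number of "flush command timed out" messages
      flush_failed   - True if "failed to flush queues" was logged
    """
    summary = {}
    for line in lines:
        match = SFC_LINE.search(line)
        if not match:
            continue  # not an sfc message in the assumed format
        entry = summary.setdefault(
            match.group("iface"),
            {"parity": False, "flush_timeouts": 0, "flush_failed": False},
        )
        msg = match.group("msg")
        if "memory parity error" in msg:
            entry["parity"] = True
        elif "flush command timed out" in msg:
            entry["flush_timeouts"] += 1
        elif "failed to flush queues" in msg:
            entry["flush_failed"] = True
    return summary


# Three lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "Mar 28 09:00:35 localhost kernel: sfc 0000:90:00.1: ERR: eth9 "
    "SYSTEM ERROR: memory parity error 00000000:00000000:00000000:00001000",
    "Mar 28 09:00:40 localhost kernel: sfc 0000:90:00.1: ERR: eth9 "
    "tx queue 0 flush command timed out",
    "Mar 28 09:00:40 localhost kernel: sfc 0000:90:00.1: ERR: eth9 "
    "failed to flush queues",
]

if __name__ == "__main__":
    for iface, info in summarize_sfc_errors(SAMPLE).items():
        print(iface, info)
```

An interface that shows both a parity error and a failed queue flush matches the pattern described in this issue. The script only reads log text; it does not query the bonding driver or change any interface state.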
Environment
- Red Hat Enterprise Linux
- Solarflare NIC
- Bonding