Why does the NIC stop working with "tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff" messages in RHEL 6?
Issue
- From time to time, the tg3 interface on the host fails and drops off the network with the following errors:
Jul 29 10:11:11 localhost kernel: tg3 0000:04:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
Jul 29 10:11:12 localhost kernel: tg3 0000:04:00.0: eth0: No firmware running
- Sometimes the following NETDEV WATCHDOG related logs also appear:
Aug 9 14:14:46 HOSTNAME kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
Aug 9 14:14:46 HOSTNAME kernel: Hardware name: ProLiant DL380p Gen8
Aug 9 14:14:46 HOSTNAME kernel: NETDEV WATCHDOG: eth10 (tg3): transmit queue 0 timed out
Aug 9 14:14:46 HOSTNAME kernel: Modules linked in: nls_utf8 ext2 raid1 raid0 linear vfat msdos fat nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand freq_table pcc_cpufreq bonding 8021q garp stp llc ipv6 microcode iTCO_wdt iTCO_vendor_support hpilo hpwdt igb i2c_algo_bit i2c_core serio_raw lpc_ich mfd_core ioatdma dca tg3 ptp pps_core power_meter sg shpchp ext4 jbd2 mbcache sr_mod cdrom dm_round_robin sd_mod pata_acpi ata_generic ata_piix hpsa lpfc scsi_transport_fc scsi_tgt crc_t10dif dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Aug 9 14:14:46 HOSTNAME kernel: Pid: 0, comm: swapper Not tainted 2.6.32-431.el6.x86_64 #1
Aug 9 14:14:46 HOSTNAME kernel: Call Trace:
Aug 9 14:14:46 HOSTNAME kernel: <IRQ> [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0
Aug 9 14:14:46 HOSTNAME kernel: [<ffffffff81071f16>] ? warn_slowpath_fmt+0x46/0x50
Aug 9 14:14:46 HOSTNAME kernel: [<ffffffff8147b74b>] ? dev_watchdog+0x26b/0x280
Aug 9 14:14:46 HOSTNAME kernel: [<ffffffff81094fcd>] ? insert_work+0x6d/0xb0
[...]
Aug 9 14:14:46 HOSTNAME kernel: ---[ end trace 223c82a8ebf66e08 ]---
Aug 9 14:14:46 HOSTNAME kernel: tg3 0000:03:00.3: eth10: transmit timed out, resetting
Aug 9 14:14:46 HOSTNAME kernel: tg3 0000:03:00.3: eth10: 0x00000000: 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
Aug 9 14:14:46 HOSTNAME kernel: tg3 0000:03:00.3: eth10: 0x00000010: 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
[...]
Aug 9 14:14:47 HOSTNAME kernel: tg3 0000:03:00.3: tg3_stop_block timed out, ofs=c00 enable_bit=2
Aug 9 14:14:47 HOSTNAME kernel: tg3 0000:03:00.3: tg3_stop_block timed out, ofs=4800 enable_bit=2
Aug 9 14:14:47 HOSTNAME kernel: tg3 0000:03:00.3: tg3_stop_block timed out, ofs=1000 enable_bit=2
Aug 9 14:14:48 HOSTNAME kernel: tg3 0000:03:00.3: tg3_stop_block timed out, ofs=1c00 enable_bit=2
Aug 9 14:14:48 HOSTNAME kernel: tg3 0000:03:00.3: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
Aug 9 14:14:48 HOSTNAME kernel: tg3 0000:03:00.3: tg3_stop_block timed out, ofs=3c00 enable_bit=2
Aug 9 14:14:48 HOSTNAME kernel: tg3 0000:03:00.3: tg3_stop_block timed out, ofs=4c00 enable_bit=2
Aug 9 14:14:49 HOSTNAME kernel: tg3 0000:03:00.3: eth10: No firmware running
Aug 9 14:14:50 HOSTNAME kernel: tg3 0000:03:00.3: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
Aug 9 14:15:02 HOSTNAME kernel: tg3 0000:03:00.3: eth10: Link is down
[...]
Aug 13 04:25:11 HOSTNAME kernel: tg3 0000:03:00.3: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
Aug 13 04:25:23 HOSTNAME kernel: tg3 0000:03:00.3: PME# enabled
- Why does this happen? What is causing the interface to go down?
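To check whether a host is showing the same symptoms, the messages quoted above can be counted in a syslog file. The following is a minimal Python sketch, not part of the original article: the function name count_tg3_failures and the category labels are hypothetical, and the patterns are taken only from the log excerpts in this article, so other kernel versions may word the messages differently.

```python
import re

# Patterns derived from the log excerpts in this article.
# The dictionary keys are illustrative labels, not official names.
TG3_PATTERNS = {
    "abort_hw_timeout": re.compile(r"tg3 .*tg3_abort_hw timed out"),
    "stop_block_timeout": re.compile(r"tg3 .*tg3_stop_block timed out"),
    "no_firmware": re.compile(r"tg3 .*No firmware running"),
    "tx_watchdog": re.compile(
        r"NETDEV WATCHDOG: \S+ \(tg3\): transmit queue \d+ timed out"
    ),
}


def count_tg3_failures(lines):
    """Count how many lines match each tg3 failure signature."""
    counts = {name: 0 for name in TG3_PATTERNS}
    for line in lines:
        for name, pattern in TG3_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Example with two lines copied from the excerpts above.
sample = [
    "Jul 29 10:11:11 localhost kernel: tg3 0000:04:00.0: tg3_abort_hw timed out, "
    "TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff",
    "Jul 29 10:11:12 localhost kernel: tg3 0000:04:00.0: eth0: No firmware running",
]
for name, hits in count_tg3_failures(sample).items():
    print(name, hits)
```

To scan a real log, pass an open file object instead of the sample list, for example the system log at /var/log/messages (assumed here to be the syslog destination on the affected host).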
Environment
- Red Hat Enterprise Linux (RHEL) 6.x.
- Broadcom BCM5762 NICs. tg3 module.