Intel XL710 (i40e) reports a NETDEV WATCHDOG timeout (tx_timeout) when added to a bond

Solution Verified - Updated -

Issue

  • An i40e interface triggers the NETDEV WATCHDOG and reports a tx_timeout when it becomes the active interface in a bond. With LACP bonding, the timeout might recur continuously.

    Apr 08 14:02:45 localhost kernel: bond0: making interface enp1s0f0 the new active one
    Apr 08 14:02:52 localhost kernel: ------------[ cut here ]------------
    Apr 08 14:02:52 localhost kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:356 dev_watchdog+0x248/0x260
    Apr 08 14:02:52 localhost kernel: NETDEV WATCHDOG: enp1s0f0 (i40e): transmit queue 6 timed out
    Apr 08 14:02:52 localhost kernel: Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache bonding sunrpc skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel     kvm irqbypass crc32_pclmul ghash_clmulni_intel iTCO_wdt iTCO_vendor_support aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_ssif dm_mod pcspkr cdc_ether usbnet sg mii mei_me lpc_ich mei i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200 i2c_algo_bit drm_kms_helper i40e syscopyarea sysfillrect sysimgblt fb_sys_fops ttm tg3 drm ahci ptp libahci pps_core megaraid_sas libata drm_panel_orientation_quirks nfit libnvdimm
    Apr 08 14:02:52 localhost kernel: CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 3.10.0-957.10.1.el7.x86_64 #1
    Apr 08 14:02:52 localhost kernel: Hardware name:
    Apr 08 14:02:52 localhost kernel: Call Trace:
    Apr 08 14:02:52 localhost kernel:  <IRQ>  [<ffffffff9fd68c49>] dump_stack+0x19/0x1b
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9f697948>] __warn+0xd8/0x100
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9f6979cf>] warn_slowpath_fmt+0x5f/0x80
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9fc6c2b8>] dev_watchdog+0x248/0x260
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9fc6c070>] ? dev_deactivate_queue.constprop.27+0x60/0x60
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9f6a8278>] call_timer_fn+0x38/0x110
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9fc6c070>] ? dev_deactivate_queue.constprop.27+0x60/0x60
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9f6aa6dd>] run_timer_softirq+0x24d/0x300
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9f6a1225>] __do_softirq+0xf5/0x280
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9fd7f2ec>] call_softirq+0x1c/0x30
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9f62e675>] do_softirq+0x65/0xa0
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9f6a15a5>] irq_exit+0x105/0x110
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9fd80698>] smp_apic_timer_interrupt+0x48/0x60
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9fd7cdb2>] apic_timer_interrupt+0x162/0x170
    Apr 08 14:02:52 localhost kernel:  <EOI>  [<ffffffff9fbb3457>] ? cpuidle_enter_state+0x57/0xd0
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9fbb35ae>] cpuidle_idle_call+0xde/0x230
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9f63670e>] arch_cpu_idle+0xe/0xc0
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9f6fc93a>] cpu_startup_entry+0x14a/0x1e0
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9f657e17>] start_secondary+0x1f7/0x270
    Apr 08 14:02:52 localhost kernel:  [<ffffffff9f6000d5>] start_cpu+0x5/0x14
    Apr 08 14:02:52 localhost kernel: ---[ end trace a7b83f01706b526c ]--- 
    Apr 08 14:02:52 localhost kernel: i40e 0000:d8:00.0 enp1s0f0: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
    Apr 08 14:02:52 localhost kernel: i40e 0000:d8:00.0 enp1s0f0: tx_timeout recovery level 1, hung_queue 6
    Apr 08 14:02:52 localhost kernel: i40e 0000:d8:00.0: VSI seid 390 Tx ring 0 disable timeout
    Apr 08 14:02:52 localhost kernel: bond0: link status definitely down for interface enp1s0f0, disabling it
    Apr 08 14:02:52 localhost kernel: bond0: making interface eno1 the new active one
    Apr 08 14:02:52 localhost kernel: Non-contiguous TC - Disabling DCB
    Apr 08 14:02:52 localhost kernel: bond0: link status definitely up for interface enp1s0f0, 10000 Mbps full duplex
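
To check whether a system shows the same pattern, the log can be scanned for a watchdog timeout on an interface that a bond has just made active. The following is a minimal sketch, not a supported tool: the function name find_bond_tx_timeouts is hypothetical, and the regular expressions are assumptions derived only from the message formats in the excerpt above, so other kernel versions may word these lines differently.

```python
import re

# Assumed formats, copied from the log excerpt in this article.
ACTIVE_RE = re.compile(
    r"(?P<bond>\S+): making interface (?P<iface>\S+) the new active one"
)
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>\w+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_bond_tx_timeouts(lines):
    """Return one dict per NETDEV WATCHDOG line.

    Each dict records the interface, driver, and queue, plus the bond that
    most recently made that interface active (None if no such line was seen).
    """
    last_activation = {}  # interface name -> bond name
    events = []
    for line in lines:
        match = ACTIVE_RE.search(line)
        if match:
            last_activation[match.group("iface")] = match.group("bond")
            continue
        match = WATCHDOG_RE.search(line)
        if match:
            iface = match.group("iface")
            events.append(
                {
                    "iface": iface,
                    "driver": match.group("driver"),
                    "queue": int(match.group("queue")),
                    "bond": last_activation.get(iface),
                }
            )
    return events
```

The function accepts any iterable of lines, for example an open file object for a saved copy of the kernel log. An event whose "driver" is i40e and whose "bond" is not None corresponds to the sequence shown above.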
    

Environment

  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 8
  • Intel NIC with i40e driver
  • Bonding
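
Whether a given interface matches this environment (i40e driver, member of a bond) can be read from sysfs. The sketch below assumes the usual Linux layout under /sys/class/net, where device/driver and master are symbolic links and the bond exposes bonding/mode. The function name describe_interface and the sysfs_root parameter are hypothetical and exist only for illustration.

```python
import os


def describe_interface(iface, sysfs_root="/sys/class/net"):
    """Report the driver, bond master, and bonding mode of an interface.

    Any value that cannot be determined from sysfs is returned as None.
    """
    base = os.path.join(sysfs_root, iface)

    def link_target_name(path):
        # The last path component of the link target is the driver or bond name.
        if os.path.islink(path):
            return os.path.basename(os.path.realpath(path))
        return None

    driver = link_target_name(os.path.join(base, "device", "driver"))
    master = link_target_name(os.path.join(base, "master"))

    mode = None
    if master:
        mode_file = os.path.join(sysfs_root, master, "bonding", "mode")
        if os.path.isfile(mode_file):
            with open(mode_file) as handle:
                fields = handle.read().split()
            # The file typically holds a name and a number, e.g. "802.3ad 4".
            mode = fields[0] if fields else None

    return {"driver": driver, "master": master, "bond_mode": mode}
```

For the interface in the log excerpt, describe_interface("enp1s0f0") would be expected to report the i40e driver and bond0 as master on an affected system.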
