Network interface hangs or disappears with "WARNING: at lib/list_debug.c" or "list_add corruption" on realtime kernel


Issue

  • Network interface hangs or disappears with "WARNING: at lib/list_debug.c" or "list_add corruption" on realtime kernel
  • Physical interface goes down after RT-kernel update
  • ethtool reports "Cannot get device settings: No such device" for a NIC that was present at boot time
  • A WARN-level message is logged about list corruption, such as:
WARNING: at lib/list_debug.c:29 __list_add+0x77/0xd0()
list_add corruption. next->prev should be prev (ffff883e13896900), but was ffff88803f955b90. (next=ffff88803f955b90).

Call Trace:
[<ffffffff815f08cd>] dump_stack+0x19/0x1c
[<ffffffff8105cd22>] warn_slowpath_common+0x82/0xc0
[<ffffffff8105ce16>] warn_slowpath_fmt+0x46/0x50
[<ffffffff812d3757>] __list_add+0x77/0xd0
[<ffffffff815183cd>] ? __napi_schedule_irqoff+0x1d/0x40
[<ffffffff815183d6>] __napi_schedule_irqoff+0x26/0x40
[<ffffffffa03d8185>] mlx4_en_rx_irq+0x45/0x60 [mlx4_en]
[<ffffffffa0386102>] mlx4_cq_completion+0x42/0x90 [mlx4_core]
[<ffffffffa0387988>] mlx4_eq_int+0x578/0xe50 [mlx4_core]
[<ffffffff810a5b2c>] ? pull_rt_task+0x29c/0x3b0
[<ffffffff810a6937>] ? dequeue_task_rt+0x57/0x70
[<ffffffffa0388274>] mlx4_msi_x_interrupt+0x14/0x20 [mlx4_core]
[<ffffffff810fd15e>] irq_forced_thread_fn+0x2e/0x70
[<ffffffff810fe22f>] irq_thread+0x13f/0x1c0
[<ffffffff810fd130>] ? irq_thread_fn+0x50/0x50
[<ffffffff810fd000>] ? irq_finalize_oneshot+0xf0/0xf0
[<ffffffff810fe0f0>] ? irq_thread_check_affinity+0xb0/0xb0
[<ffffffff810fe0f0>] ? irq_thread_check_affinity+0xb0/0xb0
[<ffffffff8108870e>] kthread+0xbe/0xd0

WARNING: at lib/list_debug.c:33 __list_add+0xbe/0xd0()
list_add corruption. prev->next should be next (ffff880c4fa35b90), but was dead000000100100. (prev=ffff880c12cf8a08).

Call Trace:
 [<ffffffff815f078d>] dump_stack+0x19/0x1c
 [<ffffffff8105cd12>] warn_slowpath_common+0x82/0xc0
 [<ffffffff8105ce06>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff812d371e>] __list_add+0xbe/0xd0
 [<ffffffff8151837e>] __napi_schedule+0x2e/0x70
 [<ffffffffa03ff9fd>] efx_farch_msi_interrupt+0x5d/0x90 [sfc]
 [<ffffffff810fcf5e>] irq_forced_thread_fn+0x2e/0x70
 [<ffffffff810fe02f>] irq_thread+0x13f/0x1c0
 [<ffffffff810fcf30>] ? irq_thread_fn+0x50/0x50
 [<ffffffff810fce00>] ? irq_finalize_oneshot+0xf0/0xf0
 [<ffffffff810fdef0>] ? irq_thread_check_affinity+0xb0/0xb0
 [<ffffffff810fdef0>] ? irq_thread_check_affinity+0xb0/0xb0
 [<ffffffff810886fe>] kthread+0xbe/0xd0
  • The list corruption warning is followed by a net device watchdog timeout:
WARNING: at net/sched/sch_generic.c:297 dev_watchdog+0x27a/0x290()
NETDEV WATCHDOG: eth6 (mlx4_core): transmit queue 30 timed out
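
To check whether a saved kernel log shows the same pattern as the messages above, a small scan can help. The following is a minimal sketch, not a Red Hat tool: the function names `scan_log` and `matches_issue` are hypothetical, and the regular expressions are derived only from the sample messages quoted in this article, so other kernels may word the messages differently.

```python
import re

# Patterns derived from the sample messages in this article (assumption:
# the log lines on the affected system use the same wording).
LIST_WARN = re.compile(r"WARNING: at lib/list_debug\.c:\d+ __list_add")
CORRUPTION = re.compile(r"list_add corruption\.")
WATCHDOG = re.compile(
    r"NETDEV WATCHDOG: (?P<dev>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def scan_log(lines):
    """Count list corruption warnings and collect watchdog timeouts."""
    summary = {
        "list_warnings": 0,
        "corruption_messages": 0,
        "watchdog_timeouts": [],  # (device, driver, queue) tuples
    }
    for line in lines:
        if LIST_WARN.search(line):
            summary["list_warnings"] += 1
        if CORRUPTION.search(line):
            summary["corruption_messages"] += 1
        match = WATCHDOG.search(line)
        if match:
            summary["watchdog_timeouts"].append(
                (match.group("dev"), match.group("driver"), int(match.group("queue")))
            )
    return summary


def matches_issue(summary):
    """True when both the WARNING line and the corruption text were seen."""
    return summary["list_warnings"] > 0 and summary["corruption_messages"] > 0
```

For example, `scan_log(open("/var/log/messages"))` would scan the system log, assuming that is where kernel messages are written on the host. A watchdog timeout without a preceding list corruption warning may have a different cause, which is why `matches_issue` does not look at the watchdog entries.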

Environment

  • Red Hat Enterprise Linux 6 with MRG Realtime
    • Any kernel earlier than kernel-rt-3.10.0-514.rt56.210.el6rt
  • Red Hat Enterprise Linux 7 for Real Time
    • 7.3.z kernel earlier than kernel-rt-3.10.0-514.6.1.rt56.429.el7
    • 7.4 kernel earlier than kernel-rt-3.10.0-529.rt56.436.el7
  • Network interface with high traffic rate
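
The version limits above can be checked against the output of `uname -r`. The following is a rough sketch under stated assumptions: `is_affected` and `compare_versions` are hypothetical helper names, the comparison is a simplified imitation of RPM version ordering (it ignores epochs and tilde handling), and an el7 release whose first field is 514 is assumed to belong to the 7.3.z stream while higher values are compared against the 7.4 build. The authoritative check remains comparing the installed package with `rpm`.

```python
import re

# Fixed builds taken from the Environment list above.
FIXED_EL6RT = "3.10.0-514.rt56.210.el6rt"
FIXED_EL7_73Z = "3.10.0-514.6.1.rt56.429.el7"
FIXED_EL7_74 = "3.10.0-529.rt56.436.el7"


def _segments(text):
    """Split a version string into runs of digits or letters."""
    return re.findall(r"\d+|[A-Za-z]+", text)


def compare_versions(a, b):
    """Simplified RPM-style comparison: return -1, 0, or 1."""
    seg_a, seg_b = _segments(a), _segments(b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return -1 if int(x) < int(y) else 1
        elif x.isdigit() != y.isdigit():
            # A numeric segment sorts as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    # All shared segments are equal: the longer string is newer.
    return (len(seg_a) > len(seg_b)) - (len(seg_a) < len(seg_b))


def is_affected(uname_r):
    """True/False for a listed kernel-rt stream, None if it cannot be judged."""
    kernel = re.sub(r"\.x86_64$", "", uname_r.strip())
    _version, _, release = kernel.partition("-")
    if not re.search(r"(^|\.)rt\d+", release):
        return None  # not a realtime kernel
    if release.endswith(".el6rt"):
        fixed = FIXED_EL6RT
    elif release.endswith(".el7"):
        first = release.split(".")[0]
        if not first.isdigit() or int(first) < 514:
            return None  # stream not covered by the list above
        # Assumption: 514.* builds are 7.3.z, later builds are 7.4.
        fixed = FIXED_EL7_73Z if int(first) == 514 else FIXED_EL7_74
    else:
        return None
    return compare_versions(kernel, fixed) < 0
```

A call such as `is_affected("3.10.0-514.2.2.rt56.424.el7.x86_64")` returns `True` under these assumptions, because the release sorts before the 7.3.z fixed build. A `None` result means the sketch could not place the kernel in one of the listed streams.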
