Network interface hangs or disappears with "WARNING: at lib/list_debug.c" or "list_add corruption" on realtime kernel

Solution Verified

Issue

  • Network interface hangs or disappears with "WARNING: at lib/list_debug.c" or "list_add corruption" on realtime kernel
  • Physical interface goes down after RT-kernel update
  • ethtool reports "Cannot get device settings: No such device" for a NIC that existed at boot time
  • A WARN-level message is logged about list corruption, such as:
WARNING: at lib/list_debug.c:29 __list_add+0x77/0xd0()
list_add corruption. next->prev should be prev (ffff883e13896900), but was ffff88803f955b90. (next=ffff88803f955b90).

Call Trace:
[<ffffffff815f08cd>] dump_stack+0x19/0x1c
[<ffffffff8105cd22>] warn_slowpath_common+0x82/0xc0
[<ffffffff8105ce16>] warn_slowpath_fmt+0x46/0x50
[<ffffffff812d3757>] __list_add+0x77/0xd0
[<ffffffff815183cd>] ? __napi_schedule_irqoff+0x1d/0x40
[<ffffffff815183d6>] __napi_schedule_irqoff+0x26/0x40
[<ffffffffa03d8185>] mlx4_en_rx_irq+0x45/0x60 [mlx4_en]
[<ffffffffa0386102>] mlx4_cq_completion+0x42/0x90 [mlx4_core]
[<ffffffffa0387988>] mlx4_eq_int+0x578/0xe50 [mlx4_core]
[<ffffffff810a5b2c>] ? pull_rt_task+0x29c/0x3b0
[<ffffffff810a6937>] ? dequeue_task_rt+0x57/0x70
[<ffffffffa0388274>] mlx4_msi_x_interrupt+0x14/0x20 [mlx4_core]
[<ffffffff810fd15e>] irq_forced_thread_fn+0x2e/0x70
[<ffffffff810fe22f>] irq_thread+0x13f/0x1c0
[<ffffffff810fd130>] ? irq_thread_fn+0x50/0x50
[<ffffffff810fd000>] ? irq_finalize_oneshot+0xf0/0xf0
[<ffffffff810fe0f0>] ? irq_thread_check_affinity+0xb0/0xb0
[<ffffffff810fe0f0>] ? irq_thread_check_affinity+0xb0/0xb0
[<ffffffff8108870e>] kthread+0xbe/0xd0

WARNING: at lib/list_debug.c:33 __list_add+0xbe/0xd0()
list_add corruption. prev->next should be next (ffff880c4fa35b90), but was dead000000100100. (prev=ffff880c12cf8a08).

Call Trace:
 [<ffffffff815f078d>] dump_stack+0x19/0x1c
 [<ffffffff8105cd12>] warn_slowpath_common+0x82/0xc0
 [<ffffffff8105ce06>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff812d371e>] __list_add+0xbe/0xd0
 [<ffffffff8151837e>] __napi_schedule+0x2e/0x70
 [<ffffffffa03ff9fd>] efx_farch_msi_interrupt+0x5d/0x90 [sfc]
 [<ffffffff810fcf5e>] irq_forced_thread_fn+0x2e/0x70
 [<ffffffff810fe02f>] irq_thread+0x13f/0x1c0
 [<ffffffff810fcf30>] ? irq_thread_fn+0x50/0x50
 [<ffffffff810fce00>] ? irq_finalize_oneshot+0xf0/0xf0
 [<ffffffff810fdef0>] ? irq_thread_check_affinity+0xb0/0xb0
 [<ffffffff810fdef0>] ? irq_thread_check_affinity+0xb0/0xb0
 [<ffffffff810886fe>] kthread+0xbe/0xd0
  • The list corruption warning is followed by a net device watchdog timeout:
WARNING: at net/sched/sch_generic.c:297 dev_watchdog+0x27a/0x290()
NETDEV WATCHDOG: eth6 (mlx4_core): transmit queue 30 timed out
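
The symptoms above can be checked for mechanically. The following is a minimal sketch, not a supported tool, that scans kernel log text for the two list_debug warnings and the watchdog timeout. The function name scan_kernel_log and the shape of the returned summary are illustrative assumptions; the patterns are taken from the excerpts above.

```python
import re

# Patterns copied from the log excerpts in this article.
LIST_DEBUG_WARN = re.compile(r"WARNING: at lib/list_debug\.c:\d+")
LIST_CORRUPTION = re.compile(r"list_add corruption\.")
WATCHDOG = re.compile(
    r"NETDEV WATCHDOG: (\S+) \(([^)]+)\): transmit queue (\d+) timed out"
)


def scan_kernel_log(text):
    """Summarize the symptoms found in kernel log text (hypothetical helper)."""
    summary = {
        "list_warnings": 0,       # "WARNING: at lib/list_debug.c:NN" lines
        "list_corruptions": 0,    # "list_add corruption." lines
        "watchdog_timeouts": [],  # (interface, driver, queue) tuples
    }
    for line in text.splitlines():
        if LIST_DEBUG_WARN.search(line):
            summary["list_warnings"] += 1
        if LIST_CORRUPTION.search(line):
            summary["list_corruptions"] += 1
        match = WATCHDOG.search(line)
        if match:
            iface, driver, queue = match.groups()
            summary["watchdog_timeouts"].append((iface, driver, int(queue)))
    return summary
```

The text passed in could be the captured output of dmesg or the contents of /var/log/messages. A list corruption warning together with a watchdog timeout on the same host matches the pattern described in this article, but other problems can also cause a watchdog timeout on its own.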

Environment

  • Red Hat Enterprise Linux 6 with MRG Realtime
    • Any kernel earlier than kernel-rt-3.10.0-514.rt56.210.el6rt
  • Red Hat Enterprise Linux 7 for Real Time
    • 7.3.z kernel earlier than kernel-rt-3.10.0-514.6.1.rt56.429.el7
    • 7.4 kernel earlier than kernel-rt-3.10.0-529.rt56.436.el7
  • Network interface carrying a high rate of traffic
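
The kernel thresholds above can be compared against the output of uname -r. The following is a rough sketch under stated assumptions: compare_part is a simplified approximation of RPM's segment-wise version comparison (rpm itself remains authoritative), and the mapping of release number 514 to the 7.3.z stream and higher numbers to the 7.4 stream is inferred from the versions listed above. All function names are hypothetical.

```python
import re

# Fixed releases as listed in the Environment section.
FIXED_MRG_EL6 = "3.10.0-514.rt56.210.el6rt"
FIXED_EL7_73Z = "3.10.0-514.6.1.rt56.429.el7"
FIXED_EL7_74 = "3.10.0-529.rt56.436.el7"


def compare_part(a, b):
    """Simplified RPM-style comparison; returns -1, 0, or 1."""
    sa = re.findall(r"\d+|[A-Za-z]+", a)
    sb = re.findall(r"\d+|[A-Za-z]+", b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return -1 if int(x) < int(y) else 1
        elif x.isdigit() != y.isdigit():
            # A numeric segment sorts newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    # All shared segments equal: the string with more segments is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def compare_kernel(a, b):
    """Compare 'version-release' strings: version first, then release."""
    va, _, ra = a.partition("-")
    vb, _, rb = b.partition("-")
    return compare_part(va, vb) or compare_part(ra, rb)


def is_affected(uname_r):
    """True/False for the listed RT streams, None when no stream applies."""
    release = re.sub(r"\.(x86_64|noarch)$", "", uname_r)
    if ".rt" not in release or "-" not in release:
        return None  # not a realtime kernel string
    if release.endswith(".el6rt"):
        fixed = FIXED_MRG_EL6
    elif release.endswith(".el7"):
        first = release.split("-", 1)[1].split(".")[0]
        if not first.isdigit() or int(first) < 514:
            return None  # stream not covered by the list above
        # Assumption: 514 is the 7.3.z stream, anything higher is 7.4.
        fixed = FIXED_EL7_73Z if int(first) == 514 else FIXED_EL7_74
    else:
        return None
    return compare_kernel(release, fixed) < 0
```

For example, is_affected("3.10.0-514.6.1.rt56.429.el7.x86_64") returns False because that kernel is the listed 7.3.z threshold itself, while any el7 release in the 514 series that sorts below it returns True.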
