Hard lockup deadlock after tlb_choose_channel when using ALB/TLB bonding

Solution Verified

Issue

  • The system hard-locks (deadlocks) in _spin_lock, reached through bond_alb_xmit and tlb_choose_channel, when using ALB/TLB bonding
  • The kernel panics with "Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu X" and a call trace like:
    [exception RIP: _spin_lock+0x21]
    RIP: ffffffff8153b801  RSP: ffff88089c403830  RFLAGS: 00000097
    RAX: 000000000000c50f  RBX: ffff88107374f6e0  RCX: ffff880eecd77a20
    RDX: 000000000000c50e  RSI: 000000000000006d  RDI: ffff88107374f728
    RBP: ffff88089c403830   R8: 0000000000000000   R9: ffff880eecd77a68
    R10: 0000000000000000  R11: 0000000000000000  R12: 000000000000006d
    R13: ffff88107374f728  R14: 000000000000006b  R15: ffff8809f2bc5bc0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#13 [ffff88089c403830] _spin_lock at ffffffff8153b801
#14 [ffff88089c403838] tlb_choose_channel at ffffffffa0392ec5 [bonding]
#15 [ffff88089c403868] bond_alb_xmit at ffffffffa039409e [bonding]
#16 [ffff88089c4038b8] bond_start_xmit at ffffffffa038a48b [bonding]
#17 [ffff88089c4038f8] netpoll_send_skb_on_dev at ffffffff81484491
#18 [ffff88089c403958] netpoll_send_udp at ffffffff81484774
#19 [ffff88089c4039a8] write_msg at ffffffffa00a131b [netconsole]
#20 [ffff88089c403a08] __call_console_drivers at ffffffff81077625
#21 [ffff88089c403a38] _call_console_drivers at ffffffff8107768a
#22 [ffff88089c403a58] release_console_sem at ffffffff81077cd8
#23 [ffff88089c403a98] vprintk at ffffffff810783d8
#24 [ffff88089c403b38] printk at ffffffff81537d5d
#25 [ffff88089c403b98] __netdev_printk at ffffffff81467c11
#26 [ffff88089c403ba8] netdev_err at ffffffff81467de3
#27 [ffff88089c403c18] bond_alb_xmit at ffffffffa03942f5 [bonding]
#28 [ffff88089c403c68] bond_start_xmit at ffffffffa038a48b [bonding]
#29 [ffff88089c403ca8] dev_hard_start_xmit at ffffffff8146fa54
#30 [ffff88089c403d08] dev_queue_xmit at ffffffff8146fefd
#31 [ffff88089c403d48] arp_xmit at ffffffff814d19b8
#32 [ffff88089c403d78] arp_send at ffffffff814d1ef3
#33 [ffff88089c403d98] arp_solicit at ffffffff814d201f
#34 [ffff88089c403e08] neigh_timer_handler at ffffffff814796f8
#35 [ffff88089c403e48] run_timer_softirq at ffffffff8108a4d7
#36 [ffff88089c403ed8] __do_softirq at ffffffff8107ffd1
#37 [ffff88089c403f48] call_softirq at ffffffff8100c38c
#38 [ffff88089c403f60] do_softirq at ffffffff8100fbd5
#39 [ffff88089c403f80] irq_exit at ffffffff8107fe85
#40 [ffff88089c403f90] smp_apic_timer_interrupt at ffffffff815425aa
#41 [ffff88089c403fb0] apic_timer_interrupt at ffffffff8100bc13
--- <IRQ stack> ---
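
The notable feature of this trace is that the bonding transmit path appears twice: once from the ARP transmit (frames #27 and #28), and again after netdev_err hands the message to netconsole, which sends it back out over the bond (frames #15 and #16). The following is a minimal Python sketch, not a supported tool, that flags this kind of re-entry in a crash-style backtrace. The names `FRAME_RE`, `reentered_functions`, and `SAMPLE` are hypothetical and chosen only for illustration; the sample frames are a subset copied from the trace above.

```python
import re
from collections import defaultdict

# Matches crash-style frame lines such as:
#   #15 [ffff88089c403868] bond_alb_xmit at ffffffffa039409e [bonding]
FRAME_RE = re.compile(r"^#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")


def reentered_functions(trace):
    """Return {function: [frame numbers]} for functions seen more than once."""
    seen = defaultdict(list)
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            seen[match.group(2)].append(int(match.group(1)))
    return {name: frames for name, frames in seen.items() if len(frames) > 1}


# Subset of the frames from the call trace in this article.
SAMPLE = """\
#13 [ffff88089c403830] _spin_lock at ffffffff8153b801
#14 [ffff88089c403838] tlb_choose_channel at ffffffffa0392ec5 [bonding]
#15 [ffff88089c403868] bond_alb_xmit at ffffffffa039409e [bonding]
#16 [ffff88089c4038b8] bond_start_xmit at ffffffffa038a48b [bonding]
#17 [ffff88089c4038f8] netpoll_send_skb_on_dev at ffffffff81484491
#19 [ffff88089c4039a8] write_msg at ffffffffa00a131b [netconsole]
#26 [ffff88089c403ba8] netdev_err at ffffffff81467de3
#27 [ffff88089c403c18] bond_alb_xmit at ffffffffa03942f5 [bonding]
#28 [ffff88089c403c68] bond_start_xmit at ffffffffa038a48b [bonding]
"""

if __name__ == "__main__":
    for name, frames in reentered_functions(SAMPLE).items():
        print(name, frames)
```

A repeated function is only a hint. Confirming the deadlock still requires checking in the vmcore which lock the CPU is spinning on and who holds it.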

Environment

  • Red Hat Enterprise Linux 6.7
  • Bonding in balance-tlb (Mode 5) or balance-alb (Mode 6)
  • The netconsole service is sending log messages over the bond
  • A bonding slave was removed or went down at some point in the past
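
The first three conditions above can be partly checked on a running host. The sketch below is an illustration under stated assumptions, not a Red Hat tool: it assumes the bond is named `bond0` (hypothetical), reads the bonding mode from the sysfs file `/sys/class/net/<bond>/bonding/mode`, and looks for a `netconsole` entry in `/proc/modules`. It cannot tell whether netconsole actually sends over the bond, whether netconsole is built into the kernel rather than loaded as a module, or whether a slave has ever gone down. The function names are hypothetical.

```python
from pathlib import Path

# Mode 5 and Mode 6, as listed in the Environment section.
AFFECTED_MODES = {"balance-tlb", "balance-alb"}


def is_affected_mode(mode_text):
    """mode_text is the bonding 'mode' sysfs content, e.g. 'balance-alb 6'."""
    fields = mode_text.split()
    return bool(fields) and fields[0] in AFFECTED_MODES


def netconsole_loaded(proc_modules_text):
    """True if a 'netconsole' entry appears in /proc/modules-style text."""
    return any(
        line.split()[0] == "netconsole"
        for line in proc_modules_text.splitlines()
        if line.strip()
    )


def check_host(bond="bond0"):
    """Return True/False for this host, or None if the files are missing."""
    mode_file = Path("/sys/class/net") / bond / "bonding" / "mode"
    modules_file = Path("/proc/modules")
    if not mode_file.exists() or not modules_file.exists():
        return None
    return is_affected_mode(mode_file.read_text()) and netconsole_loaded(
        modules_file.read_text()
    )


if __name__ == "__main__":
    print(check_host())
```

A result of `True` only means the host matches the mode and netconsole conditions; it does not by itself show that the host will hit the lockup.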
