Soft lockup in 'bonding' module


Environment

  • Red Hat Enterprise Linux 5
  • kernel-2.6.18-92.el5
    • or another kernel older than 2.6.18-128.el5
  • network interface bonding enabled with the bonding module

Issue

The server hangs and the kernel logs a soft lockup message that points at the bonding module.

Resolution

This bug was fixed in kernel version 2.6.18-111.el5, and the fix was first released in the RHEL 5.3 GA kernel, 2.6.18-128.el5. Update to kernel 2.6.18-128.el5 or later.
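
To check a host against the version boundary above, the comparison can be sketched in C. This is a minimal sketch, not a Red Hat tool: the function names el5_release() and is_affected() are hypothetical, and it assumes the kernel string has the form 2.6.18-N[...].el5 as in this article (for example, the output of uname -r).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Release number of the RHEL 5.3 GA kernel that carries the fix (from this article). */
#define FIXED_EL5_RELEASE 128

/*
 * Hypothetical helper: extract N from "2.6.18-N[.x.y].el5".
 * An optional "kernel-" package prefix is skipped.
 * Returns N, or -1 if the string is not a 2.6.18 el5 kernel version.
 */
int el5_release(const char *kver)
{
    int release = -1;

    assert(kver != NULL);            /* passing NULL is a caller bug */

    if (strncmp(kver, "kernel-", 7) == 0)
        kver += 7;
    if (strstr(kver, ".el5") == NULL)
        return -1;
    if (sscanf(kver, "2.6.18-%d", &release) != 1)
        return -1;
    return release;
}

/*
 * Hypothetical helper: 1 if the kernel predates the fixed release,
 * 0 if it is at or after it, -1 if the string is not recognised.
 */
int is_affected(const char *kver)
{
    int release = el5_release(kver);

    if (release < 0)
        return -1;
    return release < FIXED_EL5_RELEASE ? 1 : 0;
}
```

With these definitions, is_affected("2.6.18-92.el5") returns 1 and is_affected("2.6.18-128.el5") returns 0. Only the release number is compared, so this says nothing about kernels outside the 2.6.18 el5 series.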

Root Cause

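The periodic state machine (bond_3ad_state_machine_handler(), running in the bond1 workqueue thread) entered ad_rx_machine() and took rx_machine_lock. While it held the lock, an interrupt arrived on the same CPU, and the softirq that followed received a LACPDU (bond_3ad_lacpdu_recv()) and called ad_rx_machine() again. That second call spins in IRQ context on a spinlock that is already held by this CPU. The holder cannot run again until the softirq returns, so the lock is never dropped and the CPU is deadlocked.
The code path was like this: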

static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
{
        rx_states_t last_state;

        // Lock to prevent 2 instances of this function to run simultaneously(rx interrupt and periodic machine callback)
        __get_rx_machine_lock(port);
stuck here ^^^^^^^^^^^^^^^^^^^^^^^^
...

  __get_rx_machine_lock() is a plain call to spin_lock(), which does not disable softirqs on the local CPU:

static inline void __get_rx_machine_lock(struct port *port)
{
        spin_lock(&(SLAVE_AD_INFO(port->slave).rx_machine_lock));
}
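
The deadlock described above can be reproduced in a user-space toy model. The sketch below is not kernel code and not the Red Hat patch: struct toy_cpu and the toy_* functions are hypothetical, and the "fix" it models (disabling softirqs while the lock is held, as spin_lock_bh() does in the kernel) is one common remedy for this class of bug. The article does not state which form the actual fix took.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one CPU with one lock and one pending-softirq flag. */
struct toy_cpu {
    bool lock_held;        /* stands in for rx_machine_lock */
    bool bh_disabled;      /* softirqs (bottom halves) disabled locally */
    bool softirq_pending;  /* a LACPDU is waiting to be processed */
    int  rx_runs;          /* completed passes through the rx machine */
};

/*
 * Softirq path (models bond_3ad_lacpdu_recv -> ad_rx_machine).
 * Returns false where a real spin_lock() would spin forever because
 * the lock holder is the context this softirq interrupted.
 */
bool toy_rx_machine(struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    if (cpu->lock_held)
        return false;              /* self-deadlock on this CPU */
    cpu->lock_held = true;
    cpu->rx_runs++;
    cpu->lock_held = false;
    return true;
}

/*
 * Process-context path (models the periodic state machine handler).
 * disable_bh selects spin_lock_bh()-like (true) or spin_lock()-like
 * (false) behaviour. Returns false if the CPU would have deadlocked.
 */
bool toy_periodic_run(struct toy_cpu *cpu, bool disable_bh)
{
    assert(cpu != NULL);

    cpu->bh_disabled = disable_bh;
    cpu->lock_held = true;

    /* An interrupt delivers a LACPDU while the lock is held. */
    cpu->softirq_pending = true;
    if (!cpu->bh_disabled) {
        /* The softirq runs on interrupt exit, on top of the lock holder. */
        cpu->softirq_pending = false;
        if (!toy_rx_machine(cpu))
            return false;          /* the holder never resumes */
    }

    cpu->rx_runs++;                /* the holder's own rx machine pass */
    cpu->lock_held = false;
    cpu->bh_disabled = false;

    /* With softirqs re-enabled, the deferred work runs safely. */
    if (cpu->softirq_pending) {
        cpu->softirq_pending = false;
        return toy_rx_machine(cpu);
    }
    return true;
}
```

Starting from a zeroed struct toy_cpu, toy_periodic_run(&cpu, false) returns false, which corresponds to the soft lockup, while toy_periodic_run(&cpu, true) returns true with both passes completed. The model is single-CPU on purpose: the lockup in this article does not need a second CPU, only re-entry on the one that already holds the lock.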

Diagnostic Steps

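The kernel log (dmesg output or /var/log/messages) contains a call trace like the one below. Note that ad_rx_machine appears twice: once after <EOI>, in the workqueue thread that holds the lock, and once under <IRQ>, in the softirq that spins on it.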

NETDEV WATCHDOG: eth2: transmit timed out
BUG: soft lockup - CPU#3 stuck for 10s! [bond1:6931]
CPU 3:
Modules linked in: hangcheck_timer nfsd exportfs auth_rpcgss ipv6 xfrm_nalgo crypto_api autofs4 hidp nfs lockd fscache nfs_acl rfcomm l2cap bluetooth sunrpc bonding dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sr_mod cdrom sg e1000e shpchp pcspkr bnx2 serio_raw ata_piix libata cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 6931, comm: bond1 Not tainted 2.6.18-92.el5 #1
RIP: 0010:[<ffffffff80064b54>]  [<ffffffff80064b54>] .text.lock.spinlock+0x2/0x30
RSP: 0018:ffff81013f7bbd08  EFLAGS: 00000286
RAX: 0000000000000001 RBX: ffff81121dee6080 RCX: 0000000000000002
RDX: ffff81121dee6000 RSI: ffff81121dee6080 RDI: ffff81121dee6168
RBP: ffff81013f7bbc80 R08: 0000000000000000 R09: ffff810001000270
R10: ffff8112193287c0 R11: 00000000000000c8 R12: ffffffff8005dc8e
R13: ffff810276810030 R14: ffffffff80076f1d R15: ffff81013f7bbc80
FS:  0000000000000000(0000) GS:ffff81121ff236c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000364b8a00e CR3: 00000012117b0000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff882b6477>] :bonding:ad_rx_machine+0x20/0x502
 [<ffffffff882b6aa2>] :bonding:bond_3ad_lacpdu_recv+0xc1/0x1fc
 [<ffffffff8005be70>] cache_alloc_refill+0x106/0x186
 [<ffffffff80020183>] netif_receive_skb+0x330/0x3ae
 [<ffffffff8811c4cc>] :bnx2:bnx2_poll+0xb51/0xd16
 [<ffffffff882b6e76>] :bonding:bond_3ad_state_machine_handler+0x0/0x84a
 [<ffffffff8000c54c>] net_rx_action+0xa4/0x1a4
 [<ffffffff80011ed2>] __do_softirq+0x5e/0xd6
 [<ffffffff80154a28>] end_msi_irq_wo_maskbit+0x9/0x16
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006c571>] do_softirq+0x2c/0x85
 [<ffffffff8006c3f9>] do_IRQ+0xec/0xf5
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff800649e0>] _spin_lock+0x3/0xa
 [<ffffffff882b6477>] :bonding:ad_rx_machine+0x20/0x502
 [<ffffffff882b6f4a>] :bonding:bond_3ad_state_machine_handler+0xd4/0x84a
 [<ffffffff8004cea9>] run_workqueue+0x94/0xe4
 [<ffffffff800497be>] worker_thread+0x0/0x122
 [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800498ae>] worker_thread+0xf0/0x122
 [<ffffffff8008ac03>] default_wake_function+0x0/0xe
 [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003253d>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009dbca>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003243f>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
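
When many logs have to be checked, the signature in the trace above can be matched programmatically. The following is a minimal C sketch under stated assumptions: parse_soft_lockup() and count_frames() are hypothetical helpers, the message format is taken from the single example in this article, and counting two ad_rx_machine frames is only a heuristic for the re-entry, not a definitive diagnosis.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of a "BUG: soft lockup" line as it appears in this article. */
struct soft_lockup {
    int  cpu;        /* CPU number after "CPU#" */
    int  seconds;    /* reported stuck time */
    char comm[32];   /* task name, e.g. "bond1" */
    int  pid;        /* task PID */
};

/*
 * Parse e.g. "BUG: soft lockup - CPU#3 stuck for 10s! [bond1:6931]".
 * Returns 1 and fills *out on a match, 0 otherwise.
 */
int parse_soft_lockup(const char *line, struct soft_lockup *out)
{
    const char *p;

    assert(line != NULL && out != NULL);

    /* Skip any syslog prefix (timestamp, hostname) before the message. */
    p = strstr(line, "BUG: soft lockup - CPU#");
    if (p == NULL)
        return 0;
    return sscanf(p, "BUG: soft lockup - CPU#%d stuck for %ds! [%31[^:]:%d]",
                  &out->cpu, &out->seconds, out->comm, &out->pid) == 4;
}

/* Count non-overlapping occurrences of symbol in a block of log text. */
int count_frames(const char *log, const char *symbol)
{
    size_t len;
    int n = 0;

    assert(log != NULL && symbol != NULL);

    len = strlen(symbol);
    if (len == 0)
        return 0;
    for (const char *p = strstr(log, symbol); p != NULL; p = strstr(p + len, symbol))
        n++;
    return n;
}
```

For the trace in this article, parse_soft_lockup() yields CPU 3, 10 seconds, task bond1 and PID 6931, and count_frames(trace, ":bonding:ad_rx_machine+") returns 2, one frame for each side of the re-entry.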

