RHEL7: soft lockup occurs while a thread group leader is waiting on tasklist_waiters

Solution Verified - Updated -

Issue

  • A soft lockup occurs while a thread group leader is waiting on tasklist_waiters in mm_update_next_owner(), at a time when a huge number of the thread group's members are exiting and trying to take the tasklist_lock.
crash> log | grep lockup
[225225.514521] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [hdbdpserver:12207]
[225225.850485] Kernel panic - not syncing: softlockup: hung tasks

crash> bt
PID: 12207  TASK: ffff9ba5a7a92080  CPU: 19  COMMAND: "hdbdpserver"
    ...
--- <IRQ stack> ---
 #9 [ffff9ba571a83bd8] apic_timer_interrupt at ffffffff82d77df2
    [exception RIP: tasklist_read_lock+34]
    RIP: ffffffff82694e22  RSP: ffff9ba571a83c88  RFLAGS: 00000202
    RAX: 000000000000006e  RBX: ffff9ba5a7a92718  RCX: ffff9ba571a83fd8
    RDX: 0000000000000001  RSI: ffff9ba5a7a92080  RDI: ffff9bbe99dc8c80
    RBP: ffff9ba571a83c88   R8: ffff9ba571a80000   R9: 0000000000000000
    R10: 000000000000b718  R11: 0000000000000001  R12: ffff9ba69cec0000
    R13: ffffffff8262a621  R14: ffff9ba571a83c38  R15: ffff9ba571a83c00
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
#10 [ffff9ba571a83c90] mm_update_next_owner at ffffffff8269ea11
#11 [ffff9ba571a83ce0] do_exit at ffffffff8269ee5d
#12 [ffff9ba571a83d78] do_group_exit at ffffffff8269f69f
#13 [ffff9ba571a83da8] get_signal_to_deliver at ffffffff826b049e
#14 [ffff9ba571a83e40] do_signal at ffffffff8262b527
#15 [ffff9ba571a83f30] do_notify_resume at ffffffff8262bc32
    ...
  • The issue appears to occur mainly when a large number of threads belonging to the same thread group exit at once.
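
The workload pattern described above (many members of one thread group exiting at nearly the same moment) can be sketched in user space as follows. This is only an illustration of the pattern, not a confirmed reproducer for the soft lockup. The helper name exit_threads_at_once() and the thread count passed to it are hypothetical and do not come from the affected application. The threads here exit by returning from their start routine, whereas the backtrace above shows the exit being driven by a fatal signal through do_group_exit().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Gate that holds every worker until the caller releases them together. */
static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cond = PTHREAD_COND_INITIALIZER;
static int gate_open;

static void *worker(void *arg)
{
    (void)arg;

    /* Sleep until the gate is opened. */
    pthread_mutex_lock(&gate_lock);
    while (!gate_open)
        pthread_cond_wait(&gate_cond, &gate_lock);
    pthread_mutex_unlock(&gate_lock);

    /* Returning ends this thread, so all workers exit close together. */
    return NULL;
}

/*
 * Hypothetical helper: start up to n threads in the calling process,
 * release them at once so that they exit together, and return how many
 * were joined. Returns -1 if the thread ID array cannot be allocated.
 */
int exit_threads_at_once(int n)
{
    assert(n > 0);

    pthread_t *tids = malloc((size_t)n * sizeof(*tids));
    if (tids == NULL)
        return -1;

    /* Close the gate before any worker starts. */
    pthread_mutex_lock(&gate_lock);
    gate_open = 0;
    pthread_mutex_unlock(&gate_lock);

    /* Stop early if the system refuses to create more threads. */
    int started = 0;
    while (started < n &&
           pthread_create(&tids[started], NULL, worker, NULL) == 0)
        started++;

    /* Open the gate: every started worker wakes up and exits. */
    pthread_mutex_lock(&gate_lock);
    gate_open = 1;
    pthread_cond_broadcast(&gate_cond);
    pthread_mutex_unlock(&gate_lock);

    int joined = 0;
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) == 0)
            joined++;
    }

    free(tids);
    return joined;
}
```

Calling exit_threads_at_once() from a small main() and building with `gcc -pthread` is enough to try the sketch. Because the gate is opened only after thread creation has stopped, a partial failure of pthread_create() still leaves every started thread able to exit and be joined.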

Environment

  • Red Hat Enterprise Linux 7
