RHEL7: soft lockup occurs while a thread group leader is waiting on tasklist_waiters
Issue
- A soft lockup occurs while a thread group leader is waiting on tasklist_waiters in mm_update_next_owner(), where a huge number of the thread group members are exiting and trying to take the tasklist_lock.
crash> log | grep lockup
[225225.514521] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 22s! [hdbdpserver:12207]
[225225.850485] Kernel panic - not syncing: softlockup: hung tasks
crash> bt
PID: 12207 TASK: ffff9ba5a7a92080 CPU: 19 COMMAND: "hdbdpserver"
...
--- <IRQ stack> ---
#9 [ffff9ba571a83bd8] apic_timer_interrupt at ffffffff82d77df2
[exception RIP: tasklist_read_lock+34]
RIP: ffffffff82694e22 RSP: ffff9ba571a83c88 RFLAGS: 00000202
RAX: 000000000000006e RBX: ffff9ba5a7a92718 RCX: ffff9ba571a83fd8
RDX: 0000000000000001 RSI: ffff9ba5a7a92080 RDI: ffff9bbe99dc8c80
RBP: ffff9ba571a83c88 R8: ffff9ba571a80000 R9: 0000000000000000
R10: 000000000000b718 R11: 0000000000000001 R12: ffff9ba69cec0000
R13: ffffffff8262a621 R14: ffff9ba571a83c38 R15: ffff9ba571a83c00
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#10 [ffff9ba571a83c90] mm_update_next_owner at ffffffff8269ea11
#11 [ffff9ba571a83ce0] do_exit at ffffffff8269ee5d
#12 [ffff9ba571a83d78] do_group_exit at ffffffff8269f69f
#13 [ffff9ba571a83da8] get_signal_to_deliver at ffffffff826b049e
#14 [ffff9ba571a83e40] do_signal at ffffffff8262b527
#15 [ffff9ba571a83f30] do_notify_resume at ffffffff8262bc32
...
- The issue appears to occur when a large number of threads belonging to the same thread group exit at once.
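When checking other logs for the same signature, the watchdog line quoted above can be parsed to extract the CPU, the stall duration, and the affected task. The following is a minimal sketch, assuming the message format matches the line shown in this article; the function name `parse_soft_lockup` and the regular expression are illustrative and are not part of any Red Hat tool.

```python
import re

# Pattern modeled on the watchdog line quoted in this article.
# Assumption: other kernel versions may word the message differently.
SOFT_LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    # Greedy match so that the last colon separates the command from the PID
    # (command names such as "kworker/0:1" contain a colon themselves).
    r"\[(?P<comm>.*):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return the fields of a soft lockup log line as a dict, or None."""
    m = SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    return {
        "timestamp": float(m.group("ts")),
        "cpu": int(m.group("cpu")),
        "stuck_seconds": int(m.group("secs")),
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
    }


if __name__ == "__main__":
    sample = (
        "[225225.514521] NMI watchdog: BUG: soft lockup - "
        "CPU#19 stuck for 22s! [hdbdpserver:12207]"
    )
    print(parse_soft_lockup(sample))
```

The extracted PID and CPU can then be compared with the `bt` header of the task in the vmcore, as in the output above where both report PID 12207 on CPU 19.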
Environment
- Red Hat Enterprise Linux 7