Soft lockups during unmount when the dentry cache is very large
Issue
- Soft lockups occur while unmounting a filesystem on an OpenShift node that has a large amount of system memory and millions of objects in the dentry cache.
- CPU soft lockup in shrink_dcache_for_umount():
    crash> bt
    PID: 112102  TASK: ffff885e89e21fa0  CPU: 14  COMMAND: "test"
     #0 [ffff881fff9c3cf8] machine_kexec at ffffffff8105c4cb
     #1 [ffff881fff9c3d58] __crash_kexec at ffffffff81104a32
     #2 [ffff881fff9c3e28] panic at ffffffff8169dc5f
     #3 [ffff881fff9c3ea8] watchdog_timer_fn at ffffffff8112f651
     #4 [ffff881fff9c3ee0] __hrtimer_run_queues at ffffffff810b4ae4
     #5 [ffff881fff9c3f38] hrtimer_interrupt at ffffffff810b507f
     #6 [ffff881fff9c3f80] local_apic_timer_interrupt at ffffffff81053895
     #7 [ffff881fff9c3f98] smp_apic_timer_interrupt at ffffffff816b76bd
     #8 [ffff881fff9c3fb0] apic_timer_interrupt at ffffffff816b5c1d
    --- <IRQ stack> ---
     #9 [ffff883e1c6afd58] apic_timer_interrupt at ffffffff816b5c1d
        [exception RIP: __d_shrink+89]
        RIP: ffffffff81218479  RSP: ffff883e1c6afe00  RFLAGS: 00000246
        RAX: ffffc9000d77bff0  RBX: ffff881817867dc0  RCX: ffff883623548848
        RDX: ffff8834c7ab3808  RSI: ffff883284af3748  RDI: ffff8834c7ab3800
        RBP: ffff883e1c6afe00   R8: ffff8834c7ab3880   R9: ffff880153030fd0
        R10: 0000000000000000  R11: 0000000000000400  R12: 0000000000000000
        R13: 0000000000000400  R14: 0000000000010260  R15: ffff883e1c6afde0
        ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
    #10 [ffff883e1c6afe08] shrink_dcache_for_umount_subtree at ffffffff81218b78
    #11 [ffff883e1c6afe30] shrink_dcache_for_umount at ffffffff8121aaff
    #12 [ffff883e1c6afe48] generic_shutdown_super at ffffffff812036c1
    #13 [ffff883e1c6afe70] kill_block_super at ffffffff81203b57
    #14 [ffff883e1c6afe90] deactivate_locked_super at ffffffff81203e99
    #15 [ffff883e1c6afeb0] deactivate_super at ffffffff81204606
    #16 [ffff883e1c6afec8] cleanup_mnt at ffffffff812216af
    #17 [ffff883e1c6afee0] __cleanup_mnt at ffffffff81221742
    #18 [ffff883e1c6afef0] task_work_run at ffffffff810ad247
    #19 [ffff883e1c6aff30] do_notify_resume at ffffffff8102ab62
    #20 [ffff883e1c6aff50] int_signal at ffffffff816b527d
- Unmounting an XFS filesystem after creating 50 million files and 700k directories causes a kernel panic:
    Jun 12 05:30:29 example kernel: BUG: soft lockup - CPU#8 stuck for 22s! [migration/8:435]
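To check whether the dentry cache on a node is in the millions of objects described above, the kernel's system-wide counters in /proc/sys/fs/dentry-state can be read. The Python sketch below is a hypothetical helper (parse_dentry_state is not a Red Hat tool). It assumes only the first two fields, nr_dentry (allocated dentries) and nr_unused (unused dentries), are of interest, because the remaining fields differ between kernel versions. The counts cover all filesystems on the host, not only the one being unmounted.

    """Minimal sketch: report dentry cache size from /proc/sys/fs/dentry-state.

    Assumption: only the first two fields (nr_dentry, nr_unused) are interpreted;
    the remaining fields vary between kernel versions and are ignored.
    """
    import os

    DENTRY_STATE = "/proc/sys/fs/dentry-state"


    def parse_dentry_state(text: str) -> dict:
        """Return nr_dentry and nr_unused parsed from dentry-state contents."""
        fields = text.split()
        if len(fields) < 2:
            raise ValueError("unexpected dentry-state format: %r" % text)
        return {"nr_dentry": int(fields[0]), "nr_unused": int(fields[1])}


    def main() -> None:
        # The file exists only on Linux hosts; exit quietly elsewhere.
        if not os.path.exists(DENTRY_STATE):
            print("%s not available on this host" % DENTRY_STATE)
            return
        with open(DENTRY_STATE) as f:
            state = parse_dentry_state(f.read())
        # System-wide totals, not per-filesystem counts.
        print("allocated dentries: %d" % state["nr_dentry"])
        print("unused dentries:    %d" % state["nr_unused"])


    if __name__ == "__main__":
        main()

Running it before the unmount gives a rough idea of how many dentries the kernel may have to walk in shrink_dcache_for_umount(); it does not by itself prevent the soft lockup.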
Environment
- Red Hat Enterprise Linux (RHEL) 7.0, 7.1, 7.2, 7.3, 7.4
- OpenShift Container Platform (OCP) 3.4, 3.5, 3.6
