Soft lockups during unmount when dentry cache is very large
Issue
- Soft lockups occur while unmounting a filesystem on an OpenShift node that has a large amount of system memory and millions of objects in the dentry cache.
- CPU soft lockup in shrink_dcache_for_umount():
crash> bt
PID: 112102  TASK: ffff885e89e21fa0  CPU: 14  COMMAND: "test"
 #0 [ffff881fff9c3cf8] machine_kexec at ffffffff8105c4cb
 #1 [ffff881fff9c3d58] __crash_kexec at ffffffff81104a32
 #2 [ffff881fff9c3e28] panic at ffffffff8169dc5f
 #3 [ffff881fff9c3ea8] watchdog_timer_fn at ffffffff8112f651
 #4 [ffff881fff9c3ee0] __hrtimer_run_queues at ffffffff810b4ae4
 #5 [ffff881fff9c3f38] hrtimer_interrupt at ffffffff810b507f
 #6 [ffff881fff9c3f80] local_apic_timer_interrupt at ffffffff81053895
 #7 [ffff881fff9c3f98] smp_apic_timer_interrupt at ffffffff816b76bd
 #8 [ffff881fff9c3fb0] apic_timer_interrupt at ffffffff816b5c1d
--- <IRQ stack> ---
 #9 [ffff883e1c6afd58] apic_timer_interrupt at ffffffff816b5c1d
    [exception RIP: __d_shrink+89]
    RIP: ffffffff81218479  RSP: ffff883e1c6afe00  RFLAGS: 00000246
    RAX: ffffc9000d77bff0  RBX: ffff881817867dc0  RCX: ffff883623548848
    RDX: ffff8834c7ab3808  RSI: ffff883284af3748  RDI: ffff8834c7ab3800
    RBP: ffff883e1c6afe00   R8: ffff8834c7ab3880   R9: ffff880153030fd0
    R10: 0000000000000000  R11: 0000000000000400  R12: 0000000000000000
    R13: 0000000000000400  R14: 0000000000010260  R15: ffff883e1c6afde0
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
#10 [ffff883e1c6afe08] shrink_dcache_for_umount_subtree at ffffffff81218b78
#11 [ffff883e1c6afe30] shrink_dcache_for_umount at ffffffff8121aaff
#12 [ffff883e1c6afe48] generic_shutdown_super at ffffffff812036c1
#13 [ffff883e1c6afe70] kill_block_super at ffffffff81203b57
#14 [ffff883e1c6afe90] deactivate_locked_super at ffffffff81203e99
#15 [ffff883e1c6afeb0] deactivate_super at ffffffff81204606
#16 [ffff883e1c6afec8] cleanup_mnt at ffffffff812216af
#17 [ffff883e1c6afee0] __cleanup_mnt at ffffffff81221742
#18 [ffff883e1c6afef0] task_work_run at ffffffff810ad247
#19 [ffff883e1c6aff30] do_notify_resume at ffffffff8102ab62
#20 [ffff883e1c6aff50] int_signal at ffffffff816b527d
- Unmounting an XFS filesystem after creating 50 million files and 700k directories causes a kernel panic:
Jun 12 05:30:29 example kernel: BUG: soft lockup - CPU#8 stuck for 22s! [migration/8:435]
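To gauge whether a node is carrying a dentry cache of the size described above, the counters exposed in /proc/sys/fs/dentry-state can be inspected before unmounting. The following is a minimal sketch, not part of the original report: it assumes the first two fields of that file are the total number of dentries and the number of unused dentries, as documented for the Linux kernel, and the function names are hypothetical.

```python
from pathlib import Path

# Kernel-provided file; the first two fields are nr_dentry and nr_unused.
DENTRY_STATE = Path("/proc/sys/fs/dentry-state")


def parse_dentry_state(text):
    """Return the first two counters from dentry-state content.

    Hypothetical helper for illustration. Only the first two fields are
    read because the meaning of later fields varies by kernel version.
    """
    fields = text.split()
    if len(fields) < 2:
        raise ValueError("unexpected dentry-state format: %r" % text)
    return {"nr_dentry": int(fields[0]), "nr_unused": int(fields[1])}


def read_dentry_state(path=DENTRY_STATE):
    """Read and parse the live counters from the running kernel."""
    return parse_dentry_state(path.read_text())


if __name__ == "__main__":
    state = read_dentry_state()
    print("dentries allocated: %d" % state["nr_dentry"])
    print("dentries unused:    %d" % state["nr_unused"])
```

A count in the tens of millions, as in the 50 million file case above, suggests the node may be exposed to this problem. The script only reads counters and does not change any kernel state.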
Environment
- Red Hat Enterprise Linux (RHEL) 7.0, 7.1, 7.2, 7.3, 7.4
- OpenShift Container Platform (OCP) 3.4, 3.5, 3.6