Soft lockups during unmount when dentry cache is very large

Solution Verified - Updated -

Issue

  • Soft lockups occur while unmounting a filesystem on an OpenShift node that has a large amount of system memory and millions of objects in the dentry cache.
  • The CPU soft lockup is reported in shrink_dcache_for_umount():

    crash> bt
    PID: 112102  TASK: ffff885e89e21fa0  CPU: 14  COMMAND: "test"
     #0 [ffff881fff9c3cf8] machine_kexec at ffffffff8105c4cb
     #1 [ffff881fff9c3d58] __crash_kexec at ffffffff81104a32
     #2 [ffff881fff9c3e28] panic at ffffffff8169dc5f
     #3 [ffff881fff9c3ea8] watchdog_timer_fn at ffffffff8112f651
     #4 [ffff881fff9c3ee0] __hrtimer_run_queues at ffffffff810b4ae4
     #5 [ffff881fff9c3f38] hrtimer_interrupt at ffffffff810b507f
     #6 [ffff881fff9c3f80] local_apic_timer_interrupt at ffffffff81053895
     #7 [ffff881fff9c3f98] smp_apic_timer_interrupt at ffffffff816b76bd
     #8 [ffff881fff9c3fb0] apic_timer_interrupt at ffffffff816b5c1d
    --- <IRQ stack> ---
     #9 [ffff883e1c6afd58] apic_timer_interrupt at ffffffff816b5c1d
        [exception RIP: __d_shrink+89]
        RIP: ffffffff81218479  RSP: ffff883e1c6afe00  RFLAGS: 00000246
        RAX: ffffc9000d77bff0  RBX: ffff881817867dc0  RCX: ffff883623548848
        RDX: ffff8834c7ab3808  RSI: ffff883284af3748  RDI: ffff8834c7ab3800
        RBP: ffff883e1c6afe00   R8: ffff8834c7ab3880   R9: ffff880153030fd0
        R10: 0000000000000000  R11: 0000000000000400  R12: 0000000000000000
        R13: 0000000000000400  R14: 0000000000010260  R15: ffff883e1c6afde0
        ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
    #10 [ffff883e1c6afe08] shrink_dcache_for_umount_subtree at ffffffff81218b78
    #11 [ffff883e1c6afe30] shrink_dcache_for_umount at ffffffff8121aaff
    #12 [ffff883e1c6afe48] generic_shutdown_super at ffffffff812036c1
    #13 [ffff883e1c6afe70] kill_block_super at ffffffff81203b57
    #14 [ffff883e1c6afe90] deactivate_locked_super at ffffffff81203e99
    #15 [ffff883e1c6afeb0] deactivate_super at ffffffff81204606
    #16 [ffff883e1c6afec8] cleanup_mnt at ffffffff812216af
    #17 [ffff883e1c6afee0] __cleanup_mnt at ffffffff81221742
    #18 [ffff883e1c6afef0] task_work_run at ffffffff810ad247
    #19 [ffff883e1c6aff30] do_notify_resume at ffffffff8102ab62
    #20 [ffff883e1c6aff50] int_signal at ffffffff816b527d
    
  • Unmounting an XFS filesystem after creating 50 million files and 700k directories causes a soft lockup and kernel panic:

    Jun 12 05:30:29 example kernel: BUG: soft lockup - CPU#8 stuck for 22s! [migration/8:435]
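
To check whether a node matches the symptoms above, it can help to look at how many dentries the kernel currently holds and whether soft lockups are configured to panic the system (the backtrace above shows panic() called from watchdog_timer_fn, which is what happens when kernel.softlockup_panic is enabled). The following is a minimal sketch, not part of the original article. It assumes the standard /proc/sys/fs/dentry-state layout, whose first four fields are nr_dentry, nr_unused, age_limit and want_pages; the helper names parse_dentry_state and read_sysctl_int are hypothetical.

```python
from pathlib import Path

# Leading fields of /proc/sys/fs/dentry-state as described in the kernel's
# sysctl documentation; any trailing fields are ignored by this sketch.
DENTRY_STATE_FIELDS = ("nr_dentry", "nr_unused", "age_limit", "want_pages")


def parse_dentry_state(text):
    """Parse the text of /proc/sys/fs/dentry-state into a dict of ints."""
    values = [int(v) for v in text.split()]
    if len(values) < len(DENTRY_STATE_FIELDS):
        raise ValueError("unexpected dentry-state format: %r" % text)
    return dict(zip(DENTRY_STATE_FIELDS, values))


def read_sysctl_int(name, proc_root="/proc/sys"):
    """Read an integer sysctl such as 'kernel.softlockup_panic'.

    Returns None if the file is missing or not an integer.
    """
    path = Path(proc_root) / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


if __name__ == "__main__":
    state = parse_dentry_state(Path("/proc/sys/fs/dentry-state").read_text())
    print("dentries allocated:", state["nr_dentry"])
    print("dentries unused:   ", state["nr_unused"])
    # The soft lockup detector fires after roughly 2 * watchdog_thresh seconds.
    print("kernel.watchdog_thresh: ", read_sysctl_int("kernel.watchdog_thresh"))
    print("kernel.softlockup_panic:", read_sysctl_int("kernel.softlockup_panic"))
```

An nr_dentry value in the tens of millions on a host running one of the releases listed under Environment suggests the unmount path may have enough work to trip the soft lockup watchdog. The script only reads values and changes nothing on the system.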
    

Environment

  • Red Hat Enterprise Linux (RHEL) 7.0, 7.1, 7.2, 7.3, 7.4
  • OpenShift Container Platform (OCP) 3.4, 3.5, 3.6
