RHEL5: System crashed after "VFS: Busy inodes after unmount of 0:1c. Self-destruct in 5 seconds. Have a nice day..." message.
Environment
- Red Hat Enterprise Linux 5
- kernel-2.6.18-164.10.1.el5
Issue
- The NMI watchdog detected a LOCKUP on CPU 1 and crashed the machine shortly after a "VFS: Busy inodes..." message:
VFS: Busy inodes after unmount of 0:1c. Self-destruct in 5 seconds. Have a nice day...
NMI Watchdog detected LOCKUP on CPU 1
..
Resolution
- Upgrade to one of these releases to get the fix:
  - kernel-2.6.18-194.8.1.el5, released in a RHEL 5.5 errata.
  - kernel-2.6.18-238.el5 (or later), released with RHEL 5.6.
- The issue is described in Bug 582321.
- As a workaround, avoid unnecessary unmounting of NFS filesystems to reduce the chance of triggering this bug.
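To tell whether a given `uname -r` string predates the fixed kernels, a rough numeric comparison of the release fields is usually enough for these el5 kernels. The sketch below is a simplification, not a full rpm version comparison (it ignores epochs and alphabetic segments):

```python
# Rough check: does a RHEL 5 `uname -r` string predate the fixed kernels?
# Simplified numeric comparison, NOT a full rpm vercmp.
import re

FIXED = "2.6.18-194.8.1.el5"   # first errata kernel carrying the fix

def kver_key(release):
    """Turn '2.6.18-164.10.1.el5' into a comparable tuple of ints."""
    return tuple(int(n) for n in re.findall(r"\d+", release))

def has_fix(release, fixed=FIXED):
    return kver_key(release) >= kver_key(fixed)

print(has_fix("2.6.18-164.10.1.el5"))  # False - affected kernel
print(has_fix("2.6.18-238.el5"))       # True  - RHEL 5.6 kernel
```

For a real deployment check, `rpm -q kernel` against the errata versions above is authoritative.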
Root Cause
When nfs_lookup_revalidate detects that a dentry is no longer valid, it will generally try to d_drop() it. This is problematic if the dentry has the DCACHE_DISCONNECTED flag set, which is commonly the case with root dentries.
Instead of unhashing it from the dcache hashtable, it unhashes it from sb->s_anon. If the dentry is the root of a subtree that was merged with a "bigger" tree, then it will become unreachable by shrink_dcache_for_umount() which leads to a "Busy inodes after umount" situation.
Eventually, however, prune_dcache() will be called due to memory pressure, at which point the leaked dentry still points at the now-freed superblock, and dereferencing it causes the system to panic.
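The mechanism can be illustrated with a toy user-space model (this is not kernel code; the structures and functions below are simplified stand-ins for their 2.6.18 counterparts). The key point is that dropping a DCACHE_DISCONNECTED dentry removes it from sb->s_anon rather than the dcache hashtable, leaving it reachable from neither, so umount-time shrinking never sees it:

```python
# Toy model of the root cause: a DCACHE_DISCONNECTED dentry is dropped
# from sb.s_anon instead of the dcache hashtable, so it escapes
# shrink_dcache_for_umount() and outlives its superblock.

DCACHE_DISCONNECTED = 0x1

class SuperBlock:
    def __init__(self):
        self.s_anon = set()   # anonymous (disconnected) dentries
        self.freed = False

class Dentry:
    def __init__(self, sb, flags=0):
        self.sb = sb
        self.flags = flags
        self.hashed = False              # on the dcache hashtable?
        if flags & DCACHE_DISCONNECTED:
            sb.s_anon.add(self)
        else:
            self.hashed = True

def d_drop(dentry):
    """Models the 2.6.18 behaviour: a disconnected dentry is removed
    from s_anon, making it reachable from nowhere."""
    if dentry.flags & DCACHE_DISCONNECTED:
        dentry.sb.s_anon.discard(dentry)
    else:
        dentry.hashed = False

def shrink_dcache_for_umount(sb, dentries):
    """Can only reclaim dentries it can still reach."""
    return [d for d in dentries if d.hashed or d in sb.s_anon]

sb = SuperBlock()
root = Dentry(sb, flags=DCACHE_DISCONNECTED)  # root of a merged subtree
dentry_unused = [root]                        # the kernel's LRU of unused dentries

d_drop(root)                     # nfs_lookup_revalidate() invalidates it
reclaimable = shrink_dcache_for_umount(sb, dentry_unused)
sb.freed = True                  # umount completes; superblock is freed

print("reclaimed at umount:", len(reclaimable))        # 0: the dentry leaked
print("leaked dentry points at freed sb:", root.sb.freed)
```

The leaked dentry remains on the unused-dentry LRU with a stale superblock pointer, which is exactly the state prune_dcache() later trips over.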
Diagnostic Steps
- Looking at the vmcore:
CPUS: 24
DATE: Sun Aug 4 00:50:08 2013
UPTIME: 21 days, 06:16:31
LOAD AVERAGE: 50.60, 16.97, 10.06
TASKS: 941
RELEASE: 2.6.18-164.10.1.el5
VERSION: #1 SMP Wed Dec 30 18:35:28 EST 2009
MACHINE: x86_64 (2932 Mhz)
MEMORY: 47.3 GB
PANIC: ""
From the kernel log:
VFS: Busy inodes after unmount of 0:1c. Self-destruct in 5 seconds. Have a nice day...
NMI Watchdog detected LOCKUP on CPU 1
CPU 1
Modules linked in: nfs fscache nfs_acl gab(PU) llt(PU) autofs4 ipmi_devintf ipmi_si ipmi_msghandler dmpaa(PU) vxspec(PFU) vxio(PFU) vxdmp(PU) lockd sunrpc bonding ipv6 xfrm_nalgo crypto_api vxportal(PFU) fdd(PFU) vxfs(PU) dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sr_mod cdrom sg hpilo serio_raw bnx2 pcspkr e1000e dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache qla2xxx scsi_transport_fc ata_piix libata shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 1028, comm: kswapd1 Tainted: PF 2.6.18-164.10.1.el5 #1
RIP: 0010:[<ffffffff80064bd8>] [<ffffffff80064bd8>] .text.lock.spinlock+0xe/0x30
RSP: 0018:ffff810c1f7c1d38 EFLAGS: 00000046
RAX: 0000000000000282 RBX: ffff81061de0f070 RCX: 0000000000000001
RDX: ffffffff803114e0 RSI: 0000000000000286 RDI: ffff81061de0f074
RBP: ffff81061de0f074 R08: ffff810280899150 R09: ffff81070f88c4b0
R10: ffff810bfb4b9150 R11: ffffffff80039bb5 R12: 0000000000000000
R13: 0000000000000078 R14: 0000000000000000 R15: 0000000000000680
FS: 0000000000000000(0000) GS:ffff810620046a40(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000003fcd62fff0 CR3: 000000064071f000 CR4: 00000000000006e0
Process kswapd1 (pid: 1028, threadinfo ffff810c1f7c0000, task ffff81061fb967e0)
Stack: ffffffff8000b9dc ffff81070f88c4b0 ffff81061de0f070 ffff81061e3e8d20
ffffffff8002e93f ffff810c1fffcc00 0000000000005078 0000000000000081
00000000000000d0 0000000000000000 ffffffff800ea46f 00000000000001a0
Call Trace:
[<ffffffff8000b9dc>] __down_read_trylock+0x15/0x44
[<ffffffff8002e93f>] prune_dcache+0xde/0x132
[<ffffffff800ea46f>] shrink_dcache_memory+0x19/0x33
[<ffffffff8003f464>] shrink_slab+0xdc/0x153
[<ffffffff80057e22>] kswapd+0x343/0x46c
[<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
[<ffffffff80057adf>] kswapd+0x0/0x46c
[<ffffffff8009fbb7>] keventd_create_kthread+0x0/0xc4
[<ffffffff8003298c>] kthread+0xfe/0x132
[<ffffffff8009ca4c>] request_module+0x0/0x14d
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009fbb7>] keventd_create_kthread+0x0/0xc4
[<ffffffff8003288e>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
- This system panicked because the NMI watchdog detected that CPU 1 was taking too long to acquire a spinlock. The trigger is the message logged just before the panic:
VFS: Busy inodes after unmount of 0:1c. Self-destruct in 5 seconds. Have a nice day...
- This indicates that some filesystem objects (inodes and dentries) still had references held against them while the filesystem was being unmounted. Those references prevented the objects from being freed, so they remained allocated in memory. One of them was a dentry on the dentry_unused list; that dentry held a reference to an inode, and it was this inode that produced the busy-inodes message. The unmount operation nevertheless completed and freed all the other data structures belonging to the filesystem, including the superblock. The lingering dentry on the dentry_unused list therefore holds a pointer to the now-freed superblock, which the call stack above is trying to access. The code is attempting to acquire a spinlock inside that superblock (nested inside a semaphore), and the lock appears to be held. Because the superblock has been freed, the lock state will never change, so the process spins on the lock until the NMI watchdog detects the hang.
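The spin itself can be sketched in the same toy style (again user-space illustration, not kernel code): prune_dcache() reaches through the leaked dentry into freed superblock memory, reads a lock word that will never change, and spins until the NMI watchdog fires. The iteration bound here merely stands in for the watchdog timeout:

```python
# Toy model of the panic path: spinning on a lock word that lives in
# freed superblock memory and therefore never becomes available.

class FreedSuperBlock:
    """Stale memory: the lock reads as 'held' forever."""
    def __init__(self):
        self.lock_held = True

def trylock(sb):
    return not sb.lock_held

def prune_dcache(stale_sb, max_spins=1_000_000):
    """Spin on the superblock lock; max_spins stands in for the
    NMI watchdog timeout ('LOCKUP on CPU 1')."""
    spins = 0
    while not trylock(stale_sb):
        spins += 1
        if spins >= max_spins:
            return spins   # watchdog fires; the real kernel panics here
    return spins

spins = prune_dcache(FreedSuperBlock())
print("gave up after", spins, "spins")
```

With a live superblock the trylock would eventually succeed; with freed memory it never does, which is why the hang is deterministic once the dentry has leaked.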
