System hangs while coming down...

I have two Oracle boxes that hang while shutting down for a reboot. Both are running RHEL 6.8 on IBM System x3690 X5 servers. This is the error from /var/log/messages:

kernel: ------------[ cut here ]------------
Jun 20 00:04:10 kernel: WARNING: at fs/namespace.c:679 mntput_no_expire+0x109/0x110() (Tainted: P -- ------------ )
Jun 20 00:04:10 kernel: Hardware name: System x3690 X5 -[7147AC1]-
Jun 20 00:04:10 rhedcls104 kernel: Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) lshook(U) oracleasm nfs lockd fscache auth_rpcgss nfs_acl bnx2fc cnic uio fcoe libfcoe libfc 8021q garp stp llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bonding ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables vfat fat dm_round_robin dm_multipath vhost_net macvtap macvlan tun kvm_intel kvm microcode iTCO_wdt iTCO_vendor_support ipmi_devintf joydev cdc_ether usbnet mii serio_raw i2c_i801 i2c_core lpc_ich mfd_core shpchp bnx2 ioatdma dca i7core_edac edac_core ipmi_si ipmi_msghandler sg ext4 jbd2 mbcache sr_mod cdrom sd_mod pata_acpi ata_generic ata_piix be2net lpfc scsi_transport_fc scsi_tgt crc_t10dif megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: linuxshield]
Jun 20 00:04:10 kernel: Pid: 25093, comm: umount.nfs Tainted: P -- ------------ 2.6.32-696.3.1.el6.x86_64 #1
Jun 20 00:04:10 kernel: Call Trace:
Jun 20 00:04:10 kernel: [] ? warn_slowpath_common+0x91/0xe0
Jun 20 00:04:10 kernel: [] ? warn_slowpath_null+0x1a/0x20
Jun 20 00:04:10 kernel: [] ? mntput_no_expire+0x109/0x110
Jun 20 00:04:10 kernel: [] ? sys_umount+0x7b/0x3a0
Jun 20 00:04:10 kernel: [] ? system_call_fastpath+0x16/0x1b
Jun 20 00:04:10 kernel: ---[ end trace 59f348172cdfec6d ]---

Responses

I think it is better to open a case with the Red Hat support team to get quick help, because the community may not be able to help you precisely. At first glance, the error message suggests that the system is unable to unmount an NFS share. Check /var/log/messages for any hung task timeout messages ("INFO: task ... blocked for more than ... seconds"). Also check whether you can unmount the NFS shares manually and stop the NFS service. After that, reboot and check again.
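A minimal sketch of the log check, written for Python 3 (RHEL 6's stock interpreter is Python 2.6, so you may prefer to run it on another machine against a copy of the log). It assumes the usual RHEL 6 hung-task message format; the function name `find_hung_tasks` is only illustrative.

```python
import re

# The RHEL 6 hung-task detector logs lines such as
# "INFO: task <comm>:<pid> blocked for more than <N> seconds."
HUNG_TASK_RE = re.compile(r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(lines):
    """Return (command, pid, seconds) for every hung-task report in lines."""
    hits = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return hits
```

For example, `with open("/var/log/messages", errors="replace") as f: print(find_hung_tasks(f))` (reading the log normally needs root). Interactively, `grep "blocked for more than" /var/log/messages` does the same job.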

The call trace seems to indicate the hang has happened when unmounting a filesystem. The list of modules indicates you have Oracle ASM and ACFS running. ACFS is a cluster filesystem, so any ACFS operations may need network connectivity to its cluster companion(s), and its use suggests that Oracle Clusterware may also be present.
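To confirm which of the Oracle kernel modules are actually loaded, and whether anything still holds them, you could read `/proc/modules` (the same data `lsmod` prints). A minimal Python 3 sketch; the module names come from the "Modules linked in" line above, and the helper names are illustrative.

```python
# Oracle modules named in the "Modules linked in:" line of the warning.
ORACLE_MODULES = ("oracleacfs", "oracleadvm", "oracleoks", "oracleasm")


def module_use_counts(proc_modules_text):
    """Map module name -> use count, parsed from /proc/modules content.

    Each line looks like: name size use_count deps state address [taint]
    """
    counts = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].isdigit():
            counts[fields[0]] = int(fields[2])
    return counts


def oracle_modules_in_use(proc_modules_text):
    """Return {module: use_count} for the Oracle modules that are loaded."""
    counts = module_use_counts(proc_modules_text)
    return {name: counts[name] for name in ORACLE_MODULES if name in counts}
```

For example, `print(oracle_modules_in_use(open("/proc/modules").read()))`. A non-zero use count means another module or a mounted filesystem still references that module, which hints that ACFS/ADVM volumes may still be in use.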

On the other hand, the hanging process seems to be running the "umount.nfs" command, so it's apparently trying to unmount an NFS filesystem.

Have you verified that the NFS server this system is using is accessible, and that the NFS filesystems are unmounted before network connections are shut down?
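One way to check the first point before a reboot is to list the NFS mounts from `/proc/mounts` and try a TCP connection to each server's NFS port. This is a Python 3 sketch that assumes NFS over TCP on the standard port 2049; a UDP-only NFSv3 mount would need a different probe, and a successful connection now says nothing about whether the network will still be up when the mounts are unmounted during shutdown. The function names are illustrative.

```python
import re
import socket

NFS_TYPES = ("nfs", "nfs4")


def _unescape(field):
    # /proc/mounts writes space, tab, newline and backslash as octal escapes (\040 etc.).
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def nfs_mounts(proc_mounts_text):
    """Return (server, export, mountpoint, fstype) for each NFS line of /proc/mounts."""
    mounts = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in NFS_TYPES:
            continue
        # Device is "server:/export"; split on the first ":/" so bracketed IPv6 servers work.
        server, sep, export = fields[0].partition(":/")
        if not sep:
            continue
        mounts.append((server, "/" + _unescape(export), _unescape(fields[1]), fields[2]))
    return mounts


def server_reachable(server, port=2049, timeout=5.0):
    """True if a TCP connection to the server's NFS port opens within timeout seconds."""
    host = server.strip("[]")  # drop IPv6 brackets, if any
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example: `for server, export, mnt, fstype in nfs_mounts(open("/proc/mounts").read()): print(mnt, server, server_reachable(server))`.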

Normally the shutdown order should be correct by default, but if the NFS mount has been established using, e.g., a local Clusterware resource IP address as its source IP address, the system might not be able to complete the unmount normally if ASM and Clusterware have already been stopped.
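To see whether any NFS mount is tied to a Clusterware-managed address, you could compare the `clientaddr=` option that the kernel reports for NFSv4 mounts in `/proc/mounts` (it normally reflects the local address the client used to reach the server) against the cluster's addresses. This is a Python 3 sketch: the address set is a hypothetical input you would fill in from your own Clusterware configuration, and NFSv3 mounts do not report `clientaddr`, so they are not covered.

```python
NFS_TYPES = ("nfs", "nfs4")


def mount_options(option_field):
    """Turn 'rw,vers=4,clientaddr=x' into {'rw': '', 'vers': '4', 'clientaddr': 'x'}."""
    options = {}
    for item in option_field.split(","):
        key, _, value = item.partition("=")
        options[key] = value
    return options


def mounts_using_addresses(proc_mounts_text, addresses):
    """Return mountpoints (as written in /proc/mounts) whose clientaddr is in addresses."""
    hits = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in NFS_TYPES:
            continue
        # Only NFSv4 mounts carry clientaddr; others yield None and never match.
        if mount_options(fields[3]).get("clientaddr") in addresses:
            hits.append(fields[1])
    return hits
```

For example, `mounts_using_addresses(open("/proc/mounts").read(), {"<cluster VIP>"})`. Any mount it returns would likely need to be unmounted before Clusterware stops.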
