RHEL 5.6 servers hang when rebooting on "Please stand by while rebooting the system"

Environment

  • Red Hat Enterprise Linux 5.6

Issue

  • Since upgrading from RHEL 5.5 to 5.6, we've noticed that roughly 10% of the servers don't reboot cleanly. When we investigate, the console has stopped at "Please stand by while rebooting the system":

    Unmounting pipe file systems:                    [ OK ]
    Unmounting file systems:                         [ OK ]
    Please stand by while rebooting the system...

Resolution

  • In some NFS-heavy environments, it may be necessary to implement a different NFS umount function in the shutdown sequence.
  • The fix for this issue was released in erratum RHBA-2011-1081.
  • To apply it, update the initscripts package to version 8.45.38-2.el5 or later.
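
To check whether a server already carries the fixed package, the installed version-release string can be compared against 8.45.38-2.el5. The following is a minimal sketch, not the real RPM comparison algorithm: it assumes plain "version-release" strings with no epoch, and the helper names (compare_vr, has_fix) are hypothetical.

```python
import re

# First fixed initscripts build, taken from the Resolution section.
FIXED = "8.45.38-2.el5"


def _segments(text):
    # Split into runs of digits or letters, dropping separators such as "." and "_".
    return re.findall(r"\d+|[A-Za-z]+", text)


def _cmp_part(a, b):
    """Compare two version (or release) strings segment by segment."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return -1 if xi < yi else 1
        elif x.isdigit() != y.isdigit():
            # Simplification: treat a numeric segment as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    if len(sa) == len(sb):
        return 0
    return -1 if len(sa) < len(sb) else 1


def compare_vr(a, b):
    """Return -1, 0 or 1 for two 'version-release' strings (no epoch handling)."""
    va, _, ra = a.partition("-")
    vb, _, rb = b.partition("-")
    return _cmp_part(va, vb) or _cmp_part(ra, rb)


def has_fix(installed_vr):
    """True if the installed initscripts build is at or beyond the fixed build."""
    return compare_vr(installed_vr, FIXED) >= 0
```

The input string could be obtained on the server with rpm -q --qf '%{VERSION}-%{RELEASE}\n' initscripts and passed to has_fix(). For anything beyond a quick fleet survey, rpm's own comparison is the authority.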

Root Cause

  • A fix for a possible oops when truncating an open file on an NFS mount during shutdown changed the kernel NFS client function nfs_wait_on_request() to use an uninterruptible sleep. A process blocked in an uninterruptible sleep does not respond to signals, so the shutdown scripts can no longer terminate processes that are waiting on outstanding NFS requests.
  • For more information, see BZ 676851.
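
On a system that is still responsive, a process in uninterruptible sleep shows state D in ps output and in /proc/<pid>/stat. The sketch below shows one way to read that state field. The function names are hypothetical, and the sample line in the usage note is made up for illustration.

```python
def parse_proc_stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ")", so split on the last ")" instead of on whitespace.
    rest = stat_line.rsplit(")", 1)[1]
    return rest.split()[0]


def is_uninterruptible(stat_line):
    """True if the process is in state D (uninterruptible sleep)."""
    return parse_proc_stat_state(stat_line) == "D"
```

For example, is_uninterruptible("6480 (kexec) D 5564 5564 0") returns True. A process in this state stays there until the I/O it is waiting on completes, which is why sending it signals during shutdown has no effect.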

Diagnostic Steps

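  • The console of a hung machine may print a hung-task report like the following. In this example the task is in state D (uninterruptible sleep) and is blocked in nfs_wait_bit_uninterruptible while running sys_sync.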

    INFO: task kexec:6480 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kexec         D ffff810009004420     0  6480   5564                     (NOTLB)
     ffff81042c037ca8 0000000000000086 0000000000000002 0000000000000000
     ffff81010e6de148 0000000000000001 ffff81042bc49820 ffffffff80310b60
     00000035f10b0dbf 00000000000116b1 ffff81042bc49a08 0000000000000000
    Call Trace:
     [<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
     [<ffffffff885c5941>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
     [<ffffffff800637ca>] io_schedule+0x3f/0x67
     [<ffffffff885c594a>] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd
     [<ffffffff800639f6>] __wait_on_bit+0x40/0x6e
     [<ffffffff885c5941>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
     [<ffffffff80063a90>] out_of_line_wait_on_bit+0x6c/0x78
     [<ffffffff800a28e2>] wake_bit_function+0x0/0x23
     [<ffffffff885c930a>] :nfs:nfs_wait_on_requests_locked+0x70/0xca
     [<ffffffff885c969e>] :nfs:nfs_flush_inode+0x3c/0x6f
     [<ffffffff885cadc5>] :nfs:nfs_writepages+0xe6/0x13e
     [<ffffffff8005ac54>] do_writepages+0x20/0x2f
     [<ffffffff8002fcd4>] __writeback_single_inode+0x19e/0x318
     [<ffffffff8004a225>] wait_on_page_writeback_range+0xd6/0x12e
     [<ffffffff80020ff8>] sync_sb_inodes+0x1b5/0x26f
     [<ffffffff800f588b>] sync_inodes_sb+0x99/0xa9
     [<ffffffff800f58f8>] __sync_inodes+0x5d/0xaa
     [<ffffffff800e3474>] do_sync+0x36/0x5a
     [<ffffffff800e34a6>] sys_sync+0xe/0x12
     [<ffffffff8005d28d>] tracesys+0xd5/0xe0
    
    SysRq : Trigger a crashdump
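
When console output is captured from many servers (for example over a serial console), reports like the one above can be picked out automatically. The following is a rough sketch that assumes the log format shown in this article; the function name and the choice of nfs_wait_bit_uninterruptible as the marker are illustrative assumptions, not an official diagnostic.

```python
import re

# Matches the first line of a hung-task report, e.g.
# "INFO: task kexec:6480 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Symbol that appears in the call trace shown in this article.
MARKER = "nfs_wait_bit_uninterruptible"


def find_nfs_hung_tasks(log_text):
    """Return (task_name, pid) pairs for hung tasks blocked in the NFS wait."""
    hits = []
    current = None
    for line in log_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            # Start of a new hung-task report.
            current = (match.group(1), int(match.group(2)))
            continue
        if not line.strip():
            # A blank line ends the current report.
            current = None
            continue
        if current is not None and MARKER in line and current not in hits:
            hits.append(current)
    return hits
```

Run against the report above, this would return a single entry for the kexec task with PID 6480. A match only suggests the same symptom; the initscripts version still has to be checked to confirm that the fix is missing.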
    
