RHEL5: System hung with many tasks blocking on "synchronize_rcu"

Solution Unverified

Issue

  • System hung with many tasks blocking on "synchronize_rcu"

    crash> sys
          KERNEL: /cores/20110808233212/work/vmlinux
        DUMPFILE: /cores/20110808233212/work/vmcore  [PARTIAL DUMP]
            CPUS: 8
            DATE: Sun Aug  7 18:12:04 2011
          UPTIME: 08:37:32
    LOAD AVERAGE: 99.02, 97.96, 90.16          <------- High load
           TASKS: 371
        NODENAME: hostname
         RELEASE: 2.6.18-194.11.4.el5
         VERSION: #1 SMP Fri Sep 17 04:57:05 EDT 2010
         MACHINE: x86_64  (2500 Mhz)
          MEMORY: 31.5 GB
           PANIC: ""
    
  • Many crond tasks are blocked with the following call trace:

    [...]
    INFO: task crond:31781 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    crond         D ffff81000900caa0     0 31781   3991         31782 31780 (NOTLB)
     ffff8107b2f4be58 0000000000000086 ff610134cd92d920 a930c1e2292cdfc7
     170fc99bb453d21d 0000000000000005 ffff8107b338c820 ffff81082ff18100
     0000159af7f5461e 00000000003e2472 ffff8107b338ca08 00000001801a6bea
    Call Trace:
     [<ffffffff80063167>] wait_for_completion+0x79/0xa2
     [<ffffffff8008cf9d>] default_wake_function+0x0/0xe
     [<ffffffff80123954>] __key_instantiate_and_link+0x8f/0xc5
     [<ffffffff8009ed3d>] synchronize_rcu+0x30/0x36
     [<ffffffff8009e879>] wakeme_after_rcu+0x0/0x9
     [<ffffffff801262f0>] install_session_keyring+0xc0/0xd3
     [<ffffffff80003138>] level3_kernel_pgt+0x138/0x1000
     [<ffffffff8012681e>] join_session_keyring+0x25/0xcb
     [<ffffffff80125cdb>] keyctl_join_session_keyring+0x2d/0x40
     [<ffffffff8005d116>] system_call+0x7e/0x83
    
    INFO: task crond:31782 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    crond         D ffff81000901d7a0     0 31782   3991               31781 (NOTLB)
     ffff8107b2e99e58 0000000000000086 b453d21da930c1e2 0e43583d170fc99b
     73137ef065dad4fe 0000000000000005 ffff8107b338c0c0 ffff81082fe1b100
     0000159af7f95dfe 000000000044a540 ffff8107b338c2a8 00000003801a6bea
    Call Trace:
     [<ffffffff80063167>] wait_for_completion+0x79/0xa2
     [<ffffffff8008cf9d>] default_wake_function+0x0/0xe
     [<ffffffff80123954>] __key_instantiate_and_link+0x8f/0xc5
     [<ffffffff8009ed3d>] synchronize_rcu+0x30/0x36
     [<ffffffff8009e879>] wakeme_after_rcu+0x0/0x9
     [<ffffffff801262f0>] install_session_keyring+0xc0/0xd3
     [<ffffffff80003238>] level3_kernel_pgt+0x238/0x1000
     [<ffffffff8012681e>] join_session_keyring+0x25/0xcb
     [<ffffffff80125cdb>] keyctl_join_session_keyring+0x2d/0x40
     [<ffffffff8005d116>] system_call+0x7e/0x83
    [...]
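
When the log contains many such warnings, it can help to tally them by command name. The following is a minimal sketch, assuming the kernel log has been saved as text and that the warnings follow the "INFO: task NAME:PID blocked for more than N seconds" format shown above. The helper names `hung_tasks` and `count_by_command` are hypothetical and are not part of crash or any Red Hat tool.

```python
import re
from collections import Counter

# Matches hung-task warnings such as:
#   INFO: task crond:31781 blocked for more than 120 seconds.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(log_text):
    """Return a list of (comm, pid, seconds) tuples found in log_text."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_RE.finditer(log_text)
    ]


def count_by_command(log_text):
    """Count hung-task warnings per command name (hypothetical helper)."""
    return Counter(comm for comm, _pid, _secs in hung_tasks(log_text))


# Two warning lines copied from the log excerpt above.
SAMPLE = """\
INFO: task crond:31781 blocked for more than 120 seconds.
INFO: task crond:31782 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    for comm, n in count_by_command(SAMPLE).most_common():
        print(comm, n)
```

On a real log, pass the file contents to `count_by_command` instead of `SAMPLE`. A single command dominating the counts, as crond does here, points at a common blocking path.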
    
  • The panic was triggered by the NMI watchdog:

    NMI Watchdog detected LOCKUP on CPU 6
    CPU 6 
    Modules linked in: mptctl sg ipmi_devintf ipmi_si ipmi_msghandler autofs4 lockd sunrpc bonding ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport tg3 bnx2 shpchp pcspkr i5000_edac hpilo serio_raw edac_mc dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod mptspi mptscsih scsi_transport_spi mptbase cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
    Pid: 0, comm: swapper Not tainted 2.6.18-194.11.4.el5 #1
    RIP: 0010:[<ffffffff80057082>]  [<ffffffff80057082>] mwait_idle+0x36/0x4a
    RSP: 0018:ffff81082fef5ef0  EFLAGS: 00000246
    RAX: 0000000000000000 RBX: ffffffff8005704c RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff8030a718
    RBP: 0000000000000006 R08: ffff81082fef4000 R09: 000000000000003a
    R10: ffff81011cb74038 R11: ffff8107e15cfb58 R12: 00000000000000ff
    R13: ffffffff803d2580 R14: 0000000000000600 R15: ffffffff803f4320
    FS:  0000000000000000(0000) GS:ffff81082feabb40(0000) knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 0000000018df49fc CR3: 00000007e4584000 CR4: 00000000000006e0
    Process swapper (pid: 0, threadinfo ffff81082fef4000, task ffff81082feaf080)
    Stack:  ffffffff8004923a 00000000000000c0 ffffffff8007796b ffffffff803f2340
     0000000000000000 0000000000000000 0000000000000000 0000000000000000
     0000000000000000 0000000000000000 0000000000000000 0000000000000000
    Call Trace:
     [<ffffffff8004923a>] cpu_idle+0x95/0xb8
     [<ffffffff8007796b>] start_secondary+0x498/0x4a7
    
    
    Code: 65 48 8b 04 25 10 00 00 00 8b 80 38 e0 ff ff a8 08 74 ba c3 
    [...]
    
  • Memory usage is normal. Many crond tasks are in the UN (uninterruptible) state:

    crash> ps | grep UN
        684   3991   7  ffff8107b0337040  UN   0.0   88296   3052  crond
        685   3991   3  ffff8107b071b7e0  UN   0.0   88296   3052  crond
        686   3991   1  ffff8107b06c2820  UN   0.0   88296   3052  crond
        687   3991   0  ffff8107b0112860  UN   0.0   88296   3052  crond
        688   3991   4  ffff8107afd1f7a0  UN   0.0   88296   3052  crond
       1355   3991   0  ffff8107aefe9080  UN   0.0   88296   3052  crond
       1488   3991   4  ffff8107ae96c080  UN   0.0   88296   3052  crond
    
    
    crash> ps | grep UN | wc -l
    99
    
    
    They were spawned by PID 3991 (the parent crond):
    
    crash> bt 3991
    PID: 3991   TASK: ffff81082e7010c0  CPU: 7   COMMAND: "crond"
     #0 [ffff81081e87bde8] schedule at ffffffff80062f96
     #1 [ffff81081e87bec0] do_nanosleep at ffffffff80063cfd
     #2 [ffff81081e87bed0] hrtimer_nanosleep at ffffffff8005a3dd
     #3 [ffff81081e87bf50] sys_nanosleep at ffffffff80054c2b
     #4 [ffff81081e87bf80] system_call at ffffffff8005d116
        RIP: 00002b11279683c0  RSP: 00007fffabc18100  RFLAGS: 00010297
        RAX: 0000000000000023  RBX: ffffffff8005d116  RCX: 0000000000000000
        RDX: 0000000000000000  RSI: 00007fffabc18800  RDI: 00007fffabc18800
        RBP: 00000000ffffffff   R8: 0000000000000000   R9: 00007fffabc18660
        R10: 0000000000000008  R11: 0000000000000246  R12: 00007fffabc18780
        R13: 00007fffabc18780  R14: 0000000000000000  R15: 000000000000003c
        ORIG_RAX: 0000000000000023  CS: 0033  SS: 002b
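
The 99 tasks in UN state line up with the load average of about 99 reported by `sys`, because the Linux load average counts uninterruptible tasks as well as runnable ones. To confirm that the blocked tasks share one parent, the `ps` output can be grouped by parent PID and command. The following is a minimal sketch, assuming the `ps` output was saved as text with the column order PID, PPID, CPU, TASK, ST, %MEM, VSZ, RSS, COMM as in the listing above. The function names are hypothetical.

```python
from collections import Counter


def parse_ps(ps_text):
    """Yield (pid, ppid, state, comm) tuples from saved crash `ps` output."""
    for line in ps_text.splitlines():
        # crash marks tasks that are active on a CPU with a leading ">".
        fields = line.strip().lstrip(">").split()
        # Skip headers, prompts, and anything that is not a task row.
        if len(fields) < 9 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        yield int(fields[0]), int(fields[1]), fields[4], " ".join(fields[8:])


def uninterruptible_by_parent(ps_text):
    """Count UN-state tasks per (parent PID, command) pair."""
    counts = Counter()
    for _pid, ppid, state, comm in parse_ps(ps_text):
        if state == "UN":
            counts[(ppid, comm)] += 1
    return counts


# Three rows copied from the `ps | grep UN` listing above.
SAMPLE = """\
    684   3991   7  ffff8107b0337040  UN   0.0   88296   3052  crond
    685   3991   3  ffff8107b071b7e0  UN   0.0   88296   3052  crond
    686   3991   1  ffff8107b06c2820  UN   0.0   88296   3052  crond
"""

if __name__ == "__main__":
    for (ppid, comm), n in uninterruptible_by_parent(SAMPLE).most_common():
        print(ppid, comm, n)
```

Grouping by parent rather than by command alone separates the children of one daemon from unrelated tasks that happen to share a name.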
    

Environment

  • Red Hat Enterprise Linux 5.5
