RHEL5: System hung with many tasks blocking on "synchronize_rcu"
Issue
- System hung with many tasks blocking on "synchronize_rcu"
crash> sys
      KERNEL: /cores/20110808233212/work/vmlinux
    DUMPFILE: /cores/20110808233212/work/vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Sun Aug 7 18:12:04 2011
      UPTIME: 08:37:32
LOAD AVERAGE: 99.02, 97.96, 90.16   <------- High load
       TASKS: 371
    NODENAME: hostname
     RELEASE: 2.6.18-194.11.4.el5
     VERSION: #1 SMP Fri Sep 17 04:57:05 EDT 2010
     MACHINE: x86_64  (2500 Mhz)
      MEMORY: 31.5 GB
       PANIC: ""
- Many crond tasks are blocked with the following call trace:
[...]
INFO: task crond:31781 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
crond D ffff81000900caa0 0 31781 3991 31782 31780 (NOTLB)
 ffff8107b2f4be58 0000000000000086 ff610134cd92d920 a930c1e2292cdfc7
 170fc99bb453d21d 0000000000000005 ffff8107b338c820 ffff81082ff18100
 0000159af7f5461e 00000000003e2472 ffff8107b338ca08 00000001801a6bea
Call Trace:
 [<ffffffff80063167>] wait_for_completion+0x79/0xa2
 [<ffffffff8008cf9d>] default_wake_function+0x0/0xe
 [<ffffffff80123954>] __key_instantiate_and_link+0x8f/0xc5
 [<ffffffff8009ed3d>] synchronize_rcu+0x30/0x36
 [<ffffffff8009e879>] wakeme_after_rcu+0x0/0x9
 [<ffffffff801262f0>] install_session_keyring+0xc0/0xd3
 [<ffffffff80003138>] level3_kernel_pgt+0x138/0x1000
 [<ffffffff8012681e>] join_session_keyring+0x25/0xcb
 [<ffffffff80125cdb>] keyctl_join_session_keyring+0x2d/0x40
 [<ffffffff8005d116>] system_call+0x7e/0x83

INFO: task crond:31782 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
crond D ffff81000901d7a0 0 31782 3991 31781 (NOTLB)
 ffff8107b2e99e58 0000000000000086 b453d21da930c1e2 0e43583d170fc99b
 73137ef065dad4fe 0000000000000005 ffff8107b338c0c0 ffff81082fe1b100
 0000159af7f95dfe 000000000044a540 ffff8107b338c2a8 00000003801a6bea
Call Trace:
 [<ffffffff80063167>] wait_for_completion+0x79/0xa2
 [<ffffffff8008cf9d>] default_wake_function+0x0/0xe
 [<ffffffff80123954>] __key_instantiate_and_link+0x8f/0xc5
 [<ffffffff8009ed3d>] synchronize_rcu+0x30/0x36
 [<ffffffff8009e879>] wakeme_after_rcu+0x0/0x9
 [<ffffffff801262f0>] install_session_keyring+0xc0/0xd3
 [<ffffffff80003238>] level3_kernel_pgt+0x238/0x1000
 [<ffffffff8012681e>] join_session_keyring+0x25/0xcb
 [<ffffffff80125cdb>] keyctl_join_session_keyring+0x2d/0x40
 [<ffffffff8005d116>] system_call+0x7e/0x83
[...]
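When the kernel log contains many of these hung-task reports, it can help to tally them by command name before reading individual traces. The following is a minimal sketch, not part of the original analysis: the function name `summarize_hung_tasks` and the `SAMPLE` text are illustrative, and the pattern assumes the header line format shown above (a command name without spaces, followed by `:PID`).

```python
import re
from collections import Counter

# Matches the kernel hung-task header line, e.g.
# "INFO: task crond:31781 blocked for more than 120 seconds."
# Assumption: the command name contains no whitespace.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(log_text):
    """Return a Counter mapping command name -> number of distinct hung PIDs."""
    seen = set()
    for match in HUNG_TASK_RE.finditer(log_text):
        # De-duplicate on (command, PID) so repeated reports for the
        # same task are counted only once.
        seen.add((match.group("comm"), int(match.group("pid"))))
    return Counter(comm for comm, _pid in seen)


# Header lines copied from the log excerpt above.
SAMPLE = """\
INFO: task crond:31781 blocked for more than 120 seconds.
INFO: task crond:31782 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    for comm, count in summarize_hung_tasks(SAMPLE).most_common():
        print(comm, count)
```

A summary dominated by a single command, as with crond here, suggests looking at what those tasks have in common rather than at each trace separately.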
- The panic was triggered by the NMI watchdog:
NMI Watchdog detected LOCKUP on CPU 6
CPU 6
Modules linked in: mptctl sg ipmi_devintf ipmi_si ipmi_msghandler autofs4 lockd sunrpc bonding ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport tg3 bnx2 shpchp pcspkr i5000_edac hpilo serio_raw edac_mc dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod mptspi mptscsih scsi_transport_spi mptbase cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-194.11.4.el5 #1
RIP: 0010:[<ffffffff80057082>] [<ffffffff80057082>] mwait_idle+0x36/0x4a
RSP: 0018:ffff81082fef5ef0 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff8005704c RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff8030a718
RBP: 0000000000000006 R08: ffff81082fef4000 R09: 000000000000003a
R10: ffff81011cb74038 R11: ffff8107e15cfb58 R12: 00000000000000ff
R13: ffffffff803d2580 R14: 0000000000000600 R15: ffffffff803f4320
FS: 0000000000000000(0000) GS:ffff81082feabb40(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000018df49fc CR3: 00000007e4584000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81082fef4000, task ffff81082feaf080)
Stack: ffffffff8004923a 00000000000000c0 ffffffff8007796b ffffffff803f2340
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8004923a>] cpu_idle+0x95/0xb8
 [<ffffffff8007796b>] start_secondary+0x498/0x4a7

Code: 65 48 8b 04 25 10 00 00 00 8b 80 38 e0 ff ff a8 08 74 ba c3
[....]
- Memory usage is normal. Many crond tasks are in the UN (uninterruptible) state:
crash> ps | grep UN
    684   3991   7  ffff8107b0337040  UN   0.0   88296   3052  crond
    685   3991   3  ffff8107b071b7e0  UN   0.0   88296   3052  crond
    686   3991   1  ffff8107b06c2820  UN   0.0   88296   3052  crond
    687   3991   0  ffff8107b0112860  UN   0.0   88296   3052  crond
    688   3991   4  ffff8107afd1f7a0  UN   0.0   88296   3052  crond
   1355   3991   0  ffff8107aefe9080  UN   0.0   88296   3052  crond
   1488   3991   4  ffff8107ae96c080  UN   0.0   88296   3052  crond

crash> ps | grep UN | wc -l
99

They are spawned by 3991:

crash> bt 3991
PID: 3991  TASK: ffff81082e7010c0  CPU: 7  COMMAND: "crond"
 #0 [ffff81081e87bde8] schedule at ffffffff80062f96
 #1 [ffff81081e87bec0] do_nanosleep at ffffffff80063cfd
 #2 [ffff81081e87bed0] hrtimer_nanosleep at ffffffff8005a3dd
 #3 [ffff81081e87bf50] sys_nanosleep at ffffffff80054c2b
 #4 [ffff81081e87bf80] system_call at ffffffff8005d116
    RIP: 00002b11279683c0  RSP: 00007fffabc18100  RFLAGS: 00010297
    RAX: 0000000000000023  RBX: ffffffff8005d116  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 00007fffabc18800  RDI: 00007fffabc18800
    RBP: 00000000ffffffff   R8: 0000000000000000   R9: 00007fffabc18660
    R10: 0000000000000008  R11: 0000000000000246  R12: 00007fffabc18780
    R13: 00007fffabc18780  R14: 0000000000000000  R15: 000000000000003c
    ORIG_RAX: 0000000000000023  CS: 0033  SS: 002b
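To confirm that the uninterruptible tasks share one parent, the `ps` listing can be grouped by parent PID and command. The sketch below is illustrative only: `count_uninterruptible` and `SAMPLE_PS` are hypothetical names, and the column order (PID, PPID, CPU, TASK, ST, %MEM, VSZ, RSS, COMM) is assumed from the output shown above.

```python
from collections import Counter


def count_uninterruptible(ps_output):
    """Count UN-state tasks per (parent PID, command) in crash `ps` output."""
    counts = Counter()
    for line in ps_output.splitlines():
        # crash prefixes tasks that are active on a CPU with ">"; drop it.
        fields = line.strip().lstrip(">").split()
        # Assumed columns: PID PPID CPU TASK ST %MEM VSZ RSS COMM
        if len(fields) < 9 or fields[4] != "UN":
            continue  # skips headers, blank lines, and other task states
        ppid = int(fields[1])
        comm = " ".join(fields[8:])
        counts[(ppid, comm)] += 1
    return counts


# Rows copied from the `ps | grep UN` excerpt above.
SAMPLE_PS = """\
684 3991 7 ffff8107b0337040 UN 0.0 88296 3052 crond
685 3991 3 ffff8107b071b7e0 UN 0.0 88296 3052 crond
1488 3991 4 ffff8107ae96c080 UN 0.0 88296 3052 crond
"""

if __name__ == "__main__":
    for (ppid, comm), n in count_uninterruptible(SAMPLE_PS).most_common():
        print(f"ppid={ppid} comm={comm} count={n}")
```

A single (parent PID, command) pair accounting for nearly all UN tasks matches the observation that the blocked crond processes are children of PID 3991.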
Environment
- Red Hat Enterprise Linux 5.5