Soft lockup due to possible priority starvation
Issue
- The kernel panics because cssdagent evicts the Oracle RAC node by writing to /proc/sysrq-trigger, which forces a crash.
crash> bt
PID: 8288 TASK: ffff881006ae0080 CPU: 6 COMMAND: "cssdagent"
#0 [ffff881016acd9e0] machine_kexec at ffffffff81038f3b
#1 [ffff881016acda40] crash_kexec at ffffffff810c59f2
#2 [ffff881016acdb10] oops_end at ffffffff8152b7f0
#3 [ffff881016acdb40] no_context at ffffffff8104a00b
#4 [ffff881016acdb90] __bad_area_nosemaphore at ffffffff8104a295
#5 [ffff881016acdbe0] bad_area at ffffffff8104a3be
#6 [ffff881016acdc10] __do_page_fault at ffffffff8104ab6f
#7 [ffff881016acdd30] do_page_fault at ffffffff8152d73e
#8 [ffff881016acdd60] page_fault at ffffffff8152aaf5
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff8134b516 RSP: ffff881016acde18 RFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff881016acde18 R8: 000000000000000a R9: 203a207152737953
R10: 00007f8497dfd8f0 R11: 0000000000000293 R12: 0000000000000000
R13: ffffffff81b01a40 R14: 0000000000000286 R15: 0000000000000007
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff881016acde20] __handle_sysrq at ffffffff8134b7d2
#10 [ffff881016acde70] write_sysrq_trigger at ffffffff8134b88e
#11 [ffff881016acdea0] proc_reg_write at ffffffff811f2f1e
#12 [ffff881016acdef0] vfs_write at ffffffff81188c38
#13 [ffff881016acdf30] sys_write at ffffffff81189531
#14 [ffff881016acdf80] system_call_fastpath at ffffffff8100b072
RIP: 00007f84a60596fd RSP: 00007f8497dfdb68 RFLAGS: 00010206
RAX: 0000000000000001 RBX: ffffffff8100b072 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 00007f84a97acd8c RDI: 0000000000000004
RBP: 00007f8497dfdb60 R8: 0000000000000058 R9: 0000000000c00000
R10: 00007f8497dfd8f0 R11: 0000000000000293 R12: 00007f8497dfe9c0
R13: 00007f84a49249a0 R14: 0000000000006c48 R15: 0000000000000007
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash> files ffff881006ae0080 | grep sysrq
4 ffff8810149c6e00 ffff880ffb2f8c00 ffff881022abeab8 REG /proc/sysrq-trigger
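The register values in the exception frame above can be decoded to check that this was a deliberate SysRq crash rather than an unrelated fault. The following is a minimal sketch, not part of the original analysis. It assumes that RDI carries the first argument of sysrq_handle_crash() (the SysRq key) under the x86_64 calling convention, and that R9 holds leftover bytes from a string the kernel had just handled. The helper name reg_to_ascii is hypothetical.

```python
import struct

def reg_to_ascii(value: int) -> str:
    """Interpret a 64-bit register value as 8 little-endian bytes of ASCII.

    Non-printable bytes are shown as '.' so the result is always readable.
    """
    raw = struct.pack("<Q", value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

# Values copied from the exception frame in the backtrace of PID 8288.
rdi = 0x0000000000000063   # assumed: first argument, the SysRq key
r9 = 0x203a207152737953    # assumed: residue of a recently used string

sysrq_key = chr(rdi)
r9_text = reg_to_ascii(r9)

print("SysRq key:", repr(sysrq_key))
print("R9 as ASCII:", repr(r9_text))
```

If the assumptions hold, the key decodes to the SysRq crash command and R9 looks like the start of the kernel's SysRq log message, which fits the write to /proc/sysrq-trigger seen on file descriptor 4.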
There were 1390 running/runnable tasks, and 50 blocked tasks in uninterruptible sleep.
crash> ps | grep RU | wc -l
1390
crash> ps | grep UN | wc -l
50
crash> ps | grep UN | awk '{print $9}' | sort | uniq -c | sort -nr
22 oracle
6 crond
4 java
2 ps
2 ocssd.bin
2 cssdmonitor
2 cssdagent
1 snmpd
1 perl
1 osysmond.bin
1 oraagent.bin
1 netstat
1 ksh93
1 ksh
1 irqbalance
1 find
1 automount
Complete list of UN tasks.
crash> ps -m | grep UN
[ 0 00:00:00.003] [UN] PID: 8557 TASK: ffff880fdc874ae0 CPU: 3 COMMAND: "cssdagent"
[ 0 00:00:00.012] [UN] PID: 8556 TASK: ffff880fdc83e080 CPU: 6 COMMAND: "cssdmonitor"
[ 0 00:00:00.056] [UN] PID: 8286 TASK: ffff881000963540 CPU: 5 COMMAND: "cssdmonitor"
[ 0 00:01:50.538] [UN] PID: 25943 TASK: ffff88101fb0cae0 CPU: 4 COMMAND: "oracle"
[ 0 00:01:50.615] [UN] PID: 10546 TASK: ffff880fdcb92ae0 CPU: 3 COMMAND: "oracle"
[ 0 00:02:02.251] [UN] PID: 8290 TASK: ffff8810072d9500 CPU: 6 COMMAND: "cssdagent"
[ 0 00:02:02.250] [UN] PID: 8544 TASK: ffff880fdc875540 CPU: 1 COMMAND: "ocssd.bin"
[ 0 00:02:39.212] [UN] PID: 8211 TASK: ffff881016ae3500 CPU: 2 COMMAND: "osysmond.bin"
[ 0 00:02:39.168] [UN] PID: 4795 TASK: ffff880571e14aa0 CPU: 4 COMMAND: "automount"
[ 0 00:02:39.309] [UN] PID: 12603 TASK: ffff880afac62080 CPU: 6 COMMAND: "oracle"
[ 0 00:02:39.325] [UN] PID: 4854 TASK: ffff880065dd4040 CPU: 3 COMMAND: "find"
[ 0 00:02:39.360] [UN] PID: 3911 TASK: ffff880eaa028080 CPU: 6 COMMAND: "oracle"
[ 0 00:02:39.373] [UN] PID: 8680 TASK: ffff880791596aa0 CPU: 0 COMMAND: "oracle"
[ 0 00:02:39.382] [UN] PID: 4838 TASK: ffff880111226040 CPU: 3 COMMAND: "ksh93"
[ 0 00:02:39.344] [UN] PID: 4831 TASK: ffff880574537500 CPU: 4 COMMAND: "oraagent.bin"
[ 0 00:02:39.422] [UN] PID: 4839 TASK: ffff88010c863540 CPU: 3 COMMAND: "ps"
[ 0 00:02:39.500] [UN] PID: 4835 TASK: ffff880fcd7c0aa0 CPU: 3 COMMAND: "ps"
[ 0 00:02:39.627] [UN] PID: 10405 TASK: ffff880fcd7a2aa0 CPU: 6 COMMAND: "oracle"
[ 0 00:02:39.656] [UN] PID: 12556 TASK: ffff880b8d0d2aa0 CPU: 6 COMMAND: "oracle"
[ 0 00:02:39.657] [UN] PID: 3943 TASK: ffff880fdc802aa0 CPU: 6 COMMAND: "oracle"
[ 0 00:02:39.706] [UN] PID: 10346 TASK: ffff880fdc976ae0 CPU: 2 COMMAND: "oracle"
[ 0 00:02:39.721] [UN] PID: 25935 TASK: ffff880fdc895500 CPU: 0 COMMAND: "oracle"
[ 0 00:02:39.736] [UN] PID: 8107 TASK: ffff880ddc3f8080 CPU: 0 COMMAND: "oracle"
[ 0 00:02:39.737] [UN] PID: 4807 TASK: ffff880115934080 CPU: 5 COMMAND: "crond"
[ 0 00:02:39.769] [UN] PID: 10553 TASK: ffff880fc9013540 CPU: 0 COMMAND: "oracle"
[ 0 00:02:39.766] [UN] PID: 4810 TASK: ffff8805705f4040 CPU: 5 COMMAND: "crond"
[ 0 00:02:39.751] [UN] PID: 4760 TASK: ffff8801018ae040 CPU: 4 COMMAND: "netstat"
[ 0 00:02:39.753] [UN] PID: 4804 TASK: ffff8801019f3500 CPU: 4 COMMAND: "crond"
[ 0 00:02:39.812] [UN] PID: 3594 TASK: ffff8805aa57a040 CPU: 6 COMMAND: "perl"
[ 0 00:02:39.815] [UN] PID: 25895 TASK: ffff880ec5e2aae0 CPU: 3 COMMAND: "oracle"
[ 0 00:02:39.782] [UN] PID: 4806 TASK: ffff880110756040 CPU: 4 COMMAND: "crond"
[ 0 00:02:39.853] [UN] PID: 12573 TASK: ffff880bd9926080 CPU: 6 COMMAND: "oracle"
[ 0 00:02:39.878] [UN] PID: 10564 TASK: ffff880fcd5ad540 CPU: 3 COMMAND: "oracle"
[ 0 00:02:39.881] [UN] PID: 5284 TASK: ffff8810242ec040 CPU: 6 COMMAND: "snmpd"
[ 0 00:02:39.888] [UN] PID: 4805 TASK: ffff88011c00b540 CPU: 5 COMMAND: "crond"
[ 0 00:02:39.906] [UN] PID: 25929 TASK: ffff880e8b810aa0 CPU: 0 COMMAND: "oracle"
[ 0 00:02:39.908] [UN] PID: 8077 TASK: ffff880689ea0aa0 CPU: 5 COMMAND: "oracle"
[ 0 00:02:39.868] [UN] PID: 4808 TASK: ffff880582d7f500 CPU: 4 COMMAND: "crond"
[ 0 00:02:39.939] [UN] PID: 10379 TASK: ffff880fc924c040 CPU: 3 COMMAND: "oracle"
[ 0 00:02:39.939] [UN] PID: 3933 TASK: ffff881007017500 CPU: 7 COMMAND: "oracle"
[ 0 00:02:39.941] [UN] PID: 8053 TASK: ffff8805607a9540 CPU: 0 COMMAND: "oracle"
[ 0 00:02:39.939] [UN] PID: 10542 TASK: ffff88101680e040 CPU: 7 COMMAND: "oracle"
[ 0 00:02:39.959] [UN] PID: 1927 TASK: ffff881023bda080 CPU: 0 COMMAND: "irqbalance"
[ 0 00:02:39.964] [UN] PID: 8552 TASK: ffff881000bd7500 CPU: 7 COMMAND: "ocssd.bin"
[ 0 00:03:12.372] [UN] PID: 8071 TASK: ffff880689ea1500 CPU: 3 COMMAND: "oracle"
[ 0 00:03:12.408] [UN] PID: 4689 TASK: ffff880fcd734ae0 CPU: 6 COMMAND: "ksh"
[ 0 00:04:57.263] [UN] PID: 4779 TASK: ffff88010655b500 CPU: 0 COMMAND: "java"
[ 0 00:04:57.338] [UN] PID: 4772 TASK: ffff88010655a040 CPU: 0 COMMAND: "java"
[ 0 00:04:57.377] [UN] PID: 4761 TASK: ffff880582d7e040 CPU: 5 COMMAND: "java"
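[ 0 00:04:57.460] [UN] PID: 4765 TASK: ffff880115b3e080 CPU: 6 COMMAND: "java" <<---- The oldest.
Looking at the oldest blocked task: it called vfork() and has been waiting in wait_for_completion() inside do_fork() ever since, which is where a vfork() parent sleeps until its child calls exec or exits.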
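The list above is not strictly sorted (for example, 00:02:02.251 appears before 00:02:02.250), so with a longer list it can be safer to pick the longest-blocked task programmatically. The following is a minimal sketch that parses saved `ps -m` output. The regular expression is an assumption based only on the line layout shown in this article, and the names PS_M_LINE, parse_ps_m and SAMPLE are hypothetical.

```python
import re
from datetime import timedelta

# Assumed layout, taken from the lines in this article:
# [ days hh:mm:ss.mmm] [STATE] PID: n TASK: addr CPU: n COMMAND: "name"
PS_M_LINE = re.compile(
    r'\[\s*(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\]\s+\[(\w+)\]\s+'
    r'PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+CPU:\s*(\d+)\s+COMMAND:\s*"([^"]*)"'
)

def parse_ps_m(text):
    """Return one dict per task line found in saved `ps -m` output."""
    tasks = []
    for line in text.splitlines():
        m = PS_M_LINE.search(line)
        if not m:
            continue  # skip prompts, blank lines and anything unrecognised
        days, hh, mm, ss, ms = (int(m.group(i)) for i in range(1, 6))
        tasks.append({
            "idle": timedelta(days=days, hours=hh, minutes=mm,
                              seconds=ss, milliseconds=ms),
            "state": m.group(6),
            "pid": int(m.group(7)),
            "task": m.group(8),
            "command": m.group(10),
        })
    return tasks

# A few lines copied from the list above, including the two out-of-order ones.
SAMPLE = '''
[ 0 00:00:00.003] [UN] PID: 8557 TASK: ffff880fdc874ae0 CPU: 3 COMMAND: "cssdagent"
[ 0 00:02:02.251] [UN] PID: 8290 TASK: ffff8810072d9500 CPU: 6 COMMAND: "cssdagent"
[ 0 00:02:02.250] [UN] PID: 8544 TASK: ffff880fdc875540 CPU: 1 COMMAND: "ocssd.bin"
[ 0 00:04:57.263] [UN] PID: 4779 TASK: ffff88010655b500 CPU: 0 COMMAND: "java"
[ 0 00:04:57.460] [UN] PID: 4765 TASK: ffff880115b3e080 CPU: 6 COMMAND: "java"
'''

tasks = parse_ps_m(SAMPLE)
oldest = max(tasks, key=lambda t: t["idle"])
print(oldest["pid"], oldest["command"], oldest["idle"])
```

Using max() on the parsed interval avoids relying on the display order, and the same list of dicts can be filtered by state or grouped by command if a per-command tally is needed.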
crash> bt 4765
PID: 4765 TASK: ffff880115b3e080 CPU: 6 COMMAND: "java"
#0 [ffff8800696e9c60] schedule at ffffffff81527bb0
#1 [ffff8800696e9d28] schedule_timeout at ffffffff81528a95
#2 [ffff8800696e9dd8] wait_for_common at ffffffff81528713
#3 [ffff8800696e9e68] wait_for_completion at ffffffff8152882d
#4 [ffff8800696e9e78] do_fork at ffffffff81070bb9 <<---------
#5 [ffff8800696e9f38] sys_vfork at ffffffff81015e75
#6 [ffff8800696e9f48] stub_vfork at ffffffff8100b3d3
RIP: 00007f0c99d4a69c RSP: 00007f0b39ee0590 RFLAGS: 00000293
RAX: 000000000000003a RBX: 00007f0bd0a36250 RCX: ffffffffffffffff
RDX: 00000000ffffdfba RSI: 0000000000002046 RDI: 00007f0c983e431c
RBP: 00007f0bd0a649d0 R8: 00007f0b39ee0648 R9: 00007f0b39ee0650
R10: 0000000000000250 R11: 0000000000000293 R12: 00007f0bd0887060
R13: 00007f0c996288e0 R14: 0000000000000001 R15: 00007f0bd0a48990
ORIG_RAX: 000000000000003a CS: 0033 SS: 002b
Environment
- Red Hat Enterprise Linux 6 - kernel-2.6.32-431.17.1.el6.x86_64 running on VMware ESXi
- Oracle RAC
