Soft lockup due to possible priority starvation

Solution Unverified - Updated -

Issue

  • The kernel panics when cssdagent evicts the Oracle RAC node by writing to /proc/sysrq-trigger.

crash> bt
PID: 8288   TASK: ffff881006ae0080  CPU: 6   COMMAND: "cssdagent"
 #0 [ffff881016acd9e0] machine_kexec at ffffffff81038f3b
 #1 [ffff881016acda40] crash_kexec at ffffffff810c59f2
 #2 [ffff881016acdb10] oops_end at ffffffff8152b7f0
 #3 [ffff881016acdb40] no_context at ffffffff8104a00b
 #4 [ffff881016acdb90] __bad_area_nosemaphore at ffffffff8104a295
 #5 [ffff881016acdbe0] bad_area at ffffffff8104a3be
 #6 [ffff881016acdc10] __do_page_fault at ffffffff8104ab6f
 #7 [ffff881016acdd30] do_page_fault at ffffffff8152d73e
 #8 [ffff881016acdd60] page_fault at ffffffff8152aaf5
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff8134b516  RSP: ffff881016acde18  RFLAGS: 00010096
    RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
    RBP: ffff881016acde18   R8: 000000000000000a   R9: 203a207152737953
    R10: 00007f8497dfd8f0  R11: 0000000000000293  R12: 0000000000000000
    R13: ffffffff81b01a40  R14: 0000000000000286  R15: 0000000000000007
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff881016acde20] __handle_sysrq at ffffffff8134b7d2
#10 [ffff881016acde70] write_sysrq_trigger at ffffffff8134b88e
#11 [ffff881016acdea0] proc_reg_write at ffffffff811f2f1e
#12 [ffff881016acdef0] vfs_write at ffffffff81188c38
#13 [ffff881016acdf30] sys_write at ffffffff81189531
#14 [ffff881016acdf80] system_call_fastpath at ffffffff8100b072
    RIP: 00007f84a60596fd  RSP: 00007f8497dfdb68  RFLAGS: 00010206
    RAX: 0000000000000001  RBX: ffffffff8100b072  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: 00007f84a97acd8c  RDI: 0000000000000004
    RBP: 00007f8497dfdb60   R8: 0000000000000058   R9: 0000000000c00000
    R10: 00007f8497dfd8f0  R11: 0000000000000293  R12: 00007f8497dfe9c0
    R13: 00007f84a49249a0  R14: 0000000000006c48  R15: 0000000000000007
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

crash> files ffff881006ae0080 | grep sysrq
  4 ffff8810149c6e00 ffff880ffb2f8c00 ffff881022abeab8 REG  /proc/sysrq-trigger
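
As a cross-check on the backtrace, the register values in the exception frame can be decoded by hand. The following is a minimal sketch, not part of the original analysis; the helper name decode_le is hypothetical. It assumes that RDI holds the first argument of sysrq_handle_crash (the SysRq key, per the x86_64 calling convention) and that R9 still holds leftover bytes of a string handled earlier in the call path.

```python
def decode_le(value: int, width: int = 8) -> str:
    """Interpret a register value as little-endian ASCII bytes (hypothetical helper)."""
    return value.to_bytes(width, "little").decode("ascii", errors="replace")

# RDI/RBX from the exception frame: the SysRq key passed to the handler.
key = chr(0x63)

# R9 from the exception frame: eight bytes that happen to be printable ASCII.
leftover = decode_le(0x203A207152737953)

print(repr(key), repr(leftover))  # prints 'c' 'SysRq : '
```

The key 'c' is the SysRq command that deliberately crashes the system, which is consistent with file descriptor 4 of cssdagent being /proc/sysrq-trigger.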

At the time of the panic there were 1390 running or runnable (RU) tasks and 50 tasks blocked in uninterruptible sleep (UN).

crash> ps | grep RU | wc -l
1390

crash> ps | grep UN | wc -l
50

crash> ps | grep UN | awk '{print $9}' | sort | uniq -c | sort -nr
     22 oracle
      6 crond
      4 java
      2 ps
      2 ocssd.bin
      2 cssdmonitor
      2 cssdagent
      1 snmpd
      1 perl
      1 osysmond.bin
      1 oraagent.bin
      1 netstat
      1 ksh93
      1 ksh
      1 irqbalance
      1 find
      1 automount

Complete list of UN tasks, with the time elapsed since each task last ran (most recent first):

crash> ps -m | grep UN
[ 0 00:00:00.003] [UN]  PID: 8557   TASK: ffff880fdc874ae0  CPU: 3   COMMAND: "cssdagent"
[ 0 00:00:00.012] [UN]  PID: 8556   TASK: ffff880fdc83e080  CPU: 6   COMMAND: "cssdmonitor"
[ 0 00:00:00.056] [UN]  PID: 8286   TASK: ffff881000963540  CPU: 5   COMMAND: "cssdmonitor"
[ 0 00:01:50.538] [UN]  PID: 25943  TASK: ffff88101fb0cae0  CPU: 4   COMMAND: "oracle"
[ 0 00:01:50.615] [UN]  PID: 10546  TASK: ffff880fdcb92ae0  CPU: 3   COMMAND: "oracle"
[ 0 00:02:02.251] [UN]  PID: 8290   TASK: ffff8810072d9500  CPU: 6   COMMAND: "cssdagent"
[ 0 00:02:02.250] [UN]  PID: 8544   TASK: ffff880fdc875540  CPU: 1   COMMAND: "ocssd.bin"
[ 0 00:02:39.212] [UN]  PID: 8211   TASK: ffff881016ae3500  CPU: 2   COMMAND: "osysmond.bin"
[ 0 00:02:39.168] [UN]  PID: 4795   TASK: ffff880571e14aa0  CPU: 4   COMMAND: "automount"
[ 0 00:02:39.309] [UN]  PID: 12603  TASK: ffff880afac62080  CPU: 6   COMMAND: "oracle"
[ 0 00:02:39.325] [UN]  PID: 4854   TASK: ffff880065dd4040  CPU: 3   COMMAND: "find"
[ 0 00:02:39.360] [UN]  PID: 3911   TASK: ffff880eaa028080  CPU: 6   COMMAND: "oracle"
[ 0 00:02:39.373] [UN]  PID: 8680   TASK: ffff880791596aa0  CPU: 0   COMMAND: "oracle"
[ 0 00:02:39.382] [UN]  PID: 4838   TASK: ffff880111226040  CPU: 3   COMMAND: "ksh93"
[ 0 00:02:39.344] [UN]  PID: 4831   TASK: ffff880574537500  CPU: 4   COMMAND: "oraagent.bin"
[ 0 00:02:39.422] [UN]  PID: 4839   TASK: ffff88010c863540  CPU: 3   COMMAND: "ps"
[ 0 00:02:39.500] [UN]  PID: 4835   TASK: ffff880fcd7c0aa0  CPU: 3   COMMAND: "ps"
[ 0 00:02:39.627] [UN]  PID: 10405  TASK: ffff880fcd7a2aa0  CPU: 6   COMMAND: "oracle"
[ 0 00:02:39.656] [UN]  PID: 12556  TASK: ffff880b8d0d2aa0  CPU: 6   COMMAND: "oracle"
[ 0 00:02:39.657] [UN]  PID: 3943   TASK: ffff880fdc802aa0  CPU: 6   COMMAND: "oracle"
[ 0 00:02:39.706] [UN]  PID: 10346  TASK: ffff880fdc976ae0  CPU: 2   COMMAND: "oracle"
[ 0 00:02:39.721] [UN]  PID: 25935  TASK: ffff880fdc895500  CPU: 0   COMMAND: "oracle"
[ 0 00:02:39.736] [UN]  PID: 8107   TASK: ffff880ddc3f8080  CPU: 0   COMMAND: "oracle"
[ 0 00:02:39.737] [UN]  PID: 4807   TASK: ffff880115934080  CPU: 5   COMMAND: "crond"
[ 0 00:02:39.769] [UN]  PID: 10553  TASK: ffff880fc9013540  CPU: 0   COMMAND: "oracle"
[ 0 00:02:39.766] [UN]  PID: 4810   TASK: ffff8805705f4040  CPU: 5   COMMAND: "crond"
[ 0 00:02:39.751] [UN]  PID: 4760   TASK: ffff8801018ae040  CPU: 4   COMMAND: "netstat"
[ 0 00:02:39.753] [UN]  PID: 4804   TASK: ffff8801019f3500  CPU: 4   COMMAND: "crond"
[ 0 00:02:39.812] [UN]  PID: 3594   TASK: ffff8805aa57a040  CPU: 6   COMMAND: "perl"
[ 0 00:02:39.815] [UN]  PID: 25895  TASK: ffff880ec5e2aae0  CPU: 3   COMMAND: "oracle"
[ 0 00:02:39.782] [UN]  PID: 4806   TASK: ffff880110756040  CPU: 4   COMMAND: "crond"
[ 0 00:02:39.853] [UN]  PID: 12573  TASK: ffff880bd9926080  CPU: 6   COMMAND: "oracle"
[ 0 00:02:39.878] [UN]  PID: 10564  TASK: ffff880fcd5ad540  CPU: 3   COMMAND: "oracle"
[ 0 00:02:39.881] [UN]  PID: 5284   TASK: ffff8810242ec040  CPU: 6   COMMAND: "snmpd"
[ 0 00:02:39.888] [UN]  PID: 4805   TASK: ffff88011c00b540  CPU: 5   COMMAND: "crond"
[ 0 00:02:39.906] [UN]  PID: 25929  TASK: ffff880e8b810aa0  CPU: 0   COMMAND: "oracle"
[ 0 00:02:39.908] [UN]  PID: 8077   TASK: ffff880689ea0aa0  CPU: 5   COMMAND: "oracle"
[ 0 00:02:39.868] [UN]  PID: 4808   TASK: ffff880582d7f500  CPU: 4   COMMAND: "crond"
[ 0 00:02:39.939] [UN]  PID: 10379  TASK: ffff880fc924c040  CPU: 3   COMMAND: "oracle"
[ 0 00:02:39.939] [UN]  PID: 3933   TASK: ffff881007017500  CPU: 7   COMMAND: "oracle"
[ 0 00:02:39.941] [UN]  PID: 8053   TASK: ffff8805607a9540  CPU: 0   COMMAND: "oracle"
[ 0 00:02:39.939] [UN]  PID: 10542  TASK: ffff88101680e040  CPU: 7   COMMAND: "oracle"
[ 0 00:02:39.959] [UN]  PID: 1927   TASK: ffff881023bda080  CPU: 0   COMMAND: "irqbalance"
[ 0 00:02:39.964] [UN]  PID: 8552   TASK: ffff881000bd7500  CPU: 7   COMMAND: "ocssd.bin"
[ 0 00:03:12.372] [UN]  PID: 8071   TASK: ffff880689ea1500  CPU: 3   COMMAND: "oracle"
[ 0 00:03:12.408] [UN]  PID: 4689   TASK: ffff880fcd734ae0  CPU: 6   COMMAND: "ksh"
[ 0 00:04:57.263] [UN]  PID: 4779   TASK: ffff88010655b500  CPU: 0   COMMAND: "java"
[ 0 00:04:57.338] [UN]  PID: 4772   TASK: ffff88010655a040  CPU: 0   COMMAND: "java"
[ 0 00:04:57.377] [UN]  PID: 4761   TASK: ffff880582d7e040  CPU: 5   COMMAND: "java"
[ 0 00:04:57.460] [UN]  PID: 4765   TASK: ffff880115b3e080  CPU: 6   COMMAND: "java" <<---- The oldest.
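
For a longer listing, the task that has been blocked the longest can be picked out programmatically. The following is a minimal sketch, assuming the ps -m output has been saved as text and that the bracketed timestamp has the form [days hh:mm:ss.msec]; the names PS_M_LINE and parse_ps_m are hypothetical, and the sample is a small subset of the lines above.

```python
import re

# One line of `ps -m` output: [days hh:mm:ss.msec] [state] PID: ... COMMAND: "..."
PS_M_LINE = re.compile(
    r'\[\s*(\d+)\s+(\d+):(\d+):(\d+)\.(\d+)\]\s+\[(\w+)\]\s+PID:\s+(\d+)\s+'
    r'TASK:\s+(\w+)\s+CPU:\s+(\d+)\s+COMMAND:\s+"([^"]+)"'
)

def parse_ps_m(text):
    """Return one dict per task line; lines that do not match are skipped."""
    tasks = []
    for line in text.splitlines():
        m = PS_M_LINE.search(line)
        if not m:
            continue
        days, hh, mm, ss, ms = (int(m.group(i)) for i in range(1, 6))
        # Assumes the fractional part is always three digits (milliseconds).
        elapsed_ms = (((days * 24 + hh) * 60 + mm) * 60 + ss) * 1000 + ms
        tasks.append({
            "elapsed_ms": elapsed_ms,
            "state": m.group(6),
            "pid": int(m.group(7)),
            "command": m.group(10),
        })
    return tasks

sample = """\
[ 0 00:00:00.003] [UN]  PID: 8557   TASK: ffff880fdc874ae0  CPU: 3   COMMAND: "cssdagent"
[ 0 00:02:02.251] [UN]  PID: 8290   TASK: ffff8810072d9500  CPU: 6   COMMAND: "cssdagent"
[ 0 00:04:57.377] [UN]  PID: 4761   TASK: ffff880582d7e040  CPU: 5   COMMAND: "java"
[ 0 00:04:57.460] [UN]  PID: 4765   TASK: ffff880115b3e080  CPU: 6   COMMAND: "java"
"""

oldest = max(parse_ps_m(sample), key=lambda t: t["elapsed_ms"])
print(oldest["pid"], oldest["command"], oldest["elapsed_ms"])  # prints 4765 java 297460
```

Storing the elapsed time as integer milliseconds avoids floating-point comparison when tasks differ by only a few milliseconds, as several do in this listing.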

The oldest blocked task had called vfork() and is stuck in wait_for_completion() inside do_fork(), which means the child it created has not yet run far enough to release it.

crash> bt 4765
PID: 4765   TASK: ffff880115b3e080  CPU: 6   COMMAND: "java"
 #0 [ffff8800696e9c60] schedule at ffffffff81527bb0
 #1 [ffff8800696e9d28] schedule_timeout at ffffffff81528a95
 #2 [ffff8800696e9dd8] wait_for_common at ffffffff81528713
 #3 [ffff8800696e9e68] wait_for_completion at ffffffff8152882d
 #4 [ffff8800696e9e78] do_fork at ffffffff81070bb9  <<---------
 #5 [ffff8800696e9f38] sys_vfork at ffffffff81015e75
 #6 [ffff8800696e9f48] stub_vfork at ffffffff8100b3d3
    RIP: 00007f0c99d4a69c  RSP: 00007f0b39ee0590  RFLAGS: 00000293
    RAX: 000000000000003a  RBX: 00007f0bd0a36250  RCX: ffffffffffffffff
    RDX: 00000000ffffdfba  RSI: 0000000000002046  RDI: 00007f0c983e431c
    RBP: 00007f0bd0a649d0   R8: 00007f0b39ee0648   R9: 00007f0b39ee0650
    R10: 0000000000000250  R11: 0000000000000293  R12: 00007f0bd0887060
    R13: 00007f0c996288e0  R14: 0000000000000001  R15: 00007f0bd0a48990
    ORIG_RAX: 000000000000003a  CS: 0033  SS: 002b

Environment

  • Red Hat Enterprise Linux 6 - kernel-2.6.32-431.17.1.el6.x86_64 running on VMware ESXi
  • Oracle RAC
