Oracle RAC node eviction following multiple soft lockups in shrink_zone on RHEL 5.3 or earlier

Solution Verified

Issue

  • Oracle RAC nodes are evicted and rebooted after the kernel reports multiple soft lockups in /var/log/messages.

  • The server crashes after the following error messages are logged:

    Jul 20 18:40:31 node1 kernel: BUG: soft lockup - CPU#14 stuck for 10s! [bgsagent:23164]
    Jul 20 18:41:02 node1 kernel: BUG: soft lockup - CPU#6 stuck for 10s! [oraagent.bin:23494]
    Jul 20 18:41:26 node1 kernel: BUG: soft lockup - CPU#7 stuck for 10s! [perl:17624]
    Jul 20 18:41:28 node1 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [oracle:7233]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#11 stuck for 10s! [oracle:9884]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#15 stuck for 11s! [oracle:20656]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#7 stuck for 10s! [perl:17624]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#11 stuck for 10s! [orarootagent.bi:24100]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [multipathd:10205]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#7 stuck for 14s! [perl:17624]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#3 stuck for 19s! [oracle:7233]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#11 stuck for 12s! [orarootagent.bi:24100]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#15 stuck for 13s! [oracle:20656]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [oracle:7233]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#12 stuck for 10s! [multipathd:10205]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#7 stuck for 11s! [perl:17624]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#11 stuck for 15s! [orarootagent.bi:24100]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#15 stuck for 13s! [oracle:20656]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#3 stuck for 13s! [oracle:7233]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#15 stuck for 12s! [oracle:913]
    Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#11 stuck for 12s! [orarootagent.bi:24100]
    
  • The soft lockups occur in shrink_zone or shrink_inactive_list:

    Aug  6 14:52:33 node1 kernel: BUG: soft lockup - CPU#11 stuck for 10s! [tnslsnr:29345]
    Aug  6 14:52:33 node1 kernel: CPU 11:
    Aug  6 14:52:33 node1 kernel: Modules linked in: oracleasm(U) ocfs2(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs bonding dm_round_robin dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg pcspkr bnx2 ide_cd serio_raw cdrom hpilo dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc scsi_transport_fc shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
    Aug  6 14:52:33 node1 kernel: Pid: 29345, comm: tnslsnr Tainted: G      2.6.18-128.7.1.el5 #1 
    Aug  6 14:52:33 node1 kernel: RIP: 0010:[<ffffffff800c7a5f>]  [<ffffffff800c7a5f>] shrink_inactive_list+0x770/0x7f9
    Aug  6 14:52:33 node1 kernel: RSP: 0018:ffff811a01aa9828  EFLAGS: 00000246
    Aug  6 14:52:33 node1 kernel: RAX: 000000000000000e RBX: ffff81016f63bc78 RCX: ffff810009064460
    Aug  6 14:52:33 node1 kernel: RDX: ffff81016f63bc40 RSI: ffff81000008bf98 RDI: ffff811a01aa98e8
    Aug  6 14:52:33 node1 kernel: RBP: 0000000000000000 R08: ffff81000008b600 R09: ffff811a01aa9a78
    Aug  6 14:52:33 node1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8120644d9860
    Aug  6 14:52:33 node1 kernel: R13: ffff81131e04ceb0 R14: 0000000b8804dd3d R15: ffff811967239990
    Aug  6 14:52:33 node1 kernel: FS:  00002ba65c4d7450(0000) GS:ffff810171db1f40(0000) knlGS:00000000f696bb90
    Aug  6 14:52:33 node1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    Aug  6 14:52:38 node1 kernel: CR2: 0000000015cd78e4 CR3: 00000015412a3000 CR4: 00000000000006e0
    Aug  6 14:52:38 node1 kernel:
    Aug  6 14:52:38 node1 kernel: Call Trace:
    Aug  6 14:52:38 node1 kernel:  [<ffffffff800c7a1d>] shrink_inactive_list+0x72e/0x7f9
    Aug  6 14:52:38 node1 kernel:  [<ffffffff80046a6e>] try_to_wake_up+0x46f/0x481
    Aug  6 14:52:38 node1 kernel:  [<ffffffff80012d17>] shrink_zone+0xf6/0x11c
    Aug  6 14:52:38 node1 kernel:  [<ffffffff800c81e5>] try_to_free_pages+0x197/0x2c2
    Aug  6 14:52:38 node1 kernel:  [<ffffffff8000f270>] __alloc_pages+0x1cb/0x2ce
    Aug  6 14:52:38 node1 kernel:  [<ffffffff8003c04f>] __get_free_pages+0xe/0x71
    Aug  6 14:52:38 node1 kernel:  [<ffffffff8001e6d8>] __pollwait+0x58/0xe2
    Aug  6 14:52:38 node1 kernel:  [<ffffffff8002d8b1>] pipe_poll+0x2d/0x90
    Aug  6 14:52:38 node1 kernel:  [<ffffffff8002f3c6>] do_sys_poll+0x1b8/0x360
    Aug  6 14:52:38 node1 kernel:  [<ffffffff8001e680>] __pollwait+0x0/0xe2
    Aug  6 14:52:38 node1 kernel:  [<ffffffff8008a4b4>] default_wake_function+0x0/0xe
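
To check whether a system shows the same pattern, the log excerpts above can be tallied with a small script. The following is a minimal sketch, not Red Hat tooling: the names summarize_lockups, LOCKUP_RE and RIP_RE are hypothetical, and the regular expressions assume the message layout shown in the excerpts above.

```python
import re
from collections import Counter

# Assumed layout (taken from the excerpts above):
#   ... kernel: BUG: soft lockup - CPU#14 stuck for 10s! [bgsagent:23164]
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]*):(?P<pid>\d+)\]"
)

# Assumed layout of the RIP line:
#   ... kernel: RIP: 0010:[<addr>]  [<addr>] shrink_inactive_list+0x770/0x7f9
RIP_RE = re.compile(
    r"RIP: .*\]\s+(?P<symbol>[A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_lockups(lines):
    """Count soft lockup reports per (command, pid) and per RIP symbol.

    `lines` is any iterable of log lines, for example an open file object.
    Returns two Counters: (by_task, by_symbol).
    """
    by_task = Counter()
    by_symbol = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            by_task[(match.group("comm"), int(match.group("pid")))] += 1
            continue
        match = RIP_RE.search(line)
        if match:
            by_symbol[match.group("symbol")] += 1
    return by_task, by_symbol
```

Passing an open handle on /var/log/messages to summarize_lockups returns the two counters. A by_symbol counter dominated by shrink_inactive_list would be consistent with the traces in this article, although it does not by itself confirm the same root cause.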
    

Environment

Red Hat Enterprise Linux (RHEL) 5
