Oracle RAC eviction following multiple soft lockups in shrink_zone using RHEL 5.3 or earlier
Issue
- Oracle RAC nodes are being evicted and rebooted following the kernel reporting multiple soft lockups in /var/log/messages.
- The server crashes after logging the error messages below:
Jul 20 18:40:31 node1 kernel: BUG: soft lockup - CPU#14 stuck for 10s! [bgsagent:23164]
Jul 20 18:41:02 node1 kernel: BUG: soft lockup - CPU#6 stuck for 10s! [oraagent.bin:23494]
Jul 20 18:41:26 node1 kernel: BUG: soft lockup - CPU#7 stuck for 10s! [perl:17624]
Jul 20 18:41:28 node1 kernel: BUG: soft lockup - CPU#3 stuck for 10s! [oracle:7233]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#11 stuck for 10s! [oracle:9884]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#15 stuck for 11s! [oracle:20656]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#7 stuck for 10s! [perl:17624]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#11 stuck for 10s! [orarootagent.bi:24100]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [multipathd:10205]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#7 stuck for 14s! [perl:17624]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#3 stuck for 19s! [oracle:7233]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#11 stuck for 12s! [orarootagent.bi:24100]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#15 stuck for 13s! [oracle:20656]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#3 stuck for 11s! [oracle:7233]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#12 stuck for 10s! [multipathd:10205]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#7 stuck for 11s! [perl:17624]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#11 stuck for 15s! [orarootagent.bi:24100]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#15 stuck for 13s! [oracle:20656]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#3 stuck for 13s! [oracle:7233]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#15 stuck for 12s! [oracle:913]
Jul 20 18:41:29 node1 kernel: BUG: soft lockup - CPU#11 stuck for 12s! [orarootagent.bi:24100]
- Multiple soft lockups are occurring in shrink_zone or shrink_inactive_list:
Aug 6 14:52:33 node1 kernel: BUG: soft lockup - CPU#11 stuck for 10s! [tnslsnr:29345]
Aug 6 14:52:33 node1 kernel: CPU 11:
Aug 6 14:52:33 node1 kernel: Modules linked in: oracleasm(U) ocfs2(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs bonding dm_round_robin dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg pcspkr bnx2 ide_cd serio_raw cdrom hpilo dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc scsi_transport_fc shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Aug 6 14:52:33 node1 kernel: Pid: 29345, comm: tnslsnr Tainted: G 2.6.18-128.7.1.el5 #1
Aug 6 14:52:33 node1 kernel: RIP: 0010:[<ffffffff800c7a5f>] [<ffffffff800c7a5f>] shrink_inactive_list+0x770/0x7f9
Aug 6 14:52:33 node1 kernel: RSP: 0018:ffff811a01aa9828 EFLAGS: 00000246
Aug 6 14:52:33 node1 kernel: RAX: 000000000000000e RBX: ffff81016f63bc78 RCX: ffff810009064460
Aug 6 14:52:33 node1 kernel: RDX: ffff81016f63bc40 RSI: ffff81000008bf98 RDI: ffff811a01aa98e8
Aug 6 14:52:33 node1 kernel: RBP: 0000000000000000 R08: ffff81000008b600 R09: ffff811a01aa9a78
Aug 6 14:52:33 node1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8120644d9860
Aug 6 14:52:33 node1 kernel: R13: ffff81131e04ceb0 R14: 0000000b8804dd3d R15: ffff811967239990
Aug 6 14:52:33 node1 kernel: FS: 00002ba65c4d7450(0000) GS:ffff810171db1f40(0000) knlGS:00000000f696bb90
Aug 6 14:52:33 node1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 6 14:52:38 node1 kernel: CR2: 0000000015cd78e4 CR3: 00000015412a3000 CR4: 00000000000006e0
Aug 6 14:52:38 node1 kernel:
Aug 6 14:52:38 node1 kernel: Call Trace:
Aug 6 14:52:38 node1 kernel: [<ffffffff800c7a1d>] shrink_inactive_list+0x72e/0x7f9
Aug 6 14:52:38 node1 kernel: [<ffffffff80046a6e>] try_to_wake_up+0x46f/0x481
Aug 6 14:52:38 node1 kernel: [<ffffffff80012d17>] shrink_zone+0xf6/0x11c
Aug 6 14:52:38 node1 kernel: [<ffffffff800c81e5>] try_to_free_pages+0x197/0x2c2
Aug 6 14:52:38 node1 kernel: [<ffffffff8000f270>] __alloc_pages+0x1cb/0x2ce
Aug 6 14:52:38 node1 kernel: [<ffffffff8003c04f>] __get_free_pages+0xe/0x71
Aug 6 14:52:38 node1 kernel: [<ffffffff8001e6d8>] __pollwait+0x58/0xe2
Aug 6 14:52:38 node1 kernel: [<ffffffff8002d8b1>] pipe_poll+0x2d/0x90
Aug 6 14:52:38 node1 kernel: [<ffffffff8002f3c6>] do_sys_poll+0x1b8/0x360
Aug 6 14:52:38 node1 kernel: [<ffffffff8001e680>] __pollwait+0x0/0xe2
Aug 6 14:52:38 node1 kernel: [<ffffffff8008a4b4>] default_wake_function+0x0/0xe
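When many soft lockup messages arrive within a few seconds, as above, it can help to tally them per process before reading individual traces. The following is a minimal Python sketch, not part of the original article, that assumes the messages use the exact "BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]" layout shown in the excerpts; the function name summarize_soft_lockups is hypothetical.

```python
import re
from collections import Counter

# Assumed message layout, taken from the log excerpts in this article:
#   ... kernel: BUG: soft lockup - CPU#14 stuck for 10s! [bgsagent:23164]
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def summarize_soft_lockups(lines):
    """Return (reports per process name, longest reported stall in seconds)."""
    per_comm = Counter()
    longest = 0
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match is None:
            continue  # not a soft lockup report (register dump, call trace, ...)
        per_comm[match.group("comm")] += 1
        longest = max(longest, int(match.group("secs")))
    return per_comm, longest
```

The function accepts any iterable of lines, so an open file handle on /var/log/messages can be passed to it directly. It only summarizes what the kernel already logged and does not indicate why the CPUs were stuck.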
Environment
Red Hat Enterprise Linux (RHEL) 5