Oracle node rebooted due to stalled I/O requests on a few paths to SAN devices
Issue
- The Oracle RAC nodes have 8 paths to each LUN, but when only a few of the sub paths failed, I/O requests on the voting disks stalled and the Oracle cluster evicted the node. The cssdagent process triggered a SysRq crash, which brought the node down:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffff8135ca26>] sysrq_handle_crash+0x16/0x20
    PGD 4c370d5067 PUD 4895815067 PMD 0
    Oops: 0002 [#1] SMP
    [...]
    Pid: 37192, comm: cssdagent Tainted: P W -- ------------ 2.6.32-642.15.1.el6.x86_64 #1 HP ProLiant BL460c Gen9
    RIP: 0010:[<ffffffff8135ca26>] [<ffffffff8135ca26>] sysrq_handle_crash+0x16/0x20
    RSP: 0018:ffff884889d87e18 EFLAGS: 00010092
    RAX: 0000000000000010 RBX: 0000000000000063 RCX: 000000000000894e
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
    RBP: ffff884889d87e18 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000000
    R13: ffffffff81b12f60 R14: 0000000000000286 R15: 0000000000000004
    FS: 00007f6eab0e3700(0000) GS:ffff880291800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 0000004b8bbba000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process cssdagent (pid: 37192, threadinfo ffff884889d84000, task ffff884bfab51520)
    Stack:
     ffff884889d87e68 ffffffff8135cce2 ffff884b982bba28 0000000000000000
    <d> 0000000000000018 0000000000000001 ffff884baf9d7cc0 00007f6eb1692b14
    <d> 0000000000000001 fffffffffffffffb ffff884889d87e98 ffffffff8135cd9e
    Call Trace:
     [<ffffffff8135cce2>] __handle_sysrq+0x132/0x1a0
     [<ffffffff8135cd9e>] write_sysrq_trigger+0x4e/0x50
     [<ffffffff8120595e>] proc_reg_write+0x7e/0xc0
     [<ffffffff81199e48>] vfs_write+0xb8/0x1a0
     [<ffffffff8119b35f>] ? fget_light_pos+0x3f/0x50
     [<ffffffff8119a981>] sys_write+0x51/0xb0
     [<ffffffff810ee43e>] ? __audit_syscall_exit+0x25e/0x290
     [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
    Code: d0 88 81 e3 eb ff 81 c9 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 c7 05 8d f3 74 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c9 c3 55 48 89 e5 0f 1f 44 00 00 8d 47
    RIP [<ffffffff8135ca26>] sysrq_handle_crash+0x16/0x20
     RSP <ffff884889d87e18>
    CR2: 0000000000000000
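
The oops can be read mechanically. The following is a minimal Python sketch, not part of the original diagnosis, that extracts the crashing process name, the call chain, and the SysRq key from an excerpt of the trace. The helper name `summarize_oops` is hypothetical, and reading RDI as the SysRq key is an assumption: it holds only if the register still contains the first argument of `sysrq_handle_crash` at the faulting instruction.

```python
import re

# Excerpt of the oops, reduced to the lines this sketch inspects.
OOPS = """\
Pid: 37192, comm: cssdagent Tainted: P W -- ------------ 2.6.32-642.15.1.el6.x86_64 #1 HP ProLiant BL460c Gen9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
Call Trace:
 [<ffffffff8135cce2>] __handle_sysrq+0x132/0x1a0
 [<ffffffff8135cd9e>] write_sysrq_trigger+0x4e/0x50
 [<ffffffff8120595e>] proc_reg_write+0x7e/0xc0
 [<ffffffff81199e48>] vfs_write+0xb8/0x1a0
 [<ffffffff8119a981>] sys_write+0x51/0xb0
"""


def summarize_oops(text):
    """Hypothetical helper: pull the process name, SysRq key, and frames from an oops."""
    # Name of the process that was running when the kernel crashed.
    comm = re.search(r"comm: (\S+)", text).group(1)
    # Assumption: RDI still carries the first argument (the SysRq key code).
    rdi = int(re.search(r"RDI: ([0-9a-f]{16})", text).group(1), 16)
    # Function names from the call trace, innermost first; "? " marks unreliable frames.
    frames = re.findall(r"\[<[0-9a-f]+>\] (?:\? )?(\w+)\+0x", text)
    return {"comm": comm, "sysrq_key": chr(rdi), "frames": frames}


summary = summarize_oops(OOPS)
print(summary)
```

Under that assumption, the value 0x63 decodes to the character `c`, the SysRq crash key, and the `sys_write` to `write_sysrq_trigger` chain is consistent with cssdagent writing that key to `/proc/sysrq-trigger` rather than the kernel hitting an unrelated bug.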
Environment
- Red Hat Enterprise Linux 6.9
  - kernel-2.6.32-642.15.1.el6.x86_64
- Red Hat Enterprise Linux 7.4
  - kernel versions earlier than 3.10.0-862.el7
- Broadcom BCM57840 partial hardware offload FCoE interfaces