Oracle ocssd.bin running with real-time priority monopolizes a CPU, preventing I/O submission and leading to a hang

Solution Unverified - Updated -

Issue

  • The system hangs because many I/O requests for a subset of disks are not being sent to their corresponding drivers.

  • The kernel may report tasks as blocked for more than hung_task_timeout_secs seconds:

<3>INFO: task filebeat:29183 blocked for more than 120 seconds.
<3>      Tainted: P           -- ------------    2.6.32-754.28.1.el6.x86_64 #1
<3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>filebeat      D 0000000000000003     0 29183  29178 0x00000080
<4> ffff880e54577a18 0000000000000086 0000000000000000 ffff880e545779dc
<4> ffff880e54577ae8 00000000000005f9 005d53d049f1b7ea ffff8818d4858c00
<4> 000000071ef6852e 0000000000001d2f ffff880e567ae5f8 ffff880e54577fd8
<4>Call Trace:
<4> [<ffffffff8155b3c6>] __mutex_lock_slowpath+0x96/0x210
<4> [<ffffffff8155ac4b>] mutex_lock+0x2b/0x50
<4> [<ffffffffa01abee0>] __log_wait_for_space+0xc0/0x1a0 [jbd]
<4> [<ffffffffa01a7c67>] start_this_handle+0xf7/0x3f0 [jbd]
<4> [<ffffffffa01a8135>] journal_start+0xb5/0x100 [jbd]
<4> [<ffffffffa01db481>] ext3_journal_start_sb+0x31/0x60 [ext3]
<4> [<ffffffffa01cadcd>] ext3_dirty_inode+0x3d/0xa0 [ext3]
<4> [<ffffffff811cec1e>] __mark_inode_dirty+0x3e/0x1c0
<4> [<ffffffff811bf385>] touch_atime+0x195/0x1a0
<4> [<ffffffff8113559c>] generic_file_aio_read+0x38c/0x710
<4> [<ffffffff811a1890>] do_sync_read+0x100/0x140
<4> [<ffffffff810ab140>] ? autoremove_wake_function+0x0/0x40
<4> [<ffffffff81243c7c>] ? security_file_permission+0x1c/0x20
<4> [<ffffffff811a2187>] vfs_read+0xb7/0x1a0
<4> [<ffffffff811a2f7f>] ? fget_light_pos+0x3f/0x50
<4> [<ffffffff811a24d1>] sys_read+0x51/0xb0
<4> [<ffffffff815642c2>] ? system_call_after_swapgs+0xa2/0x152
<4> [<ffffffff815642ce>] ? system_call_after_swapgs+0xae/0x152
<4> [<ffffffff815643a7>] system_call_fastpath+0x35/0x3a
<4> [<ffffffff815642ce>] ? system_call_after_swapgs+0xae/0x152
  • The system is either taken down manually by the operator (by sending a SysRq or an NMI), or a clustering solution brings the unresponsive node down to prevent a split-brain condition.
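
The hung-task header shown in the log above has a fixed shape, so it can be picked out of a saved kernel log mechanically. The following is a minimal Python sketch, not part of the original solution: the function name find_hung_tasks is hypothetical, and the pattern is derived only from the message format quoted in this article.

```python
import re

# Matches the hung-task header line, for example:
#   <3>INFO: task filebeat:29183 blocked for more than 120 seconds.
# The pattern is an assumption based on the message shown in this article.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (comm, pid, seconds) tuples, one per hung-task header."""
    reports = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            reports.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return reports
```

Feeding it the output of dmesg or a saved /var/log/messages gives a quick list of which tasks were reported and how often, which can help confirm that the blocked tasks are waiting on I/O rather than on something else.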

Environment

  • Red Hat Enterprise Linux 6
  • The Oracle Cluster Synchronization Service Daemon (OCSSD) is running. The environment may or may not be Oracle RAC, as this daemon can also be used by Oracle ASM (Automatic Storage Management).
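
To confirm that a process such as ocssd.bin is running under a real-time scheduling policy, the policy and real-time priority can be read from /proc/<pid>/stat (fields 41 and 40 respectively, per proc(5)). The following is a minimal Python sketch under that assumption; the function names parse_stat_sched and is_realtime are hypothetical and not part of any Oracle or Red Hat tooling.

```python
# Linux scheduling policy numbers as defined in the kernel headers.
SCHED_POLICY_NAMES = {
    0: "SCHED_OTHER",
    1: "SCHED_FIFO",
    2: "SCHED_RR",
    3: "SCHED_BATCH",
    5: "SCHED_IDLE",
}


def parse_stat_sched(stat_line):
    """Return (comm, rt_priority, policy) parsed from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    rest = stat_line[close_paren + 1:].split()
    # rest[0] is field 3 (state), so field N of proc(5) is rest[N - 3].
    rt_priority = int(rest[40 - 3])
    policy = int(rest[41 - 3])
    return comm, rt_priority, policy


def is_realtime(policy):
    """True for the two real-time policies, SCHED_FIFO and SCHED_RR."""
    return policy in (1, 2)
```

On a live system the input would be the contents of /proc/<pid>/stat for the process of interest; a policy of SCHED_FIFO or SCHED_RR with a non-zero rt_priority indicates the task can preempt all normal-priority work on its CPU.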
