The current commit thread of the ext4 journal calls cond_resched(), giving the scheduler a chance to run a higher-priority task on the runqueue before the journal commit finishes. An Oracle RAC node eviction occurred as a result.

Solution Unverified

Issue

  • The system hangs.

  • Tasks are stuck in jbd2_log_wait_commit(), waiting for the journal's current commit thread to finish committing the ext4 journal.

PID: 12471  TASK: ffff90baf6618000  CPU: 10  COMMAND: "java"
 #0 [ffff90ba932b7d88] __schedule at ffffffffa1169a72
 #1 [ffff90ba932b7e10] schedule at ffffffffa1169f19
 #2 [ffff90ba932b7e20] jbd2_log_wait_commit at ffffffffc06a77c5 [jbd2]
 #3 [ffff90ba932b7e98] jbd2_complete_transaction at ffffffffc06a8e52 [jbd2]
 #4 [ffff90ba932b7eb8] ext4_sync_file at ffffffffc060e782 [ext4]
        ...
  • The current commit thread of the ext4 journal calls cond_resched(), which gives the scheduler a chance to run a higher-priority task on the runqueue before the journal commit finishes. An Oracle RAC node eviction occurred as a result.
PID: 6081   TASK: ffff90bb10b04100  CPU: 1   COMMAND: "jbd2/dm-6-8"
 #0 [ffff90bb1eb6fa58] __schedule at ffffffffa1169a72
 #1 [ffff90bb1eb6fae0] __cond_resched at ffffffffa0ad4646
 #2 [ffff90bb1eb6faf8] _cond_resched at ffffffffa116a1ba
 #3 [ffff90bb1eb6fb08] tag_pages_for_writeback at ffffffffa0bc2b36
 #4 [ffff90bb1eb6fb40] write_cache_pages at ffffffffa0bc350c
 #5 [ffff90bb1eb6fc48] generic_writepages at ffffffffa0bc392d
 #6 [ffff90bb1eb6fca8] jbd2_journal_commit_transaction at ffffffffc06a150e [jbd2]
        ...

PID: 28810  TASK: ffff90baf4a5e180  CPU: 4   COMMAND: "cssdmonitor"
        ...
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffffa0e64106  RSP: ffff90ba2beafe58  RFLAGS: 00010246
    RAX: ffffffffa0e640f0  RBX: ffffffffa16e4f40  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000282  RDI: 0000000000000063
    RBP: ffff90ba2beafe58   R8: 00000000a06022b0   R9: ffffffffa19f9667
    R10: 00000000000ea274  R11: 0000000000100000  R12: 0000000000000063
    R13: 0000000000000000  R14: 0000000000000007  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffff90ba2beafe60] __handle_sysrq at ffffffffa0e6492d
#12 [ffff90ba2beafe90] write_sysrq_trigger at ffffffffa0e64d98
        ...
  • CPU starvation is the likely cause, driven by the heavily CPU-bound workload of the Oracle real-time application threads. On CPU 1, the jbd2/dm-6-8 commit thread waits in the CFS queue while the real-time cssdagent task is running.
crash> runq -c 1
CPU 1 RUNQUEUE: ffff90bb2325ab80
  CURRENT: PID: 28782  TASK: ffff90bb1beda080  COMMAND: "cssdagent" <<------------
  RT PRIO_ARRAY: ffff90bb2325ad20
     [  0] PID: 28782  TASK: ffff90bb1beda080  COMMAND: "cssdagent" <<------------
  CFS RB_ROOT: ffff90bb2325ac28
     [120] PID: 2020   TASK: ffff90a60beb9040  COMMAND: "kworker/1:11"
     [120] PID: 27266  TASK: ffff90a74f7f8000  COMMAND: "ora_q004_tolss1"
     [120] PID: 24997  TASK: ffff90a59ab430c0  COMMAND: "ora_tt03_arcdb1"
     [120] PID: 28485  TASK: ffff90ba25b50000  COMMAND: "gipcd.bin"
     [120] PID: 6081   TASK: ffff90bb10b04100  COMMAND: "jbd2/dm-6-8" <<------------
     [120] PID: 19318  TASK: ffff90b75eec2080  COMMAND: "ora_lmd0_wwsm1"
     [120] PID: 19740  TASK: ffff90b65ff45140  COMMAND: "ora_rs00_dwh1"
     [120] PID: 29158  TASK: ffff90ba502a0000  COMMAND: "crsd.bin"
     [120] PID: 21628  TASK: ffff90ac41b22080  COMMAND: "ora_ctwr_upp1"
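
The check made by eye on the runqueue listing above can also be scripted. The following is a minimal sketch, assuming the `runq` output has been saved as text in the layout shown above. The helper names `parse_runq` and `journal_threads_behind_rt` are hypothetical and are not part of the crash utility. The sketch treats the CURRENT task as real-time only if it also appears under RT PRIO_ARRAY, as it does in the listing above.

```python
import re

# Matches the PID/TASK/COMMAND triple that crash prints for each task.
TASK_RE = re.compile(r'PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+COMMAND:\s*"([^"]+)"')


def parse_runq(text):
    """Parse saved `runq` output into a list of per-CPU dicts."""
    cpus = []
    section = None  # "rt" or "cfs" once a queue header has been seen
    for line in text.splitlines():
        stripped = line.strip()
        header = re.match(r"CPU (\d+) RUNQUEUE:", stripped)
        if header:
            cpus.append({"cpu": int(header.group(1)),
                         "current": None, "rt": [], "cfs": []})
            section = None
            continue
        if not cpus:
            continue  # skip anything before the first CPU header
        task = TASK_RE.search(stripped)
        if stripped.startswith("CURRENT:"):
            if task:
                cpus[-1]["current"] = (int(task.group(1)), task.group(3))
        elif stripped.startswith("RT PRIO_ARRAY:"):
            section = "rt"
        elif stripped.startswith("CFS RB_ROOT:"):
            section = "cfs"
        elif task and section:
            cpus[-1][section].append((int(task.group(1)), task.group(3)))
    return cpus


def journal_threads_behind_rt(cpus):
    """Return (cpu, rt_command, jbd2_command) for each jbd2 thread that
    sits in the CFS queue while a real-time task is current on that CPU."""
    hits = []
    for entry in cpus:
        current = entry["current"]
        if current is None:
            continue
        rt_pids = {pid for pid, _ in entry["rt"]}
        if current[0] not in rt_pids:
            continue  # current task is not in the RT array
        for _, comm in entry["cfs"]:
            if comm.startswith("jbd2/"):
                hits.append((entry["cpu"], current[1], comm))
    return hits
```

Passing the saved listing through `journal_threads_behind_rt(parse_runq(text))` reports each CPU where a journal commit thread is runnable but queued behind a real-time task. This is only a snapshot of one moment, so it indicates a candidate for starvation rather than proving it.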

crash> ps -y RR | awk '$1~/>/'
> 19037      1   5  ffff90b7dbf9e180  RU   0.0 11220908  25256  ora_lmhb_ccr1
> 28540      1   0  ffff90ba07e030c0  RU   0.1 1663924 176868  osysmond.bin
> 28752      1   7  ffff90bb1b928000  RU   0.2 3164060 279688  ocssd.bin
> 28753      1  10  ffff90bb1b92d140  RU   0.2 3164060 279688  ocssd.bin
> 28765      1   2  ffff90b90129a080  RU   0.2 3164060 279688  ocssd.bin
> 28767      1  11  ffff90b901298000  RU   0.2 3164060 279688  ocssd.bin
> 28782      1   1  ffff90bb1beda080  RU   0.1 1186164 156112  cssdagent
> 28810      1   4  ffff90baf4a5e180  RU   0.1 1182784 154088  cssdmonitor
> 28830      1   6  ffff90baf661b0c0  RU   0.2 3164060 279688  ocssd.bin
> 28831      1   9  ffff90baf661c100  RU   0.2 3164060 279688  ocssd.bin
> 31742      1  13  ffff90bb11e65140  RU   0.0 4794312  37076  asm_lms0_+asm1

Environment

  • Red Hat Enterprise Linux 7.6 kernel-3.10.0-957.27.2.el7
  • The RHEL guest running on the KVM hypervisor
  • Oracle RAC
