RHEL7: jbd2 deadlock occurs while checkpoint thread waits commit thread to finish.

Solution Verified - Updated -

Issue

  • Some applications become unresponsive (hanging)
  • A number of processes are in 'D' state (uninterruptible sleep)
  • In dmesg output and/or in /var/log/messages, we can see messages with a stack like this:

    Apr  1 22:46:09 localhost kernel: INFO: task jbd2/dm-5-8:1484 blocked for more than 120 seconds.
    Apr  1 22:46:09 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Apr  1 22:46:09 localhost kernel: jbd2/dm-5-8     D   ffff8d52674b8fd0   0  1484      2 0x00000000
    Apr  1 22:46:09 localhost kernel: Call Trace:
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e768b69>] schedule_preempt_disabled+0x29/0x70
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e766ab7>] __mutex_lock_slowpath+0xc7/0x1d0
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e765e9f>] mutex_lock+0x1f/0x2f
    Apr  1 22:46:09 localhost kernel: [<ffffffffc018b7b8>] jbd2_update_log_tail+0x28/0x60 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc01845dd>] jbd2_journal_commit_transaction+0x155d/0x19b0 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0189e89>] kjournald2+0xc9/0x260 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e0c2d00>] ? wake_up_atomic_t+0x30/0x30
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0189dc0>] ? commit_timeout+0x10/0x10 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e0c1c31>] kthread+0xd1/0xe0
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e0c1b60>] ? insert_kthread_work+0x40/0x40
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e774c37>] ret_from_fork_nospec_begin+0x21/0x21
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e0c1b60>] ? insert_kthread_work+0x40/0x40
    

    or like this:

    Apr  1 22:46:09 localhost kernel: INFO: task java:63838 blocked for more than 120 seconds.
    Apr  1 22:46:09 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Apr  1 22:46:09 localhost kernel: java          D  ffff8d802b344f10    0 63838      1 0x00000000
    Apr  1 22:46:09 localhost kernel: Call Trace:
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e768b69>] schedule_preempt_disabled+0x29/0x70
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e766ab7>] __mutex_lock_slowpath+0xc7/0x1d0
    Apr  1 22:46:09 localhost kernel: [<ffffffffc01813da>] ? jbd2_journal_stop+0x1ea/0x3d0 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e765e9f>] mutex_lock+0x1f/0x2f
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0186508>] __jbd2_log_wait_for_space+0xc8/0x1f0 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc01803d3>] add_transaction_credits+0x2d3/0x2f0 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e27ad6f>] ? __block_write_begin_int+0x8f/0x5f0
    Apr  1 22:46:09 localhost kernel: [<ffffffffc01805e1>] start_this_handle+0x1a1/0x430 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e2af000>] ? dquot_file_open+0x40/0x50
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e21c0b2>] ? kmem_cache_alloc+0x1c2/0x1f0
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0180a93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0378b0a>] ? ext4_dirty_inode+0x2a/0x60 [ext4]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc03a4c59>] __ext4_journal_start_sb+0x69/0xe0 [ext4]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0378b0a>] ext4_dirty_inode+0x2a/0x60 [ext4]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e27012d>] __mark_inode_dirty+0x16d/0x270
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e25e229>] update_time+0x89/0xd0
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e25e6ff>] touch_atime+0xff/0x240
    Apr  1 22:46:09 localhost kernel: [<ffffffffc036c0dd>] ext4_file_mmap+0x7d/0xa0 [ext4]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e1f25c5>] mmap_region+0x435/0x660
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e1f2b68>] do_mmap+0x378/0x530
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e1d6380>] vm_mmap_pgoff+0xd0/0x120
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e1f0b11>] SyS_mmap_pgoff+0x1f1/0x270
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e774d15>] ? system_call_after_swapgs+0xa2/0x146
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e774d21>] ? system_call_after_swapgs+0xae/0x146
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e030c22>] SyS_mmap+0x22/0x30
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e774ddb>] system_call_fastpath+0x22/0x27
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e774d21>] ? system_call_after_swapgs+0xae/0x146
    
  • Similar backtraces can be seen in /proc/<pid>/stack of the processes that are hanging (<pid> is the actual pid of the hanging process)

Environment

  • Red Hat Enterprise Linux 7 (before kernel-3.10.0-1127.el7)
  • Red Hat Enterprise Linux 8 (before kernel-4.18.0-147.el8)
  • Ext4 / jbd2

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.

Current Customers and Partners

Log in for full access

Log In

New to Red Hat?

Learn more about Red Hat subscriptions

Using a Red Hat product through a public cloud?

How to access this content