Translated message

A translation of this page exists in English.

RHEL7: 当 checkpoint 线程等待 commit 线程完成时 发生 jbd2 死锁

Solution In Progress - Updated -

Issue

  • 有些应用程序变得无响应(挂起)
  • 许多进程处于 'D' 状态(不可中断的睡眠)
  • 在 /var/log/messages 和/或 dmesg 输出中包括类似如下的堆栈消息:

    Apr  1 22:46:09 localhost kernel: INFO: task jbd2/dm-5-8:1484 blocked for more than 120 seconds.
    Apr  1 22:46:09 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Apr  1 22:46:09 localhost kernel: jbd2/dm-5-8     D   ffff8d52674b8fd0   0  1484      2 0x00000000
    Apr  1 22:46:09 localhost kernel: Call Trace:
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e768b69>] schedule_preempt_disabled+0x29/0x70
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e766ab7>] __mutex_lock_slowpath+0xc7/0x1d0
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e765e9f>] mutex_lock+0x1f/0x2f
    Apr  1 22:46:09 localhost kernel: [<ffffffffc018b7b8>] jbd2_update_log_tail+0x28/0x60 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc01845dd>] jbd2_journal_commit_transaction+0x155d/0x19b0 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0189e89>] kjournald2+0xc9/0x260 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e0c2d00>] ? wake_up_atomic_t+0x30/0x30
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0189dc0>] ? commit_timeout+0x10/0x10 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e0c1c31>] kthread+0xd1/0xe0
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e0c1b60>] ? insert_kthread_work+0x40/0x40
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e774c37>] ret_from_fork_nospec_begin+0x21/0x21
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e0c1b60>] ? insert_kthread_work+0x40/0x40
    

    或者类似:

    Apr  1 22:46:09 localhost kernel: INFO: task java:63838 blocked for more than 120 seconds.
    Apr  1 22:46:09 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Apr  1 22:46:09 localhost kernel: java          D  ffff8d802b344f10    0 63838      1 0x00000000
    Apr  1 22:46:09 localhost kernel: Call Trace:
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e768b69>] schedule_preempt_disabled+0x29/0x70
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e766ab7>] __mutex_lock_slowpath+0xc7/0x1d0
    Apr  1 22:46:09 localhost kernel: [<ffffffffc01813da>] ? jbd2_journal_stop+0x1ea/0x3d0 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e765e9f>] mutex_lock+0x1f/0x2f
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0186508>] __jbd2_log_wait_for_space+0xc8/0x1f0 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc01803d3>] add_transaction_credits+0x2d3/0x2f0 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e27ad6f>] ? __block_write_begin_int+0x8f/0x5f0
    Apr  1 22:46:09 localhost kernel: [<ffffffffc01805e1>] start_this_handle+0x1a1/0x430 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e2af000>] ? dquot_file_open+0x40/0x50
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e21c0b2>] ? kmem_cache_alloc+0x1c2/0x1f0
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0180a93>] jbd2__journal_start+0xf3/0x1f0 [jbd2]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0378b0a>] ? ext4_dirty_inode+0x2a/0x60 [ext4]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc03a4c59>] __ext4_journal_start_sb+0x69/0xe0 [ext4]
    Apr  1 22:46:09 localhost kernel: [<ffffffffc0378b0a>] ext4_dirty_inode+0x2a/0x60 [ext4]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e27012d>] __mark_inode_dirty+0x16d/0x270
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e25e229>] update_time+0x89/0xd0
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e25e6ff>] touch_atime+0xff/0x240
    Apr  1 22:46:09 localhost kernel: [<ffffffffc036c0dd>] ext4_file_mmap+0x7d/0xa0 [ext4]
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e1f25c5>] mmap_region+0x435/0x660
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e1f2b68>] do_mmap+0x378/0x530
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e1d6380>] vm_mmap_pgoff+0xd0/0x120
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e1f0b11>] SyS_mmap_pgoff+0x1f1/0x270
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e774d15>] ? system_call_after_swapgs+0xa2/0x146
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e774d21>] ? system_call_after_swapgs+0xae/0x146
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e030c22>] SyS_mmap+0x22/0x30
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e774ddb>] system_call_fastpath+0x22/0x27
    Apr  1 22:46:09 localhost kernel: [<ffffffff9e774d21>] ? system_call_after_swapgs+0xae/0x146
    
  • 相似的回溯追踪可在正在挂起的进程的 /proc/<pid>/stack 中看到(<pid> 是挂起进程的实际 pid)

Environment

  • Red Hat Enterprise Linux 7 (kernel-3.10.0-1127.el7 之前)
  • Red Hat Enterprise Linux 8 (kernel-4.18.0-147.el8 之前)
  • Ext4 / jbd2

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.

Current Customers and Partners

Log in for full access

Log In

New to Red Hat?

Learn more about Red Hat subscriptions

Using a Red Hat product through a public cloud?

How to access this content