Deadlock in XFS splice on RHEL

Solution Verified - Updated -

Issue

  • On our production systems, some processes become blocked in uninterruptible sleep (state D) inside filesystem code. Examination with the "crash" utility indicates that the blocked processes are deadlocked because of a lock-ordering violation in the XFS kernel module. Once this occurs, only a hard reset of the machine recovers the system.

    The system logs contain hung task timeout messages:

    INFO: task fio:5127 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    fio           D 0000000000000005     0  5127   5123 0x00000000
    ffff8800bab6de58 0000000000000082 ffff8800bab6ddb8 ffffffffa034aa0f
    ffff8800bab6dee8 ffffffff8117256a ffff8800bab6dde8 ffffffff81050be3
    ffff8800bcb16638 ffff8800bab6dfd8 000000000000f598 ffff8800bcb16638
    Call Trace:
    [<ffffffffa034aa0f>] ? xfs_file_aio_read+0x5f/0x70 [xfs]
    [<ffffffff8117256a>] ? do_sync_read+0xfa/0x140
    [<ffffffff81050be3>] ? enqueue_task_fair+0x43/0x90
    [<ffffffff810571ee>] ? activate_task+0x2e/0x40
    [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
    [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
    [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
    [<ffffffff81172a96>] sys_lseek+0x66/0x80
    [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10
    [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
    INFO: task fio:5128 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    fio           D 0000000000000003     0  5128   5123 0x00000000
    ffff8800baab3e58 0000000000000086 0000000000000000 ffff8800baab3dd0
    ffffffff81103c76 ffff8800bcbc4040 ffff8801a81c0b40 0000000100274b2f
    ffff8800bcbc45f8 ffff8800baab3fd8 000000000000f598 ffff8800bcbc45f8
    Call Trace:
    [<ffffffff81103c76>] ? __perf_event_task_sched_out+0x36/0x50
    [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
    [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
    [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
    [<ffffffff81172a96>] sys_lseek+0x66/0x80
    [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10
    [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
    INFO: task fio:5131 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    fio           D 0000000000000003     0  5131   5123 0x00000000
    ffff8800bab75e58 0000000000000082 ffff8800bab75db8 ffffffffa034aa0f
    ffff8800bab75ee8 ffffffff8117256a ffff8800bab75de8 ffffffff81050be3
    ffff8800bb26daf8 ffff8800bab75fd8 000000000000f598 ffff8800bb26daf8
    Call Trace:
    [<ffffffffa034aa0f>] ? xfs_file_aio_read+0x5f/0x70 [xfs]
    [<ffffffff8117256a>] ? do_sync_read+0xfa/0x140
    [<ffffffff81050be3>] ? enqueue_task_fair+0x43/0x90
    [<ffffffff810571ee>] ? activate_task+0x2e/0x40
    [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
    [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
    [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
    [<ffffffff81172a96>] sys_lseek+0x66/0x80
    [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10
    [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
    INFO: task fio:5132 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    fio           D 0000000000000003     0  5132   5123 0x00000000
    ffff8800bb129e58 0000000000000086 ffff8800bb129df8 ffffffff81138318
    0000000000000062 ffff8800a99656b0 ffff8800bb129e28 0000000000000008
    ffff8800baa83078 ffff8800bb129fd8 000000000000f598 ffff8800baa83078
    Call Trace:
    [<ffffffff81138318>] ? handle_mm_fault+0x1d8/0x2a0
    [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
    [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
    [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
    [<ffffffff81172a96>] sys_lseek+0x66/0x80
    [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
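
The reports above all show the same pattern: each task is waiting in mutex_lock(), called from generic_file_llseek(). When many such reports accumulate, it can help to summarize them programmatically. The following is a minimal sketch, assuming the log format shown above. The helper names summarize_hung_tasks and lock_caller are hypothetical and are not part of any Red Hat tool. Frames marked with "?" are skipped because the kernel flags them as unreliable.

```python
import re

# Header line of each hung-task report, e.g.
# "INFO: task fio:5127 blocked for more than 120 seconds."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Stack frame, e.g. "[<ffffffff811728e6>] generic_file_llseek+0x36/0x70".
# A "?" after the address marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x")


def summarize_hung_tasks(log_text):
    """Return one (comm, pid, reliable_frames) tuple per hung-task report."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = TASK_RE.search(line)
        if header:
            current = (header.group(1), int(header.group(2)), [])
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Keep only frames without the "?" marker.
        if frame and current is not None and frame.group(1) is None:
            current[2].append(frame.group(2))
    return reports


def lock_caller(frames):
    """Return the function that called mutex_lock(), or None if absent.

    Frames are listed innermost first, so the caller follows mutex_lock.
    """
    if "mutex_lock" in frames:
        i = frames.index("mutex_lock")
        if i + 1 < len(frames):
            return frames[i + 1]
    return None


# One report copied from the log above (raw stack words omitted).
SAMPLE = """\
INFO: task fio:5132 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fio           D 0000000000000003     0  5132   5123 0x00000000
Call Trace:
[<ffffffff81138318>] ? handle_mm_fault+0x1d8/0x2a0
[<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff814dc55b>] mutex_lock+0x2b/0x50
[<ffffffff811728e6>] generic_file_llseek+0x36/0x70
[<ffffffff811712ea>] vfs_llseek+0x3a/0x40
[<ffffffff81172a96>] sys_lseek+0x66/0x80
[<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
"""

if __name__ == "__main__":
    for comm, pid, frames in summarize_hung_tasks(SAMPLE):
        print(f"{comm}:{pid} waits on a mutex taken in {lock_caller(frames)}")
```

This summary only identifies the waiters. It does not show which task holds the mutex, so the lock holder still has to be found from a vmcore with "crash".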
    

Environment

  • Red Hat Enterprise Linux 6.5
  • Red Hat Enterprise Linux 7.1

When running applications that take advantage of zero-copy operations in the Linux kernel, including:
