Deadlock in XFS splice on RHEL
Issue
I noticed an issue on our production systems in which some processes became blocked in uninterruptible sleep (state D) inside filesystem code. On further examination with the crash utility, the blocked processes appeared to be deadlocked because of a lock-ordering violation in the XFS kernel module. Once the deadlock occurs, the only way to recover is a hard reset of the machine.
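A lock-ordering violation of this kind can be illustrated in user space. The following is a minimal sketch, not the actual XFS code path: `lock_a`, `lock_b`, and `demo_lock_order_violation` are hypothetical names, and a timeout is used so that the demonstration terminates. Kernel mutexes have no such timeout, which is why the affected tasks stay blocked.

```python
import threading


def demo_lock_order_violation(timeout=0.2):
    """Two threads take the same two locks in opposite order.

    Returns a dict mapping thread name to whether it obtained its
    second lock. With opposite ordering, neither thread succeeds.
    """
    lock_a = threading.Lock()  # hypothetical stand-in for one lock
    lock_b = threading.Lock()  # hypothetical stand-in for another lock
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            # Wait until the other thread also holds its first lock.
            both_hold_first.wait()
            # Each thread now wants the lock the other one holds.
            got = second.acquire(timeout=timeout)
            results[name] = got
            # Keep holding the first lock until both attempts are over.
            both_tried_second.wait()
            if got:
                second.release()

    t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results


if __name__ == "__main__":
    print(demo_lock_order_violation())
```

If both threads acquired the locks in the same order, one would simply wait for the other to finish instead of both failing.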
The system logs contained hung task timeout messages:
INFO: task fio:5127 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fio D 0000000000000005 0 5127 5123 0x00000000
 ffff8800bab6de58 0000000000000082 ffff8800bab6ddb8 ffffffffa034aa0f
 ffff8800bab6dee8 ffffffff8117256a ffff8800bab6dde8 ffffffff81050be3
 ffff8800bcb16638 ffff8800bab6dfd8 000000000000f598 ffff8800bcb16638
Call Trace:
 [<ffffffffa034aa0f>] ? xfs_file_aio_read+0x5f/0x70 [xfs]
 [<ffffffff8117256a>] ? do_sync_read+0xfa/0x140
 [<ffffffff81050be3>] ? enqueue_task_fair+0x43/0x90
 [<ffffffff810571ee>] ? activate_task+0x2e/0x40
 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
 [<ffffffff81172a96>] sys_lseek+0x66/0x80
 [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

INFO: task fio:5128 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fio D 0000000000000003 0 5128 5123 0x00000000
 ffff8800baab3e58 0000000000000086 0000000000000000 ffff8800baab3dd0
 ffffffff81103c76 ffff8800bcbc4040 ffff8801a81c0b40 0000000100274b2f
 ffff8800bcbc45f8 ffff8800baab3fd8 000000000000f598 ffff8800bcbc45f8
Call Trace:
 [<ffffffff81103c76>] ? __perf_event_task_sched_out+0x36/0x50
 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
 [<ffffffff81172a96>] sys_lseek+0x66/0x80
 [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

INFO: task fio:5131 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fio D 0000000000000003 0 5131 5123 0x00000000
 ffff8800bab75e58 0000000000000082 ffff8800bab75db8 ffffffffa034aa0f
 ffff8800bab75ee8 ffffffff8117256a ffff8800bab75de8 ffffffff81050be3
 ffff8800bb26daf8 ffff8800bab75fd8 000000000000f598 ffff8800bb26daf8
Call Trace:
 [<ffffffffa034aa0f>] ? xfs_file_aio_read+0x5f/0x70 [xfs]
 [<ffffffff8117256a>] ? do_sync_read+0xfa/0x140
 [<ffffffff81050be3>] ? enqueue_task_fair+0x43/0x90
 [<ffffffff810571ee>] ? activate_task+0x2e/0x40
 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
 [<ffffffff81172a96>] sys_lseek+0x66/0x80
 [<ffffffff814de17e>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

INFO: task fio:5132 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fio D 0000000000000003 0 5132 5123 0x00000000
 ffff8800bb129e58 0000000000000086 ffff8800bb129df8 ffffffff81138318
 0000000000000062 ffff8800a99656b0 ffff8800bb129e28 0000000000000008
 ffff8800baa83078 ffff8800bb129fd8 000000000000f598 ffff8800baa83078
Call Trace:
 [<ffffffff81138318>] ? handle_mm_fault+0x1d8/0x2a0
 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
 [<ffffffff81172a96>] sys_lseek+0x66/0x80
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
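When many tasks hang at once, it can help to summarize the reports rather than read each trace. The following is a minimal sketch that extracts the command name, PID, and call-trace symbols from hung task messages like those above. The function name `parse_hung_tasks` and the regular expressions are assumptions written for this log format, not part of any Red Hat tool. Frames marked with `?` are ones the kernel's stack unwinder could not confirm, so the sketch skips them.

```python
import re

# "INFO: task <comm>:<pid> blocked for more than <N> seconds."
TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# "[<address>] [? ]symbol+0xOFF/0xLEN"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(log_text):
    """Return one dict per hung task report found in log_text."""
    tasks = []
    matches = list(TASK_RE.finditer(log_text))
    for i, m in enumerate(matches):
        # A report runs until the next "INFO: task" line or end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        body = log_text[m.end():end]
        frames = [
            f.group("sym")
            for f in FRAME_RE.finditer(body)
            if not f.group("unreliable")
        ]
        tasks.append(
            {"comm": m.group("comm"), "pid": int(m.group("pid")), "frames": frames}
        )
    return tasks


# Excerpt taken from the messages above (task 5128 only).
SAMPLE = """
INFO: task fio:5128 blocked for more than 120 seconds.
Call Trace:
 [<ffffffff81103c76>] ? __perf_event_task_sched_out+0x36/0x50
 [<ffffffff814dc6be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dc55b>] mutex_lock+0x2b/0x50
 [<ffffffff811728e6>] generic_file_llseek+0x36/0x70
 [<ffffffff811712ea>] vfs_llseek+0x3a/0x40
 [<ffffffff81172a96>] sys_lseek+0x66/0x80
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
"""

if __name__ == "__main__":
    for task in parse_hung_tasks(SAMPLE):
        print(task["comm"], task["pid"], " <- ".join(task["frames"]))
```

Applied to the full log, this makes it easy to see that all four fio tasks are waiting on a mutex taken in `generic_file_llseek` during an `lseek` system call.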
Environment
- Red Hat Enterprise Linux 6.5
- Red Hat Enterprise Linux 7.1
The issue occurs when running applications that take advantage of zero-copy operations in the Linux kernel, including:
- Ruby 2.2.2p95 using a standard logging gem
- Apache Kafka
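For context, a zero-copy transfer of the kind referred to above can be sketched with the `splice` system call, which moves data between a file and a pipe inside the kernel without copying it through a user-space buffer. This is only an illustration under stated assumptions: it requires Python 3.10 or later on Linux (for `os.splice`), the name `splice_copy` and the chunk size are hypothetical, and it does not represent how Ruby or Apache Kafka implement their I/O.

```python
import os

# Assumed chunk size: 64 KiB, the usual default pipe capacity on Linux.
CHUNK = 64 * 1024


def splice_copy(src_path, dst_path):
    """Copy src_path to dst_path by splicing through a pipe.

    Returns the number of bytes copied.
    """
    total = 0
    pipe_r, pipe_w = os.pipe()
    try:
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            while True:
                # File -> pipe: returns 0 at end of file.
                n = os.splice(src.fileno(), pipe_w, CHUNK)
                if n == 0:
                    break
                # Pipe -> file: drain everything that was just queued.
                moved = 0
                while moved < n:
                    moved += os.splice(pipe_r, dst.fileno(), n - moved)
                total += n
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
    return total
```

A call such as `splice_copy("input.dat", "output.dat")` (hypothetical paths) copies the file while the data stays in kernel space.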