Running fstrim on Ceph-backed storage can lead to hung task panics

Solution Verified - Updated -

Issue

Running fstrim on a system whose storage is backed by Ceph volumes can sometimes delay I/O completion. If the hung_task_panic kernel tunable is enabled, the hung task watchdog can then fire and panic the system.

The symptoms look like a typical hung task timeout followed by a panic, as seen in the console log:

[102699.499367] INFO: task kworker/34:4:192928 blocked for more than 622 seconds.
[102699.500333]       Not tainted 5.14.0-284.11.1.el9_2.x86_64 #1
[102699.501080] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102699.501808] task:kworker/34:4    state:D stack:    0 pid:192928 ppid:     2 flags:0x00004000
[102699.502537] Workqueue: xfs-inodegc/dm-2 xfs_inodegc_worker [xfs]
[102699.503347] Call Trace:
[102699.504031]  <TASK>
[102699.504712]  __schedule+0x248/0x620
[102699.505394]  schedule+0x5a/0xc0
[102699.506067]  schedule_timeout+0x11d/0x160
[102699.506731]  ? dequeue_skb+0x80/0x500
[102699.507402]  ? xfs_buf_rele+0x53/0x2d0 [xfs]
[102699.508128]  __down+0x89/0xe0
[102699.508777]  down+0x43/0x60
[102699.509420]  xfs_buf_lock+0x2d/0xe0 [xfs]
[102699.510133]  xfs_buf_find+0x17e/0x380 [xfs]
[102699.510843]  xfs_buf_get_map+0x46/0x3d0 [xfs]
[102699.511546]  ? xfs_trans_read_buf_map+0xa4/0x300 [xfs]
[102699.512258]  ? xfs_bmbt_init_high_key_from_rec+0x30/0x30 [xfs]
[102699.512951]  xfs_buf_read_map+0x54/0x290 [xfs]
[102699.513655]  ? xfs_read_agf+0x87/0x120 [xfs]
[102699.514344]  xfs_trans_read_buf_map+0x133/0x300 [xfs]
[102699.515046]  ? xfs_read_agf+0x87/0x120 [xfs]
[102699.515761]  xfs_read_agf+0x87/0x120 [xfs]
[102699.516480]  xfs_alloc_read_agf+0x30/0xb0 [xfs]
[102699.517155]  xfs_alloc_fix_freelist+0x3cd/0x500 [xfs]
[102699.517835]  ? xfs_btree_delrec+0xdfd/0x10a0 [xfs]
[102699.518510]  ? xfs_btree_read_buf_block.constprop.0+0xaf/0xd0 [xfs]
[102699.519179]  xfs_free_extent_fix_freelist+0x61/0xa0 [xfs]
[102699.519846]  __xfs_free_extent+0x72/0x1c0 [xfs]
[102699.520509]  xfs_trans_free_extent+0x3d/0x100 [xfs]
[102699.521223]  xfs_extent_free_finish_item+0x69/0xa0 [xfs]
[102699.521897]  xfs_defer_finish_one+0xd5/0x240 [xfs]
[102699.522561]  xfs_defer_finish_noroll+0xb5/0x260 [xfs]
[102699.523219]  xfs_defer_finish+0x11/0x70 [xfs]
[102699.523864]  xfs_itruncate_extents_flags+0xca/0x250 [xfs]
[102699.524526]  xfs_inactive_truncate+0xab/0xe0 [xfs]
[102699.525178]  xfs_inactive+0x154/0x170 [xfs]
[102699.525829]  xfs_inodegc_worker+0x78/0x140 [xfs]
[102699.526474]  process_one_work+0x1e8/0x3c0
[102699.527029]  ? rescuer_thread+0x3a0/0x3a0
[102699.527581]  worker_thread+0x50/0x3b0
[102699.528120]  ? rescuer_thread+0x3a0/0x3a0
[102699.528652]  kthread+0xd9/0x100
[102699.529174]  ? kthread_complete_and_exit+0x20/0x20
[102699.529710]  ret_from_fork+0x22/0x30
[102699.530233]  </TASK>
[102699.530742] Kernel panic - not syncing: hung_task: blocked tasks
[102699.531234] CPU: 8 PID: 316 Comm: khungtaskd Kdump: loaded Not tainted 5.14.0-284.11.1.el9_2.x86_64 #1
[102699.531737] Hardware name: Supermicro Super Server/H11SSL-NC, BIOS 1.0b 04/27/2018
[102699.532238] Call Trace:
[102699.532722]  <TASK>
[102699.533191]  dump_stack_lvl+0x34/0x48
[102699.533666]  panic+0xf4/0x2c6
[102699.534142]  check_hung_uninterruptible_tasks.cold+0xc/0xc
[102699.534640]  ? check_hung_uninterruptible_tasks+0x2d0/0x2d0
[102699.535132]  watchdog+0x9a/0xa0
[102699.535620]  kthread+0xd9/0x100
[102699.536102]  ? kthread_complete_and_exit+0x20/0x20
[102699.536592]  ret_from_fork+0x22/0x30
[102699.537080]  </TASK>

It is not immediately obvious why the kernel worker doing cleanup in the XFS file system became the victim. The trace shows the xfs-inodegc worker blocked in xfs_buf_lock while reading the AGF buffer.
At the time, fstrim was running on the same storage that hosts this XFS file system.
See the diagnostic steps for more information.
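
To confirm that a saved console log matches the pattern above, the hung task reports and the panic line can be pulled out programmatically. The following is a minimal sketch, not part of the original solution: the function name summarize_hung_tasks and the regular expression are illustrative assumptions based only on the message format shown in this log.

```python
import re

# Matches the hung task report line as it appears in the console log above, e.g.
# "[102699.499367] INFO: task kworker/34:4:192928 blocked for more than 622 seconds."
HUNG_TASK_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)

# Panic message emitted when hung_task_panic is enabled.
PANIC_MARKER = "Kernel panic - not syncing: hung_task: blocked tasks"


def summarize_hung_tasks(log_text):
    """Return (hung task reports, whether a hung_task panic was logged).

    Hypothetical helper for illustration; it only understands the
    message format shown in the console log of this article.
    """
    reports = []
    panicked = False
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.match(line.strip())
        if match:
            reports.append({
                "uptime": float(match["ts"]),
                "comm": match["comm"],
                "pid": int(match["pid"]),
                "blocked_secs": int(match["secs"]),
            })
        elif PANIC_MARKER in line:
            panicked = True
    return reports, panicked
```

Passing the text of a captured console log (for example, the log saved with a vmcore) to summarize_hung_tasks returns the blocked tasks and a flag showing whether the hung task watchdog panicked the system.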

Environment

  • Red Hat Enterprise Linux 8, 9
  • File systems hosted on large Ceph storage (RBD); XFS was used in this case
  • hung_task_panic kernel tunable is enabled
  • fstrim is active
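
Whether a system matches the hung_task_panic condition above can be checked by reading the tunables under /proc/sys/kernel. The sketch below is an illustration only: the helper names read_hung_task_settings and panics_on_hung_task are assumptions, and the sysctl_dir parameter exists just so the code can be pointed at a test directory.

```python
from pathlib import Path


def read_hung_task_settings(sysctl_dir="/proc/sys/kernel"):
    """Read the hung task watchdog tunables; unreadable entries map to None."""
    settings = {}
    for name in ("hung_task_panic", "hung_task_timeout_secs"):
        path = Path(sysctl_dir) / name
        try:
            settings[name] = int(path.read_text().strip())
        except (FileNotFoundError, ValueError):
            # Tunable not present (or not numeric) on this system.
            settings[name] = None
    return settings


def panics_on_hung_task(settings):
    """True when a task blocked past the timeout would panic the system.

    A timeout of 0 disables the hung task check, so no panic can follow.
    """
    return bool(settings.get("hung_task_panic")) and bool(
        settings.get("hung_task_timeout_secs")
    )


if __name__ == "__main__":
    current = read_hung_task_settings()
    print(current)
    print("panic on hung task:", panics_on_hung_task(current))
```

This covers only the kernel tunables. Whether fstrim is active (for example through the fstrim.timer systemd unit or a manual run) has to be checked separately.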
