RHEL6: System hangs when an application uses direct I/O on XFS

Solution Verified

Issue

  • We seem to be hitting an issue similar to bug #695827 with RHEL 6 and XFS, where direct I/O writes from the database become blocked.
  • There are many XFS-related kernel trace messages in the messages file (attached). The blocked process does not recover for a long time, and in most cases we had to reboot the server.
  • The same program runs fine with RHEL6 and ext4.
  • The same program runs fine on RHEL5.6 and xfs.
  • Workload / test that triggers the hang
    • The I/O workload is "4k, random, write only, 12 threads, directio"
    • Workload is 99% writes: random 4k (page) writes across files and within the same file, in parallel
    • Many parallel writes target the same file (gut feeling is that the bug is related to parallelism)
    • Log files: opened in non-direct mode, append / read (fairly small writes, 64k); appends go to the end, reads pull chunks from the middle
    • Non-log files: opened in direct mode (O_DIRECT); random 4k writes, often to the same file, with a lot of parallelism
  • Reproducibility
    • Can be reproduced after about an hour, 9 times out of 10
    • Unable to write a simplified, synthetic test program that triggers the hang
    • Reproduced on local SSD (volume size probably 2TB), without multipath, on different storage, etc. The same test runs fine with ext4.
    • In all cases a fairly large striped LVM LV was underneath XFS, but the storage varied
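
The non-log I/O pattern described above (random 4k direct writes, 12 threads, several threads writing to the same file) can be sketched roughly as follows. This is only an illustration of the pattern, not a reproducer: as noted, no simplified synthetic test is known to trigger the hang. The function names, file sizes, and write counts are hypothetical, and the sketch assumes Linux (os.O_DIRECT) and a filesystem that accepts O_DIRECT.

```python
import mmap
import os
import random
import threading

BLOCK = 4096  # 4k writes, matching the described workload


def random_direct_writes(path, file_size, writes, seed, direct=True):
    """Issue `writes` random, block-aligned 4k writes to `path`."""
    flags = os.O_WRONLY | os.O_CREAT
    if direct:
        flags |= os.O_DIRECT  # Linux only; bypasses the page cache
    fd = os.open(path, flags, 0o644)
    # An anonymous mmap is page-aligned, which O_DIRECT requires.
    buf = mmap.mmap(-1, BLOCK)
    buf.write(b"x" * BLOCK)
    rng = random.Random(seed)
    blocks = file_size // BLOCK
    try:
        for _ in range(writes):
            os.pwrite(fd, buf, rng.randrange(blocks) * BLOCK)
    finally:
        os.close(fd)
        buf.close()


def run_workload(paths, threads=12, file_size=1 << 20, writes=256, direct=True):
    """Start `threads` writers; threads are spread over `paths`, so files are shared."""
    workers = [
        threading.Thread(
            target=random_direct_writes,
            args=(paths[i % len(paths)], file_size, writes, i, direct),
        )
        for i in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Calling run_workload(["/mnt/xfs/a.dat", "/mnt/xfs/b.dat"]) (hypothetical paths) would run 12 writers over two files. Passing direct=False uses buffered writes instead, which is useful on filesystems that reject O_DIRECT.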

Environment

  • Red Hat Enterprise Linux 6.2 - 6.3
    • Any kernel prior to 2.6.32-279.19.1.el6
    • Seen on 2.6.32-220.2.1.el6, 2.6.32-279.5.2.el6
