RHEL6: nfsiod soft lockup messages while running iozone, or soft lockups with worker_thread

Solution Unverified - Updated -

Issue

  • Soft lockups inside kworker -> worker_thread functions, caused by a very long list of work items.
  • nfsiod soft lockup messages:
kernel:BUG: soft lockup - CPU#6 stuck for 67s! [nfsiod:3172]
  • We did not reboot the system or take a vmcore.
  • The nfsiod backtraces show a code path that frees an RPC task, which means NFS write or commit operations are completing. The backtraces contain the following symbols:
    1. For writes: rpc_release_calldata, nfs_writeback_release_common
    2. For commits: rpc_release_calldata, nfs_commit_release
  • We had a few iozone instances running on NFSv4 mounts with default mount options. All of them were write tests; for example, we started 4 of these simultaneously, changing the file names as necessary: iozone -m -R -s 400g -w -r 128k -i 0 -t 4 -F F1 F2 F3 F4 &
  • Our clients have 1 TB of memory. Because we use the default mount options, writes to the share are asynchronous, and the large memory configuration on the client allows a lot of buffered I/O. This may be significant in this case.
  • The nfsiod messages appeared but cleared on their own. They did not seem to cause any noticeable problem, and the iozone tests eventually finished, so nfsiod did not appear to be permanently hung.
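
The symbol check described above could be sketched as follows. This is a minimal illustration rather than a diagnostic tool: the function name classify_backtrace is hypothetical, and the only symbols it knows are the ones named in this article.

```python
import re

# Symbols named in the issue description; nothing else is assumed.
WRITE_SYMBOLS = {"rpc_release_calldata", "nfs_writeback_release_common"}
COMMIT_SYMBOLS = {"rpc_release_calldata", "nfs_commit_release"}


def classify_backtrace(text):
    """Return 'write', 'commit', or None for a soft lockup backtrace.

    Hypothetical helper: it only checks whether every symbol of a group
    appears somewhere in the backtrace text.
    """
    # Collect identifier-like tokens, so that a frame written as
    # "symbol+0x10/0x20" still yields the bare symbol name.
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))
    if WRITE_SYMBOLS <= tokens:
        return "write"
    if COMMIT_SYMBOLS <= tokens:
        return "commit"
    return None
```

Feeding it the call trace that follows a "BUG: soft lockup" message for nfsiod would indicate whether the stuck CPU was releasing write or commit RPC tasks.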

Environment

  • Red Hat Enterprise Linux 6 (NFS client)
    • kernel prior to kernel-2.6.32-642.el6
    • seen on kernels 2.6.32-431.23.3.el6 and 2.6.32-431.29.2.el6
  • any kernel subsystem using work queues (such as SCSI or NFS client)
  • seen with NFS, iozone workload, RAM: 1 TB, CPUs: 24
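
To check whether a given RHEL 6 kernel release falls in the range listed above, a comparison along these lines may help. It is a simplified sketch: the names release_key and is_in_listed_range are hypothetical, and the numeric tuple comparison is an assumption that approximates, but does not reimplement, RPM version ordering.

```python
import re

# Boundary taken from the Environment list: kernels prior to this release.
BOUNDARY_RELEASE = "2.6.32-642.el6"


def release_key(release):
    """Split a release such as '2.6.32-431.23.3.el6' into numeric tuples."""
    match = re.match(r"(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.el6", release)
    if match is None:
        raise ValueError("not a RHEL 6 kernel release: %r" % release)
    version = tuple(int(part) for part in match.group(1).split("."))
    build = tuple(int(part) for part in match.group(2).split("."))
    return version, build


def is_in_listed_range(release):
    """True if the release sorts before the boundary release."""
    return release_key(release) < release_key(BOUNDARY_RELEASE)
```

The release string can be taken from the output of uname -r; a trailing architecture suffix such as .x86_64 is ignored by the pattern.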
