NFS client system hangs with heavy IO load, rpciod stuck at nfs_commit_inode often from socket memory allocation

Solution Verified - Updated -

Issue

  • NFS client systems intermittently log "blocked for more than 120 seconds" messages with a stack trace, typically during periods of heavy IO load.
  • The client logs nfs: server [...] not responding, still trying, indicating that it is having difficulty receiving responses from the NFS server.
  • A vmcore shows that rpciod, which processes NFS RPC task completions, is stuck in nfs_commit_inode waiting for NFS operations to complete. It got there through memory reclaim triggered by a memory allocation under xs_create_sock, similar to:
PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
 #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
 #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
 #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]  <----------- waiting on NFS operations to complete
 #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]  <----------- calling back into NFS
 #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c   <------------------------------- memory allocation
#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]   <---------------------------- socket creation
#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
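
The ordering of frames in the backtrace above is what identifies this issue: socket creation allocates memory, the allocation enters page reclaim, and reclaim calls back into NFS and waits. As a rough aid for checking other backtraces, the following is a minimal sketch that looks for that ordering in text copied from crash's bt output. The helper names (parse_frames, rpciod_reclaim_reentry) and the choice of try_to_free_pages as the reclaim marker are assumptions made for illustration; they are not part of crash or of any Red Hat tooling.

```python
import re

# Matches crash "bt" frames such as:
#   #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
FRAME_RE = re.compile(r"^\s*#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")


def parse_frames(bt_text):
    """Return function names from a crash backtrace, innermost frame first."""
    frames = []
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(2))
    return frames


def rpciod_reclaim_reentry(bt_text):
    """Return True if xs_create_sock led to reclaim that then waits in nfs_commit_inode.

    Hypothetical helper: it only checks frame ordering, nothing else.
    """
    frames = parse_frames(bt_text)
    wanted = ("nfs_commit_inode", "try_to_free_pages", "xs_create_sock")
    if any(name not in frames for name in wanted):
        return False
    commit, reclaim, sock = (frames.index(name) for name in wanted)
    # Frames are innermost first, so a lower index means "called later".
    return commit < reclaim < sock


# A shortened excerpt of the backtrace shown in this article.
SAMPLE = """\
PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
 #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
 #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
"""

if __name__ == "__main__":
    print(rpciod_reclaim_reentry(SAMPLE))
```

A match only says the backtrace has the same shape as the one in this article. It does not confirm the root cause, and the COMMAND field still needs to be checked by hand to confirm the task is an rpciod thread.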

Environment

  • Red Hat Enterprise Linux 6.1 - 6.3
    • kernel 2.6.32-131.*el6 up to 2.6.32-279.*el6
  • Red Hat Enterprise Linux 5
    • seen on kernel 2.6.18-194.el5
    • all versions believed to be affected
  • NFS client
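
To compare a client's running kernel against the versions listed above, the following is a minimal sketch that parses a kernel release string of the form printed by uname -r. The function name listed_as_affected and the regular expressions are assumptions made for illustration. The sketch treats every 2.6.18 el5 kernel as listed, following the "all versions believed to be affected" note, and treats 2.6.32 el6 kernels as listed when the build number is between 131 and 279 inclusive.

```python
import platform
import re


def listed_as_affected(kernel_release):
    """Return True if the release string falls in the ranges listed in this article.

    Hypothetical helper; it only compares version strings.
    """
    # RHEL 5: all 2.6.18 el5 kernels are believed to be affected.
    if re.match(r"^2\.6\.18-\d+\..*el5", kernel_release):
        return True
    # RHEL 6.1 - 6.3: 2.6.32-131.*el6 up to 2.6.32-279.*el6.
    match = re.match(r"^2\.6\.32-(\d+)\..*el6", kernel_release)
    return bool(match) and 131 <= int(match.group(1)) <= 279


if __name__ == "__main__":
    release = platform.release()  # same string that uname -r prints
    print(release, listed_as_affected(release))
```

A True result means only that the kernel is in the listed range, not that the hang is occurring. A False result means only that the kernel is outside the versions this article names.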
