RHEL5: soft lockup in nfs4_reclaim_open_state called from reclaimer after NFSv4 server became unavailable
Issue
- The NFSv4 client hung after the NFSv4 server, a NetApp vFiler head, went through failback, and a large number of processes entered an uninterruptible state.
- The system had a very high load average because of the many processes in uninterruptible state.
- Messages similar to the following are seen in the log:
Dec 21 21:21:29 linux kernel: nfs4_reclaim_open_state: unhandled error -116. Zeroing state
Dec 21 21:21:29 linux kernel: nfs4_reclaim_open_state: unhandled error -10026. Zeroing state
- A kernel soft lockup message similar to the following is also logged:
BUG: soft lockup - CPU#2 stuck for 60s! [10.52.18.23-rec:11854]
...
Pid: 11854, comm: 5.25.81.32-rec Not tainted 2.6.18-238.9.1.el5 #1
RIP: 0010:[<ffffffff88614dfa>] [<ffffffff88614dfa>] :nfs:nfs4_reclaim_open_state+0x135/0x150
...
Call Trace:
ffffffff88614fb9 :nfs:reclaimer+0x1a4/0x2ac
ffffffff88614e15 :nfs:reclaimer+0x0/0x2ac
ffffffff80032afc kthread+0xfe/0x132
ffffffff8005dfb1 child_rip+0xa/0x11
ffffffff800a26db keventd_create_kthread+0x0/0xc4
ffffffff800329fe kthread+0x0/0x132
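To gauge how often the reclaim errors above occur, the log can be scanned for the `nfs4_reclaim_open_state: unhandled error` message. The following is a minimal sketch, not a supported tool: the regular expression is an assumption based only on the two sample lines shown above, the function names are hypothetical, and only the two codes from those samples are decoded (116 is ESTALE in Linux errno numbering, and 10026 is NFS4ERR_BAD_SEQID in the NFSv4 protocol).

```python
import re

# Assumed pattern, derived from the two sample log lines in this article.
RECLAIM_RE = re.compile(r"nfs4_reclaim_open_state: unhandled error (-\d+)")

# Only the codes seen in the sample lines are decoded here.
KNOWN_ERRORS = {
    -116: "ESTALE (stale file handle)",
    -10026: "NFS4ERR_BAD_SEQID",
}


def summarize_reclaim_errors(lines):
    """Count reclaim failures per error code in an iterable of log lines."""
    counts = {}
    for line in lines:
        match = RECLAIM_RE.search(line)
        if match:
            code = int(match.group(1))
            counts[code] = counts.get(code, 0) + 1
    return counts


def describe(code):
    """Return a short name for a known code, or 'unknown' otherwise."""
    return KNOWN_ERRORS.get(code, "unknown")


if __name__ == "__main__":
    # The two sample lines from this article; real input would come from
    # the system log file instead.
    sample = [
        "Dec 21 21:21:29 linux kernel: nfs4_reclaim_open_state: unhandled error -116. Zeroing state",
        "Dec 21 21:21:29 linux kernel: nfs4_reclaim_open_state: unhandled error -10026. Zeroing state",
    ]
    for code, count in sorted(summarize_reclaim_errors(sample).items()):
        print(code, describe(code), count)
```

Any iterable of lines works as input, so an open file object for the system log can be passed to `summarize_reclaim_errors` directly.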
Environment
- Red Hat Enterprise Linux (RHEL) 5
- NFSv4 client
- Often seen with NetApp filers, including those running Data ONTAP 8.1.2P3 7-Mode