NFS v4.1 mount stuck, which freezes qemu-kvm
Issue
Check the kernel log and search for "task ... blocked for more than 120 seconds" using dmesg -T
or journalctl -b -t kernel
.
Here we can see that the qemu process was hung, and the stack trace reveals the NFS nfs_getattr
operation:
[Tue Dec 8 04:53:01 2020] INFO: task qemu-kvm:85258 blocked for more than 120 seconds.
[Tue Dec 8 04:53:01 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Dec 8 04:53:01 2020] qemu-kvm D ffff9f6936cec1c0 0 85258 1 0x000001a0
[Tue Dec 8 04:53:01 2020] Call Trace:
[Tue Dec 8 04:53:01 2020] [<ffffffff94b0598c>] ? avc_has_perm_flags+0xdc/0x1c0
[Tue Dec 8 04:53:01 2020] [<ffffffff94f7eb09>] schedule+0x29/0x70
[Tue Dec 8 04:53:01 2020] [<ffffffff94a66139>] inode_dio_wait+0xd9/0x100
[Tue Dec 8 04:53:01 2020] [<ffffffff948c6360>] ? wake_bit_function+0x40/0x40
[Tue Dec 8 04:53:01 2020] [<ffffffffc0784e46>] nfs_getattr+0x1b6/0x250 [nfs]
[Tue Dec 8 04:53:01 2020] [<ffffffff94a4e959>] vfs_getattr+0x49/0x80
[Tue Dec 8 04:53:01 2020] [<ffffffff94a4e9d5>] vfs_fstat+0x45/0x80
[Tue Dec 8 04:53:01 2020] [<ffffffff94a4ef44>] SYSC_newfstat+0x24/0x60
[Tue Dec 8 04:53:01 2020] [<ffffffff9493c8e4>] ? __audit_syscall_entry+0xb4/0x110
[Tue Dec 8 04:53:01 2020] [<ffffffff9483c07b>] ? syscall_trace_enter+0x16b/0x220
[Tue Dec 8 04:53:01 2020] [<ffffffff94a4f31e>] SyS_newfstat+0xe/0x10
[Tue Dec 8 04:53:01 2020] [<ffffffff94f8c172>] tracesys+0xa6/0xcc
The VM VGA console is not working, however on the serial console we can see the I/O errors:
[stack@undercloud.. ~]$openstack console log show 541728a1-1d6d-4878-8e84-b963decdf96f
[...]
vm login: [11179774.749141] blk_update_request: I/O error, dev vda, sector 139675920
[11179774.773597] blk_update_request: I/O error, dev vda, sector 139675920
[11179774.779591] blk_update_request: I/O error, dev vda, sector 85440272
[stack@undercloud.. ~]$
The mountpoint it is a NFS v4.1 file system:
[root@compute2 ~]# mount |grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
192.168.1.241:/svm01_nfs_cinder_vol01 on /var/lib/nova/mnt/b673de1fa9b34054adb83e620007e491 type nfs4 (rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.57,local_lock=none,addr=192.168.1.241)
The VM is still stuck in newfstat(2)
syscall, that is waiting for NFS subsystem. The numver 85258 is the PID of the qemu-kvm process.
[root@compute2 ~]# cat /proc/85258/stack
[<ffffffff94a66139>] inode_dio_wait+0xd9/0x100
[<ffffffffc0784e46>] nfs_getattr+0x1b6/0x250 [nfs]
[<ffffffff94a4e959>] vfs_getattr+0x49/0x80
[<ffffffff94a4e9d5>] vfs_fstat+0x45/0x80
[<ffffffff94a4ef44>] SYSC_newfstat+0x24/0x60
[<ffffffff94a4f31e>] SyS_newfstat+0xe/0x10
[<ffffffff94f8c172>] tracesys+0xa6/0xcc
[<ffffffffffffffff>] 0xffffffffffffffff
If you capture the network traffic towards the NFS server, you can see NFS v4.1 specific COMPUND operations like the one bellow
Frame 4: 4402 bytes on wire (35216 bits), 4402 bytes captured (35216 bits)
Network File System, Ops(3): SEQUENCE, PUTFH, WRITE
[Program Version: 4]
[V4 Procedure: COMPOUND (1)]
Tag: <EMPTY>
minorversion: 1
Operations (count: 3): SEQUENCE, PUTFH, WRITE
Opcode: SEQUENCE (53)
sessionid: 0000003765bb82000000000000000403
seqid: 0x00441ad2
slot id: 5
high slot id: 11
cache this?: Yes
Opcode: PUTFH (22)
Opcode: WRITE (38)
[Main Opcode: WRITE (38)]
The COMPOUND operation fails with NFS4ERR_SEQ_MISORDERED
error:
Frame 7: 114 bytes on wire (912 bits), 114 bytes captured (912 bits)
Network File System, Ops(1): SEQUENCE(NFS4ERR_SEQ_MISORDERED)
[Program Version: 4]
[V4 Procedure: COMPOUND (1)]
Status: NFS4ERR_SEQ_MISORDERED (10063)
Tag: <EMPTY>
Operations (count: 1)
Opcode: SEQUENCE (53)
Status: NFS4ERR_SEQ_MISORDERED (10063)
[Main Opcode: SEQUENCE (53)]
Environment
- Red Hat OpenStack Platform 10
- Red Hat Enterprise Linux 7
Subscriber exclusive content
A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.