Persistent TCP zero windows during heavy writes to a NetApp FAS8080 cDOT cluster
Environment
- Red Hat Enterprise Linux 6.4 VM
- TCP SACK enabled
- NetApp FAS8080 cDOT cluster
Issue
- During heavy ingest periods, a TCP zero-window deadlock occurs on NFS connections between RHEL 6.4 VMs and a NetApp FAS8080 cDOT cluster.
Resolution
- It is believed that this is a bug in the way that the NetApp file server handles sequence numbers.
- As a possible workaround, disable SACK on the RHEL side. Add the following line to /etc/sysctl.conf:
net.ipv4.tcp_sack=0
Then execute sysctl -p to make the change take effect, and remount the file systems exported by the file server. This workaround has not been verified but may help.
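The edit to /etc/sysctl.conf can be scripted so that repeated runs do not leave duplicate entries. The following is a minimal sketch, not a tested procedure: set_sysctl_line is a hypothetical helper name, and it assumes GNU sed (as shipped with RHEL) for in-place editing.

```shell
# Hypothetical helper: set key=value in a sysctl-style file,
# replacing an existing assignment or appending a new one.
set_sysctl_line() {
    conf_file=$1; key=$2; value=$3
    # Escape dots so the key is matched literally.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${key_re}[[:space:]]*=" "$conf_file" 2>/dev/null; then
        sed -i "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key}=${value}|" "$conf_file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$conf_file"
    fi
}

# Example usage (run as root):
#   set_sysctl_line /etc/sysctl.conf net.ipv4.tcp_sack 0
#   sysctl -p
#   sysctl net.ipv4.tcp_sack    # confirm the running value
```

The usage lines are left commented out so that pasting the block does not change system settings by itself.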
Root Cause
NetApp engineering has stated that frames 256078, 256079, 256083, 256084, and 256085, which carry a sequence number lower than the maximum already sent, are causing some kind of internal deadlock that prevents the file server from clearing the zero window. However, the lower sequence numbers are mandated by the window size advertised by the NetApp file server.
Diagnostic Steps
The following trace, taken on the NFS client (RHEL, 192.168.12.97), shows the build-up to the problem. In frame 255802 the NFS client sends bytes 452935648 up to (but not including) 452999360. The ACKs from the NetApp file server (192.168.1.10) carry a SACK block whose right edge is 452999360, so those bytes were received correctly. The left edge of the SACK block, however, is 452512088, so there is still a gap below it. In frame 255874 the client sends bytes 452510640 up to (but not including) 452512088, the left edge of the SACK block. Receipt of this segment will fill the gap identified by SACK. In frames 256057 through 256065 the NetApp file server sends data back. In frames 256070 through 256077 the NetApp file server ACKs data from the client: the ACK number goes up while the window goes down. The window shown is unscaled; the window scale is 7, so you need to multiply the value by 128 (2^7) to get the window in bytes.
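The window scaling described above can be checked with shell arithmetic. This is a small sketch; scaled_window is a hypothetical helper name, and the scale of 7 is the value stated for this connection.

```shell
# Convert a raw (unscaled) TCP window field to bytes,
# given the window scale shift count.
scaled_window() {
    raw=$1; shift_count=$2
    echo $(( raw << shift_count ))
}

scaled_window 4055 7   # frame 256057 -> 519040
scaled_window 3818 7   # frame 256081 -> 488704
scaled_window 0 7      # frame 256082 -> 0
```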
In frames 256078 and 256079 the client sends two ACKs with a sequence number that is lower than the maximum it has already sent. These are not out-of-order packets. We can tell because their TS value is greater than the value in frame 255802. Also, frame 256078 ACKs the segment in frame 256057 and frame 256079 ACKs the segment in frame 256059. The lower sequence numbers are required because the NetApp file server's window is shrinking. For example, frame 256057 ACKs sequence number 452480232 with a window of 4055, so the maximum sequence number that frame 256078 can use is 452480232 + (128 * 4055) = 452999272, which is what it sends.
In frames 256080 and 256081 the NetApp file server sends two more ACKs; again, note the window going down.
In frame 256082 the NetApp file server ACKs the segment in frame 255874. We know this because the ACK value matches the right edge of the SACK block and the SACK block is no longer in the frame. Also, the TS echo value in frame 256082 matches the TS value of the segment in frame 255874. Note also that the window is at 0.
The next three frames, 256083, 256084, and 256085, are all from the client. They ACK the segments in frames 256061, 256063, and 256065. Again their sequence numbers are lower than the maximum sequence number of 452999360, and again that is required because of the shrinking window. The sequence number in frame 256085 is 452999312. It ACKs frame 256065, which carries an ACK of 452481680 and a window of 4044, so the maximum sequence number that can be used is 452481680 + (4044 * 128) = 452999312, which is what is used.
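The two window-limit calculations above follow the same rule: the highest sequence number the client may use is the ACK number plus the scaled window. The sketch below reproduces both; window_right_edge is a hypothetical helper name, and the modulo is included only because TCP sequence numbers are 32-bit and wrap around.

```shell
# Right edge of the receiver's advertised window:
# ACK number + (raw window << scale), modulo 2^32.
window_right_edge() {
    ack=$1; raw_window=$2; scale=$3
    echo $(( (ack + (raw_window << scale)) % 4294967296 ))
}

window_right_edge 452480232 4055 7   # frame 256057 -> 452999272 (seq of frame 256078)
window_right_edge 452481680 4044 7   # frame 256065 -> 452999312 (seq of frame 256085)
```

Both results are below 452999360, the highest sequence number the client had already sent, which is why the client's ACKs carry a lower sequence number.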
Frames 256086 through 256090 are all ACKs from the NetApp file server, triggered by the lower sequence numbers.
At this point there is a cycle of window probes from the client, each answered by the NFS server with a zero window.
$ tshark -r performance.pcap20 -Y "frame.number == 255802 || frame.number == 255874 || ( frame.number >= 256070 && frame.number <= 256100)" -T fields -e frame.number -e ip.src -e tcp.seq -e tcp.ack -e tcp.nxtseq -e tcp.window_size -e tcp.options.sack_le -e tcp.options.sack_re -e tcp.options.timestamp.tsval -e tcp.options.timestamp.tsecr
frame IP Src Seq ACK NxtSeq Window sack.le sack.re TS value TS echo
255802 192.168.12.97 452935648 1772245499 452999360 13032 216769380 3157324732
. . .
255874 192.168.12.97 452510640 1772258491 452512088 13032 216769388 3157324747
. . .
256057 192.168.1.10 1772259927 452480232 1772261363 4055 452512088 452999360 3157324754 216769380
256059 192.168.1.10 1772262799 452480232 1772264235 4055 452512088 452999360 3157324754 216769380
256061 192.168.1.10 1772265671 452480232 1772267107 4055 452512088 452999360 3157324754 216769380
256063 192.168.1.10 1772268543 452480232 1772269979 4055 452512088 452999360 3157324754 216769380
256065 192.168.1.10 1772271415 452481680 1772271523 4044 452512088 452999360 3157324754 216769388
256070 192.168.1.10 1772271523 452490368 3976 452512088 452999360 3157324754 216769388
256071 192.168.1.10 1772271523 452491816 3965 452512088 452999360 3157324754 216769388
256072 192.168.1.10 1772271523 452494712 3942 452512088 452999360 3157324754 216769388
256073 192.168.1.10 1772271523 452496160 3931 452512088 452999360 3157324754 216769388
256074 192.168.1.10 1772271523 452499056 3908 452512088 452999360 3157324754 216769388
256075 192.168.1.10 1772271523 452501952 3886 452512088 452999360 3157324754 216769388
256076 192.168.1.10 1772271523 452503400 3874 452512088 452999360 3157324754 216769388
256077 192.168.1.10 1772271523 452504848 3863 452512088 452999360 3157324754 216769388
256078 192.168.12.97 452999272 1772261363 13032 216769395 3157324754
256079 192.168.12.97 452999272 1772264235 13032 216769395 3157324754
256080 192.168.1.10 1772271523 452507744 3840 452512088 452999360 3157324754 216769388
256081 192.168.1.10 1772271523 452510640 3818 452512088 452999360 3157324754 216769388
256082 192.168.1.10 1772271523 452999360 0 3157324754 216769388
256083 192.168.12.97 452999272 1772267107 13032 216769395 3157324754
256084 192.168.12.97 452999272 1772269979 13032 216769395 3157324754
256085 192.168.12.97 452999312 1772271523 13032 216769395 3157324754
256086 192.168.1.10 1772271523 452999360 0 3157324757 216769388
256087 192.168.1.10 1772271523 452999360 0 3157324757 216769388
256088 192.168.1.10 1772271523 452999360 0 3157324757 216769388
256089 192.168.1.10 1772271523 452999360 0 3157324757 216769388
256090 192.168.1.10 1772271523 452999360 0 3157324757 216769388
256091 192.168.12.97 452999359 1772271523 13032 216769607 3157324757
256092 192.168.1.10 1772271523 452999360 0 3157324966 216769388
256093 192.168.12.97 452999359 1772271523 13032 216770027 3157324966
256094 192.168.1.10 1772271523 452999360 0 3157325387 216769388
256095 192.168.12.97 452999359 1772271523 13032 216770868 3157325387
256096 192.168.1.10 1772271523 452999360 0 3157326227 216769388
256097 192.168.12.97 452999359 1772271523 13032 216772551 3157326227
256098 192.168.1.10 1772271523 452999360 0 3157327910 216769388
256099 192.168.12.97 452999359 1772271523 13032 216775911 3157327910
256100 192.168.1.10 1772271523 452999360 0 3157331270 216769388
$
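The window-probe cycle at the end of the trace (frames 256091 through 256099) can be inspected by looking at how the client's TS value advances between probes. The following sketch assumes the tshark output was produced with the default tab separator, so the TS value is the ninth field even when the SACK columns are empty. The name probe_intervals and the file name in the usage line are hypothetical.

```shell
# Print the TSval difference between consecutive probes.
# $1 = client IP, $2 = probe sequence number;
# reads tab-separated tshark -T fields output on stdin.
probe_intervals() {
    awk -F'\t' -v src="$1" -v seq="$2" '
        $2 == src && $3 == seq {
            if (seen) print $9 - prev
            prev = $9; seen = 1
        }'
}

# Example usage, with the tshark output saved to a file:
#   probe_intervals 192.168.12.97 452999359 < probes.tsv
```

For the probes shown in the trace the differences are 420, 841, 1683, and 3360 ticks. Each interval is roughly double the previous one, which is consistent with the client backing off between probes while the server keeps advertising a zero window.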
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.