Persistent TCP zero windows during periods of heavy writes to NetApp FAS8080 cDOT cluster

Solution In Progress - Updated -

Environment

  • Red Hat Enterprise Linux 6.4 VM
    • TCP SACK enabled
  • NetApp FAS8080 cDOT cluster

Issue

  • During heavy ingest periods, NFS traffic between RHEL 6.4 VMs and a NetApp FAS8080 cDOT cluster enters a TCP zero window deadlock.

Resolution

  • This is believed to be a bug in the way the NetApp file server handles sequence numbers.
  • As a workaround, try disabling SACK on the RHEL side by adding the following line to /etc/sysctl.conf:
net.ipv4.tcp_sack=0

Then run sysctl -p to make the change take effect, and remount the file server. This workaround has not been verified, but it may help.
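After sysctl -p has been run, the value the running kernel actually uses can be checked. The following is a minimal Python sketch that reads the standard procfs entry for this setting; the function name tcp_sack_enabled is hypothetical and only for illustration.

```python
from pathlib import Path


def tcp_sack_enabled(proc_path="/proc/sys/net/ipv4/tcp_sack"):
    """Return True if the kernel reports SACK as enabled, False if disabled.

    The procfs file contains "0" when net.ipv4.tcp_sack is off and a
    non-zero value when it is on.
    """
    return Path(proc_path).read_text().strip() != "0"


# Example use on the RHEL client after running sysctl -p:
#   print(tcp_sack_enabled())
```

Running cat /proc/sys/net/ipv4/tcp_sack from a shell gives the same information.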

Root Cause

NetApp engineering has stated that frames 256078, 256079, 256083, 256084, and 256085, because they carry a sequence number lower than the maximum already sent, cause some kind of internal deadlock that prevents the file server from clearing the zero window. However, the lower sequence numbers are mandated by the window size advertised by the NetApp file server.

Diagnostic Steps

The following trace, taken on the NFS client (RHEL), shows the build-up to the problem. In frame 255802 the NFS client sends bytes 452935648 through (but not including) 452999360. The ACKs from 192.168.1.10 (the NetApp file server) show that the right edge of the SACK block is 452999360, so those bytes were received correctly. However, the left edge of the SACK block is 452512088. In frame 255874 the client sends bytes 452510640 through (but not including) 452512088, which is the left edge of the SACK block. Receipt of this segment fills the gap identified by SACK. In frames 256057 through 256065 the NetApp file server sends back data. In frames 256070 through 256077 the NetApp file server ACKs data from the client: the ACK number goes up while the window goes down. The window shown is unscaled; the window scale is 7, so multiply the value by 128 to get the window in bytes.

In frames 256078 and 256079 the client sends two ACKs with a sequence number that is lower than the maximum it has already sent. These are not out-of-order packets: their TS value is greater than the TS value in frame 255802. In addition, frame 256078 ACKs the segment in frame 256057, and frame 256079 ACKs the segment in frame 256059. The lower sequence numbers are required because the NetApp file server's window is shrinking. For example, frame 256057 ACKs sequence number 452480232 with a window of 4055, so the maximum sequence number that frame 256078 can use is 452480232 + (128 * 4055) = 452999272, which is what it sends.
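The right-edge arithmetic can be checked mechanically. The following Python sketch recomputes the highest sequence number the client may use from the server's ACK number and unscaled window, using values taken from the trace; the function name right_edge is hypothetical, and sequence-number wraparound is ignored for simplicity.

```python
WINDOW_SCALE = 7  # window scale stated for this connection (factor of 128)


def right_edge(ack, raw_window, scale=WINDOW_SCALE):
    """Highest sequence number permitted by an advertised window.

    ack is the peer's ACK number and raw_window is the unscaled window
    field; the window in bytes is raw_window * 2**scale.
    """
    return ack + (raw_window << scale)


# (server frame, server ACK, unscaled window, client sequence number seen)
checks = [
    (256057, 452480232, 4055, 452999272),  # client reply: frame 256078
    (256065, 452481680, 4044, 452999312),  # client reply: frame 256085
]

for frame, ack, window, client_seq in checks:
    limit = right_edge(ack, window)
    print(frame, limit, limit == client_seq)
```

In both cases the client's sequence number equals the computed limit, which is consistent with the client pulling its sequence number back to the right edge of the shrinking window.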

In frames 256080 and 256081 the NetApp file server sends two more ACKs. Again, note the window going down.

In frame 256082 the NetApp file server ACKs the segment in frame 255874. We know this because the ACK value matches the right edge of the SACK block and the SACK block is no longer in the frame. The segment's TS echo value also matches the TS value of the segment in frame 255874. Note also that the window is now 0.
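The jump of the ACK number to the right edge of the SACK block can be illustrated with a simplified model of receiver-side bookkeeping. This Python sketch is an assumption-laden illustration, not the NetApp implementation: the function name receive_segment is hypothetical, and sequence-number wraparound is ignored.

```python
def receive_segment(cum_ack, sack_blocks, seg_start, seg_end):
    """Return (new cumulative ACK, remaining SACK blocks).

    sack_blocks holds (left_edge, right_edge) ranges the receiver already
    has out of order. A range is absorbed into the cumulative ACK as soon
    as it begins at or before the current ACK point.
    """
    blocks = sorted(list(sack_blocks) + [(seg_start, seg_end)])
    for left, right in blocks:
        if left <= cum_ack < right:
            cum_ack = right
    remaining = [(left, right) for left, right in blocks if right > cum_ack]
    return cum_ack, remaining


# State advertised in frame 256081: ACK 452510640, SACK 452512088-452999360.
# The segment from frame 255874 covers 452510640 up to 452512088.
new_ack, remaining = receive_segment(
    452510640, [(452512088, 452999360)], 452510640, 452512088
)
print(new_ack, remaining)
```

With these inputs the model moves the ACK point to 452999360 and leaves no SACK blocks, which matches what frame 256082 shows.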

The next three frames, 256083, 256084, and 256085, are all from the client. They ACK the segments in frames 256061, 256063, and 256065. Again, their sequence numbers are lower than the maximum sequence number of 452999360, and again that is required because of the shrinking window. The sequence number in frame 256085 is 452999312. It ACKs frame 256065, which carries an ACK of 452481680 and a window of 4044, so the maximum sequence number that can be used is 452481680 + (4044 * 128) = 452999312, which is what is used.

Frames 256086 through 256090 are all ACKs from the NetApp file server, triggered by the lower sequence numbers.

At this point the connection settles into a cycle of window probes from the client, each answered by the NFS server with a zero window.
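The spacing of the window probes can be read from the client TS values in the trace (frames 256091 through 256099). The following Python sketch computes the gaps between successive probes. The length of a TS tick is not stated here, so the gaps are left in raw timestamp ticks.

```python
# Client TS values of the window probes, copied from the trace.
probe_tsval = {
    256091: 216769607,
    256093: 216770027,
    256095: 216770868,
    256097: 216772551,
    256099: 216775911,
}

frames = sorted(probe_tsval)

# Gap, in timestamp ticks, between each probe and the next one.
intervals = [
    probe_tsval[later] - probe_tsval[earlier]
    for earlier, later in zip(frames, frames[1:])
]

# Ratio of each gap to the previous gap.
ratios = [b / a for a, b in zip(intervals, intervals[1:])]

print(intervals)
print([round(r, 2) for r in ratios])
```

Each gap is roughly double the previous one, which is consistent with the client's probe timer backing off exponentially while the server keeps answering with a zero window. That interpretation is an inference from this single trace.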

$ tshark -r performance.pcap20  -Y "frame.number == 255802 || frame.number == 255874 || ( frame.number >= 256070 && frame.number <= 256100)" -T fields -e frame.number -e ip.src -e tcp.seq -e tcp.ack -e tcp.nxtseq -e tcp.window_size -e tcp.options.sack_le -e tcp.options.sack_re -e tcp.options.timestamp.tsval -e tcp.options.timestamp.tsecr

frame   IP Src      Seq     ACK     NxtSeq      Window  sack.le     sack.re     TS value    TS echo
255802  192.168.12.97   452935648   1772245499  452999360   13032                   216769380   3157324732
. . .
255874  192.168.12.97   452510640   1772258491  452512088   13032                   216769388   3157324747
. . . 
256057  192.168.1.10    1772259927  452480232   1772261363  4055    452512088   452999360   3157324754  216769380
256059  192.168.1.10    1772262799  452480232   1772264235  4055    452512088   452999360   3157324754  216769380
256061  192.168.1.10    1772265671  452480232   1772267107  4055    452512088   452999360   3157324754  216769380
256063  192.168.1.10    1772268543  452480232   1772269979  4055    452512088   452999360   3157324754  216769380
256065  192.168.1.10    1772271415  452481680   1772271523  4044    452512088   452999360   3157324754  216769388
256070  192.168.1.10    1772271523  452490368           3976    452512088   452999360   3157324754  216769388
256071  192.168.1.10    1772271523  452491816           3965    452512088   452999360   3157324754  216769388
256072  192.168.1.10    1772271523  452494712           3942    452512088   452999360   3157324754  216769388
256073  192.168.1.10    1772271523  452496160           3931    452512088   452999360   3157324754  216769388
256074  192.168.1.10    1772271523  452499056           3908    452512088   452999360   3157324754  216769388
256075  192.168.1.10    1772271523  452501952           3886    452512088   452999360   3157324754  216769388
256076  192.168.1.10    1772271523  452503400           3874    452512088   452999360   3157324754  216769388
256077  192.168.1.10    1772271523  452504848           3863    452512088   452999360   3157324754  216769388
256078  192.168.12.97   452999272   1772261363          13032                   216769395   3157324754
256079  192.168.12.97   452999272   1772264235          13032                   216769395   3157324754
256080  192.168.1.10    1772271523  452507744           3840    452512088   452999360   3157324754  216769388
256081  192.168.1.10    1772271523  452510640           3818    452512088   452999360   3157324754  216769388
256082  192.168.1.10    1772271523  452999360           0                   3157324754  216769388
256083  192.168.12.97   452999272   1772267107          13032                   216769395   3157324754
256084  192.168.12.97   452999272   1772269979          13032                   216769395   3157324754
256085  192.168.12.97   452999312   1772271523          13032                   216769395   3157324754
256086  192.168.1.10    1772271523  452999360           0                   3157324757  216769388
256087  192.168.1.10    1772271523  452999360           0                   3157324757  216769388
256088  192.168.1.10    1772271523  452999360           0                   3157324757  216769388
256089  192.168.1.10    1772271523  452999360           0                   3157324757  216769388
256090  192.168.1.10    1772271523  452999360           0                   3157324757  216769388
256091  192.168.12.97   452999359   1772271523          13032                   216769607   3157324757
256092  192.168.1.10    1772271523  452999360           0                   3157324966  216769388
256093  192.168.12.97   452999359   1772271523          13032                   216770027   3157324966
256094  192.168.1.10    1772271523  452999360           0                   3157325387  216769388
256095  192.168.12.97   452999359   1772271523          13032                   216770868   3157325387
256096  192.168.1.10    1772271523  452999360           0                   3157326227  216769388
256097  192.168.12.97   452999359   1772271523          13032                   216772551   3157326227
256098  192.168.1.10    1772271523  452999360           0                   3157327910  216769388
256099  192.168.12.97   452999359   1772271523          13032                   216775911   3157327910
256100  192.168.1.10    1772271523  452999360           0                   3157331270  216769388
$ 

