RHEL6: wget/ncat fails after a number of iterations
Environment
- Red Hat Enterprise Linux 6.2
- kernel-2.6.32-358.18.1.el6
- Arista switch with a single 10G uplink
- Greenplum Database
- 3rd party bna NIC driver
driver: bna
version: 3.2.0.0
firmware-version: 3.2.0.0
Issue
Our data loading application on the Greenplum cluster is failing due to network issues. We have narrowed the problem down to a simple connectivity failure between wget and Apache httpd over HTTP (our database application uses HTTP to load data over the network).
The test is simple. We run this one-liner and, after a number of iterations, the connection fails:
for i in {1..100}; do wget http://100.x.x.x:80yy/test_file ; done | tee -a output
wget hangs at this point:
--2014-12-29 17:15:57-- http://100.xx.xx.xx:80yy/test.dat
Connecting to 100.xx.xx.xx:80yy... connected.
HTTP request sent, awaiting response...
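Because the one-liner above blocks indefinitely once wget hangs, it can help to bound each attempt and record which iteration failed. The following is a minimal sketch, not part of the original test: the helper name run_iterations is hypothetical, and the 30-second timeout is an arbitrary assumption. The wget options used (--tries, --timeout, -O) are standard.

```shell
# Hypothetical helper (not from the original article).
# Usage: run_iterations COUNT COMMAND [ARGS...]
# Runs COMMAND up to COUNT times and stops at the first failing iteration.
# Note: uses the global variables "count" and "i" to stay POSIX-sh compatible.
run_iterations() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@"; then
            echo "iteration $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count iterations succeeded"
}

# Example invocation (address and port are the redacted values from this article;
# the 30 second timeout is an assumption, adjust as needed):
# run_iterations 100 wget --tries=1 --timeout=30 -O /dev/null http://100.x.x.x:80yy/test_file
```

With a timeout in place, a hung transfer ends with a non-zero wget exit status instead of blocking the loop, so the failing iteration number can be matched against the packet capture timestamps.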
Resolution
- At this stage we suspect a problem somewhere on the network or in the NIC that is causing packets to be dropped.
- Try using the standard RHEL NIC driver.
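When switching drivers, it is useful to confirm which driver and version the interface is actually using before and after the change. The sketch below summarizes the driver fields printed by ethtool -i (the same fields listed in the Environment section). The helper name driver_summary and the interface name eth0 are assumptions for illustration.

```shell
# Hypothetical helper (not from the original article).
# Reads "ethtool -i" output on stdin and prints the driver, version and
# firmware-version fields on a single line.
driver_summary() {
    awk -F': ' '
        $1 == "driver"           { d = $2 }
        $1 == "version"          { v = $2 }
        $1 == "firmware-version" { f = $2 }
        END { printf "driver=%s version=%s firmware=%s\n", d, v, f }
    '
}

# Example invocations (eth0 is an assumed interface name):
# ethtool -i eth0 | driver_summary
# modinfo -n bna    # path of the bna module file that modprobe would load
```

Comparing the summary before and after the driver change shows whether the interface really moved off the third-party 3.2.0.0 driver.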
Root Cause
At some point, all incoming packets from the server are dropped before they reach the client. It is not yet known which device is dropping them.
Diagnostic Steps
- We have packet captures from both the server end and the RHEL client end. They show that when the failure occurs, the client receives no packets from the server after the three-way handshake, while the server receives all packets from the client. Packets on the client's ingress path are being dropped.
5850 12:27:22.610132 0.000000 100.10.xx.xx 100.12.xx.xx TCP 76 54116 > 5000 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=1093540309 TSecr=0 WS=256
5853 12:27:22.610188 0.000056 100.12.xx.xx 100.10.xx.xx TCP 68 5000 > 54116 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=128
5854 12:27:22.610201 0.000013 100.10.xx.xx 100.12.xx.xx TCP 56 54116 > 5000 [ACK] Seq=1 Ack=1 Win=14848 Len=0
9732 12:28:49.172001 86.561800 100.10.xx.xx 100.12.xx.xx TCP 56 54116 > 5000 [FIN, ACK] Seq=1 Ack=1 Win=14848 Len=0
9739 12:28:49.372873 0.200872 100.10.xx.xx 100.12.xx.xx TCP 56 [TCP Retransmission] 54116 > 5000 [FIN, ACK] Seq=1 Ack=1 Win=14848 Len=0
9745 12:28:49.774942 0.402069 100.10.xx.xx 100.12.xx.xx TCP 56 [TCP Retransmission] 54116 > 5000 [FIN, ACK] Seq=1 Ack=1 Win=14848 Len=0
- Run a packet capture at all points (switches, NICs and hosts). Then determine exactly where the packets are dropped and rectify that device.
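The capture step above can be sketched as follows for a Linux host. This is a hedged example rather than a prescribed procedure: the helper names start_capture and nonzero_drop_counters, the interface name eth0, and the output file name are assumptions, and the counter names printed by ethtool -S vary by driver. The peer address and port are the redacted values from the capture above.

```shell
# Hypothetical helpers (not from the original article).

# Usage: start_capture IFACE PEER_IP PORT OUTFILE
# Captures full packets for one peer and TCP port into a pcap file.
# Run as root; stop with Ctrl-C once the hang has been reproduced.
start_capture() {
    tcpdump -n -i "$1" -s 0 -w "$4" host "$2" and tcp port "$3"
}

# Reads "ethtool -S" output on stdin and prints only counters whose name
# contains drop, discard or err and whose value is greater than zero.
nonzero_drop_counters() {
    awk -F: '
        tolower($1) ~ /drop|discard|err/ && ($2 + 0) > 0 {
            name = $1
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            print name "=" val
        }
    '
}

# Example invocations (eth0 and client.pcap are assumed names):
# start_capture eth0 100.12.xx.xx 5000 client.pcap
# ethtool -S eth0 | nonzero_drop_counters
```

Taking the same capture on each hop and comparing where the server's packets last appear narrows down the device that drops them. Non-zero drop counters on the client NIC would point at the NIC or its driver rather than the switch.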
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.