Connection reset by peer when uploading a local image with a large blob to ROSA from an EC2 client
Environment
- Red Hat OpenShift Service on AWS (ROSA)
- 4
Issue
- "Connection reset by peer" occurs when using skopeo copy to push a local image with a large blob to the ROSA image-registry from an EC2 client.
Resolution
- The issue has been investigated in OCPBUGS-17547 and OHSS-25179.
- The final solution is to use an NLB (Network Load Balancer) instead of a CLB (Classic Load Balancer).
  For ROSA, an NLB can be configured by following these KCS articles:
  https://access.redhat.com/articles/7028653
  https://access.redhat.com/solutions/6197341
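Before and after switching, it can help to confirm which load balancer type the default ingress controller is using. The following is a minimal sketch, not an official procedure. It assumes that `oc` is logged in with permission to read the `openshift-ingress-operator` namespace and that the ingress controller is named `default`; the helper name `ingress_lb_type` is hypothetical. It is a read-only check and does not change the setting; follow the KCS articles above for the change itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the AWS load balancer type of an ingress controller.
# Assumption: the controller lives in openshift-ingress-operator and is named "default".
ingress_lb_type() {
  local name="${1:-default}" lb
  lb=$(oc -n openshift-ingress-operator get ingresscontroller "$name" \
    -o jsonpath='{.spec.endpointPublishingStrategy.loadBalancer.providerParameters.aws.type}')
  # The field is optional; report "unset" rather than guessing the platform default.
  echo "${lb:-unset}"
}
```

Calling `ingress_lb_type` prints the value of the field (for example `Classic` or `NLB`), or `unset` when the field is not present in the spec.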
- Workaround (1): If skopeo fails to copy a large image, bypass the route and push to the registry service directly.
  - Log in to the ROSA cluster:
    $ oc login -u <user> api.example.com:6443
  - Port-forward the internal registry to the local machine (the command stays in the foreground, so run the following steps in a second terminal):
    $ oc -n openshift-image-registry port-forward svc/image-registry 5000:5000
  - Check the source image:
    $ podman images
    REPOSITORY         TAG      IMAGE ID   CREATED        SIZE
    localhost/my-app   latest   xxxxxx     41 hours ago   693 MB
  - Copy the image from the local container storage to the ROSA registry via localhost:5000:
    $ skopeo copy \
        --dest-creds=<user>:$(oc whoami -t) \
        --dest-tls-verify=false \
        containers-storage:localhost/my-app \
        docker://localhost:5000/foo/rest-http
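The steps of workaround (1) can be combined into one helper so that the port-forward is started and stopped around the copy. This is a sketch under stated assumptions, not a tested procedure: the function name `push_via_port_forward`, the `PF_WAIT` variable, and the fixed wait before copying are all hypothetical choices, and `oc` must already be logged in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: push a local image through a temporary port-forward.
#   $1 = registry user, $2 = image in local container storage,
#   $3 = destination repository path, $4 = local port (default 5000)
push_via_port_forward() {
  local user="$1" src="$2" dest="$3" port="${4:-5000}" pf_pid rc=0
  # Start the port-forward in the background and remember its PID.
  oc -n openshift-image-registry port-forward svc/image-registry "${port}:5000" >/dev/null &
  pf_pid=$!
  # Assumption: a short fixed wait is enough for the tunnel to come up.
  sleep "${PF_WAIT:-3}"
  # Same skopeo invocation as in the manual steps, with the values parameterized.
  skopeo copy \
    --dest-creds="${user}:$(oc whoami -t)" \
    --dest-tls-verify=false \
    "containers-storage:${src}" \
    "docker://localhost:${port}/${dest}" || rc=$?
  # Always stop the port-forward, then report skopeo's exit status.
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
  return "$rc"
}
```

With the example values from the steps above, the call would be `push_via_port_forward <user> localhost/my-app foo/rest-http`. If the registry is not reachable within the default wait, a larger `PF_WAIT` value may be needed.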
- Workaround (2): Limit the network speed with the tc command.
  - Install the "tc" command-line tool on the EC2 instance from which "skopeo copy" is run (if it is not already installed).
  - Run the following command to throttle the network interface:
    $ sudo tc -force qdisc add dev eth0 root tbf rate 10mbps burst 128kbit latency 5ms
    (Run the "ifconfig" command to confirm that "eth0" is the device carrying the traffic.)
  - Re-run the "skopeo copy" command. The upload will be slower, but it should succeed.
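The tc rule in workaround (2) stays in place until it is removed, so it may be convenient to wrap it in small helpers that apply, inspect, and remove the limit. The removal and inspection commands are not part of the original workaround and are added here as an assumption about how one would clean up afterwards; the function names are hypothetical. Note that in tc's unit notation `mbps` means megabytes per second, while megabits per second would be written `mbit`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the tc rule; the device defaults to eth0 as in the text.
throttle_on() {
  local dev="${1:-eth0}"
  # Same token bucket filter (tbf) settings as in the workaround.
  sudo tc -force qdisc add dev "$dev" root tbf rate 10mbps burst 128kbit latency 5ms
}

throttle_show() {
  # List the queueing disciplines currently attached to the device.
  tc qdisc show dev "${1:-eth0}"
}

throttle_off() {
  local dev="${1:-eth0}"
  # Deleting the root qdisc returns the device to its default queueing.
  sudo tc qdisc del dev "$dev" root
}
```

A typical sequence would be `throttle_on`, then the "skopeo copy" command, then `throttle_off` once the upload has finished.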
Root Cause
- The issue has been investigated in OCPBUGS-17547 and OHSS-25179.
- The CLB resets the connection between client and server for a specific blob when the problematic image is pushed to the registry. AWS support confirmed that this is a known issue with CLBs.
CLB Issue in AWS
Diagnostic Steps
- The following error appears when using skopeo to copy an image to ROSA:
  $ time skopeo copy dir:/home/ec2-user/dir/xxxxx docker://${REGISTRY}/test/xxxxx:xxx
  Getting image source signatures
  Copying blob yyyyy [=======================>--------------] 1.0GiB / 1.6GiB
  FATA[0075] writing blob: Patch "https://default-route-openshift-image-registry.apps.rosa-xxxxxx.xxxx.p1.openshiftapps.com/v2/test/dirimage/blobs/uploads/yyyyy": write tcp xx.x.xxx.xxx:45356->zz.z.zz.zzz:443: write: connection reset by peer

  real 1m15.618s
  user 0m8.939s
  sys 0m2.474s
- Traffic analysis showed that the CLB was sending an RST to both the client (EC2) and the server (HAProxy).
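The article does not say how the traffic analysis was performed. One possible way to observe the resets from the client side is to capture TCP segments with the RST flag set while the "skopeo copy" command runs. The sketch below assumes tcpdump is installed on the EC2 instance and that the traffic uses eth0 and port 443; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a capture filter for RST segments on a given port.
rst_filter() {
  local port="${1:-443}"
  # tcp[tcpflags] & tcp-rst != 0 matches segments that have the RST flag set.
  echo "tcp port ${port} and tcp[tcpflags] & tcp-rst != 0"
}

# Hypothetical helper: print matching packets until interrupted with Ctrl-C.
capture_rst() {
  local dev="${1:-eth0}" port="${2:-443}"
  # -nn disables host and port name resolution so raw addresses are shown.
  sudo tcpdump -nn -i "$dev" "$(rst_filter "$port")"
}
```

Running `capture_rst` in one terminal and the failing "skopeo copy" in another should show any RST segments arriving on the client at the moment of the error.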