Connection reset by peer when uploading a local image with a large blob to ROSA from an EC2 client

Solution Verified - Updated 2024-01-16T21:22:35+00:00 -

Environment

Red Hat OpenShift Service on AWS (ROSA)
- 4

Issue

Using skopeo copy to push a local image that contains a large blob to the ROSA image-registry, with an EC2 instance as the client, fails with "connection reset by peer".

Resolution

The issue has been investigated in OCPBUGS-17547 and OHSS-25179.
The final solution is to use a Network Load Balancer (NLB) instead of a Classic Load Balancer (CLB).
For ROSA, an NLB can be configured by following these KCS articles:
https://access.redhat.com/articles/7028653
https://access.redhat.com/solutions/6197341
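Before switching, it can help to confirm which load balancer type the cluster currently uses. The following is a minimal sketch, not a procedure from the linked articles: the function names are hypothetical, and it assumes your account can read the default IngressController and that the type is recorded under `spec.endpointPublishingStrategy`.

```shell
# Hypothetical helper: turn the AWS load balancer type into a short verdict.
lb_verdict() {
  case "$1" in
    NLB)     echo "NLB: not the load balancer type discussed in this article" ;;
    Classic) echo "CLB: large blob pushes through the route may be reset" ;;
    "")      echo "type not set in spec: the platform default applies" ;;
    *)       echo "unknown load balancer type: $1" ;;
  esac
}

# Read the type from the default IngressController.
# Assumption: the field path below is populated on your cluster.
current_lb_type() {
  oc -n openshift-ingress-operator get ingresscontroller default \
    -o jsonpath='{.spec.endpointPublishingStrategy.loadBalancer.providerParameters.aws.type}'
}

# Usage (requires a logged-in oc session):
#   lb_verdict "$(current_lb_type)"
```

If the verdict reports a CLB, the KCS articles above remain the reference for the actual change.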

Workaround (1): If skopeo copy fails for a large image, bypass the route and push to the registry service (svc) directly.

$ oc login -u <user> api.example.com:6443

Port-forward the internal registry to the local machine:

$ oc -n openshift-image-registry port-forward svc/image-registry 5000:5000

Check the source image:

$ podman images
  REPOSITORY TAG IMAGE ID CREATED SIZE
  localhost/my-app latest xxxxxx 41 hours ago 693 MB

Copy the image stored in the local container storage to the ROSA registry via localhost:5000:

$ skopeo copy \
  --dest-creds=<user>:$(oc whoami -t) \
  --dest-tls-verify=false \
  containers-storage:localhost/my-app \
  docker://localhost:5000/foo/rest-http
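The steps of workaround (1) can be combined into one script. This is a rough sketch under stated assumptions: `dest_ref` and `push_via_port_forward` are hypothetical names, the three-second sleep is a crude wait for the tunnel, and an `oc login` session must already exist.

```shell
# Hypothetical helper: build the destination reference for the forwarded port.
# Arguments: namespace name [tag] [local-port]
dest_ref() {
  printf 'docker://localhost:%s/%s/%s:%s\n' "${4:-5000}" "$1" "$2" "${3:-latest}"
}

# Hypothetical wrapper: open the tunnel, push, then close the tunnel.
# Arguments: user source-image namespace name [tag]
push_via_port_forward() {
  oc -n openshift-image-registry port-forward svc/image-registry 5000:5000 \
    >/dev/null 2>&1 &
  pf_pid=$!
  sleep 3   # crude wait for the tunnel to come up; adjust as needed

  skopeo copy \
    --dest-creds="$1:$(oc whoami -t)" \
    --dest-tls-verify=false \
    "containers-storage:$2" \
    "$(dest_ref "$3" "$4" "${5:-latest}")"
  rc=$?

  kill "$pf_pid" 2>/dev/null   # always stop the port-forward
  return "$rc"
}

# Usage, with the names from this article:
#   push_via_port_forward <user> localhost/my-app foo rest-http
```

Stopping the port-forward in the same function avoids leaving a tunnel to the internal registry open after the push.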

Workaround (2): Limit the network speed with the tc command.

Install the "tc" command line tool on the EC2 instance from which "skopeo copy" is run
(if it is not already installed).

Then run the following command to throttle the network interface:

sudo tc -force qdisc add dev eth0 root tbf rate 10mbps burst 128kbit latency 5ms

Run the "ifconfig" command first to make sure "eth0" is the device that carries the traffic.
Then re-run the "skopeo copy" command.
The upload will be slower, but it should succeed.
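The throttle stays in place until it is removed, so it is worth undoing once the upload finishes. The sketch below is illustrative only: the function names are hypothetical, and it assumes the `ip` tool is available as an alternative to "ifconfig" for finding the interface.

```shell
# Extract the interface name from "ip route show default" output on stdin.
parse_default_dev() {
  awk '$1 == "default" {
         for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit }
       }'
}

# Interface that carries the default route on this host.
default_dev() {
  ip route show default | parse_default_dev
}

# Apply the token bucket filter from this article.
# Note: in tc, "mbps" means megabytes per second; "mbit" is megabits per second.
# Arguments: interface [rate]
throttle_on() {
  sudo tc qdisc add dev "$1" root tbf rate "${2:-10mbps}" burst 128kbit latency 5ms
}

# Remove the root qdisc so the interface returns to full speed.
throttle_off() {
  sudo tc qdisc del dev "$1" root
}

# Usage:
#   dev=$(default_dev); throttle_on "$dev"
#   ... run skopeo copy ...
#   throttle_off "$dev"
```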

Root Cause

The issue has been investigated in OCPBUGS-17547 and OHSS-25179.
The CLB resets the connection between client and server for a specific blob when the
problematic image is pushed to the registry. AWS support acknowledged this as a known issue with CLBs.
CLB Issue in AWS

Diagnostic Steps

The following error appears when using skopeo copy to push an image to ROSA:

$ time skopeo copy dir:/home/ec2-user/dir/xxxxx docker://${REGISTRY}/test/xxxxx:xxx
Getting image source signatures
Copying blob yyyyy [=======================>--------------] 1.0GiB / 1.6GiB
FATA[0075] writing blob: Patch "https://default-route-openshift-image-registry.apps.rosa-xxxxxx.xxxx.p1.openshiftapps.com/v2/test/dirimage/blobs/uploads/yyyyy": write tcp xx.x.xxx.xxx:45356->zz.z.zz.zzz:443: write: connection reset by peer

real    1m15.618s
user    0m8.939s
sys     0m2.474s

Traffic analysis showed that the CLB was sending an RST to both the client (EC2) and the server (HAProxy).
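A similar observation can be attempted from the client side with tcpdump while the upload runs. This is a sketch, not the capture method used in the original investigation: the function names are hypothetical, and the interface and port are assumptions to adapt.

```shell
# Build a tcpdump filter that matches TCP segments with the RST flag set.
# Argument: [port] (defaults to 443, the route's HTTPS port)
rst_filter() {
  printf 'tcp port %s and (tcp[tcpflags] & tcp-rst != 0)\n' "${1:-443}"
}

# Print RST segments seen on an interface until interrupted.
# Arguments: interface [port]
capture_resets() {
  sudo tcpdump -n -i "$1" "$(rst_filter "${2:-443}")"
}

# Usage (in a second terminal, while skopeo copy is running):
#   capture_resets eth0
```

An RST that arrives from the load balancer address at the moment the push fails is consistent with the behavior described above, although a client-side capture alone cannot show what the server received.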
