Connection reset by peer when uploading a local image with a large blob to the ROSA image registry from an EC2 client

Solution Verified

Environment

  • Red Hat OpenShift Service on AWS (ROSA)
    • 4

Issue

  • A "connection reset by peer" error occurs when "skopeo copy" is used on an EC2 instance to push a local image that contains a large blob to the ROSA internal image registry through its default route.

Resolution

  • The issue was investigated in OCPBUGS-17547 and OHSS-25179.

  • The permanent fix is to expose the cluster ingress through a Network Load Balancer (NLB) instead of a Classic Load Balancer (CLB).
    For ROSA, an NLB can be configured by following these articles:
      https://access.redhat.com/articles/7028653
      https://access.redhat.com/solutions/6197341

  • Workaround (1): if copying a large image with skopeo fails, bypass the route (and therefore the CLB) and push to the registry Service directly through a port-forward.

    • Log in to the ROSA cluster:

      $ oc login -u <user> api.example.com:6443
      
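    • Port-forward the internal registry Service to the local machine. The command runs in the foreground, so leave it running and use a second terminal for the remaining steps: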

      $ oc -n openshift-image-registry port-forward svc/image-registry 5000:5000
      
    • Check that the source image is in local container storage:

      $ podman images
      REPOSITORY        TAG     IMAGE ID  CREATED       SIZE
      localhost/my-app  latest  xxxxxx    41 hours ago  693 MB
      
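    • Copy the image from local container storage to the ROSA registry through localhost:5000. TLS verification is disabled because the registry certificate is issued for the in-cluster Service name, not for "localhost". The target project ("foo" in this example) must already exist:

      $ skopeo copy \
          --dest-creds=<user>:$(oc whoami -t) \
          --dest-tls-verify=false \
          containers-storage:localhost/my-app \
          docker://localhost:5000/foo/rest-http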
      
  • Workaround (2): limit the upload bandwidth of the client with the tc command.

    • Install the "tc" command-line tool (if it is not already present) on the EC2 instance from which "skopeo copy" is run.

    • Throttle the network interface with a token bucket filter (tbf):

      $ sudo tc -force qdisc add dev eth0 root tbf rate 10mbps burst 128kbit latency 5ms

      Run "ip addr" (or "ifconfig") first to confirm that "eth0" is the interface carrying the traffic. In tc syntax, "10mbps" means 10 megabytes per second; use "mbit" to specify megabits.

    • Re-run the "skopeo copy" command. The upload is slower, but it should complete.
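Workaround (1) can be wrapped in a small shell script. This is a minimal sketch, not a supported tool: the function names (forwarded_dest_ref, wait_for_port_forward, push_via_port_forward) are hypothetical, it assumes oc, skopeo and curl are installed and "oc login" has already been run, and it assumes the registry Service listens on port 5000 as in the steps above. The port-forward runs in the background and is stopped once the copy finishes, so no second terminal is needed.

```shell
#!/usr/bin/env bash
# Sketch of Workaround (1): push through "oc port-forward" instead of the route.
# Hypothetical helper names; assumes oc, skopeo and curl are installed and that
# "oc login" has already been run.

# Build the destination reference for the port-forwarded registry.
forwarded_dest_ref() {
  # $1 = <project>/<repository>[:tag], $2 = local port (default 5000)
  echo "docker://localhost:${2:-5000}/$1"
}

# Poll until the tunnel answers TLS on /v2/ (any HTTP status, even 401, counts as "up").
wait_for_port_forward() {
  port=${1:-5000}
  tries=0
  until curl -sk -o /dev/null "https://localhost:${port}/v2/"; do
    tries=$((tries + 1))
    [ "$tries" -ge 20 ] && return 1   # give up after roughly 20 seconds
    sleep 1
  done
}

push_via_port_forward() {
  # $1 = local image, $2 = <project>/<repository>, $3 = user, $4 = local port
  src=$1 dest=$2 user=$3 port=${4:-5000}

  # Start the port-forward in the background and remember its PID.
  oc -n openshift-image-registry port-forward svc/image-registry "${port}:5000" >/dev/null 2>&1 &
  pf_pid=$!

  if ! wait_for_port_forward "$port"; then
    kill "$pf_pid" 2>/dev/null
    echo "port-forward did not become ready" >&2
    return 1
  fi

  # TLS verification is disabled, as in the manual steps above.
  skopeo copy \
    --dest-creds="${user}:$(oc whoami -t)" \
    --dest-tls-verify=false \
    "containers-storage:${src}" \
    "$(forwarded_dest_ref "$dest" "$port")"
  rc=$?

  kill "$pf_pid" 2>/dev/null   # stop the background port-forward
  return "$rc"
}
```

For example, source the file and run: push_via_port_forward localhost/my-app foo/rest-http <user>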
      

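Workaround (2) can be scripted in the same way. This is a minimal sketch with hypothetical function names, assuming iproute2 (ip, tc), getent and sudo are available and that the registry route hostname resolves to an IPv4 address. Instead of hard-coding eth0, it asks the kernel which interface routes to the registry (on some EC2 images the interface is named ens5 rather than eth0). It uses "tc qdisc replace", which adds the qdisc or replaces an existing root qdisc, and it removes the limit afterward so the instance returns to full speed.

```shell
#!/usr/bin/env bash
# Sketch of Workaround (2): throttle the interface that routes to the registry.
# Hypothetical helper names; assumes iproute2 (ip, tc), getent and sudo.

# Print the interface name (the word after "dev") from "ip route get" output.
iface_from_route() {
  printf '%s\n' "$1" |
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Resolve a hostname to its first IPv4 address.
resolve_ipv4() {
  getent ahostsv4 "$1" | awk '{ print $1; exit }'
}

# Apply the same token bucket filter as the manual step on the right device,
# then print the device name so it can be passed to unthrottle later.
throttle_for_host() {
  ip_addr=$(resolve_ipv4 "$1")
  [ -n "$ip_addr" ] || { echo "cannot resolve $1" >&2; return 1; }
  dev=$(iface_from_route "$(ip route get "$ip_addr")")
  [ -n "$dev" ] || { echo "no route to $ip_addr" >&2; return 1; }
  # "10mbps" is 10 megabytes/s in tc syntax (same value as the manual step).
  sudo tc qdisc replace dev "$dev" root tbf rate 10mbps burst 128kbit latency 5ms &&
    echo "$dev"
}

# Remove the limit by deleting the root qdisc (the kernel restores its default).
unthrottle() {
  sudo tc qdisc del dev "$1" root
}
```

For example, run dev=$(throttle_for_host <registry-route-host>), re-run the "skopeo copy" command, and then run unthrottle "$dev".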
Root Cause

  • The issue was investigated in OCPBUGS-17547 and OHSS-25179.

  • When the problematic image is pushed to the registry, the CLB resets the connection between the client and the server for a specific blob.
    AWS support confirmed that this is a known issue with CLBs.
    CLB Issue in AWS
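To check whether a cluster's default ingress is still behind a CLB, the following minimal sketch reads the load-balancer type annotation from the router Service. It assumes the default router Service is router-default in the openshift-ingress namespace, that NLB-backed Services carry the annotation service.beta.kubernetes.io/aws-load-balancer-type: nlb, and that a missing annotation means the AWS default, a CLB; verify both assumptions against your cluster. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether the default ingress uses an NLB or a CLB.
# Assumptions: Service "router-default" in namespace "openshift-ingress";
# NLB-backed Services are annotated
# service.beta.kubernetes.io/aws-load-balancer-type=nlb.

# Classify an annotation value (pure function, no cluster access needed).
lb_type_from_annotation() {
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$value" in
    nlb) echo "NLB" ;;
    "")  echo "CLB" ;;          # no annotation: assumed AWS default (Classic LB)
    *)   echo "UNKNOWN($1)" ;;
  esac
}

# Query the live cluster (requires "oc login" first).
current_ingress_lb_type() {
  ann=$(oc -n openshift-ingress get service router-default \
    -o jsonpath='{.metadata.annotations.service\.beta\.kubernetes\.io/aws-load-balancer-type}') || return 1
  lb_type_from_annotation "$ann"
}
```

If current_ingress_lb_type prints CLB, the cluster is still exposed to the behavior described above.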

Diagnostic Steps

  • The following error is returned when "skopeo copy" is used to push the image to ROSA through the default route:

      $ time skopeo copy dir:/home/ec2-user/dir/xxxxx docker://${REGISTRY}/test/xxxxx:xxx
      Getting image source signatures
      Copying blob yyyyy [=======================>--------------] 1.0GiB / 1.6GiB
      FATA[0075] writing blob: Patch "https://default-route-openshift-image-registry.apps.rosa-xxxxxx.xxxx.p1.openshiftapps.com/v2/test/dirimage/blobs/uploads/yyyyy": write tcp xx.x.xxx.xxx:45356->zz.z.zz.zzz:443: write: connection reset by peer

      real    1m15.618s
      user    0m8.939s
      sys     0m2.474s
  • Traffic analysis showed that the CLB sent a TCP RST to both the client (the EC2 instance) and the server (the HAProxy router).
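The client side of this traffic analysis can be reproduced with tcpdump. This is a minimal sketch, assuming tcpdump and sudo are available on the EC2 instance and the registry route is reached on port 443; capture_resets and count_resets are hypothetical helper names. capture_resets prints only segments with the RST flag set while "skopeo copy" runs in another terminal, and count_resets counts reset lines in saved tcpdump text output, where tcpdump shows them as "Flags [R]" or "Flags [R.]".

```shell
#!/usr/bin/env bash
# Sketch: look for TCP resets on the client side while "skopeo copy" runs.
# Hypothetical helper names; assumes tcpdump is installed and sudo is available.

# Print only TCP segments with the RST flag set on port 443 (Ctrl+C to stop).
capture_resets() {
  # $1 = interface, e.g. eth0; -l line-buffers output so it can be piped or tee'd
  sudo tcpdump -i "$1" -nn -l 'tcp port 443 and (tcp[tcpflags] & tcp-rst != 0)'
}

# Count reset segments in tcpdump text output read from stdin.
count_resets() {
  # Matches "Flags [R]" (RST) and "Flags [R.]" (RST+ACK); prints 0 if none.
  grep -c 'Flags \[R' || true
}
```

For example, run capture_resets eth0 | tee resets.txt during the push, then count_resets < resets.txt afterward.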

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
