The 'upstream connect error or disconnect/reset before headers. reset reason: connection failure' error when using the OpenTelemetry javaagent in OpenShift
We are running a Jaeger collector (1.48.1) in OpenShift and send telemetry from our Java Spring applications using the OpenTelemetry javaagent, version 1.28.0. The agent points at an OpenShift service endpoint such as http://jaeger-collector-headless.qa-app-monitoring.svc:4317. The issue is that when the collector pod restarts, the javaagent cannot reconnect. Instead, it keeps reporting the error below continuously:
ERROR io.opentelemetry.exporter.internal.grpc.OkHttpGrpcExporter - Failed to export spans. Server is UNAVAILABLE. Make sure your collector is running and reachable from this network. Full error message: upstream connect error or disconnect/reset before headers. reset reason: connection failure
What could be causing this? In other words, what networking/infrastructure problem does this error message map to?
If I simulate the situation locally, i.e. run the local jaeger-all-in-one container, stop it, and then restart it, the error is different and the javaagent restores the connection successfully. This is the error I get locally:
[otel.javaagent 2023-12-27 20:19:37:802 -0800] [OkHttp http://localhost:4317/...] ERROR io.opentelemetry.exporter.internal.grpc.OkHttpGrpcExporter - Failed to export spans. The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317
That case is straightforward: the agent can't connect because the container is intentionally down, and it restores the connection as soon as I start the container again.
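For reference, this is roughly how I run the local simulation (the image tag and the COLLECTOR_OTLP_ENABLED flag reflect my local setup and are only illustrative):

```sh
# Start a local Jaeger all-in-one exposing the OTLP gRPC receiver on 4317 and the UI on 16686
docker run --rm --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 4317:4317 -p 16686:16686 \
  jaegertracing/all-in-one:1.48

# Simulate the outage: stop the container, watch the agent log export errors,
# then run the same "docker run" command again to bring the collector back.
docker stop jaeger
```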
What is the difference between the "upstream connect error or disconnect/reset before headers. reset reason: connection failure" error and the "Failed to connect to localhost/0:0:0:0:0:0:0:1:4317" error?
PS: We pass the following settings to the javaagent: OTEL_TRACES_EXPORTER=otlp, OTEL_METRICS_EXPORTER=none, OTEL_EXPORTER_OTLP_ENDPOINT=http://our-openshift.svc:4317, OTEL_EXPORTER_OTLP_PROTOCOL=grpc.
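For completeness, this is roughly how the agent is attached when I test it by hand (the agent jar path and the application jar name are placeholders; in OpenShift the same environment variables are set on the deployment):

```sh
# Exporter configuration read by the OpenTelemetry javaagent
export OTEL_TRACES_EXPORTER=otlp
export OTEL_METRICS_EXPORTER=none
export OTEL_EXPORTER_OTLP_ENDPOINT=http://our-openshift.svc:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc

# Attach the agent to the Spring application at startup
java -javaagent:/path/to/opentelemetry-javaagent.jar -jar our-spring-app.jar
```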