Troubleshooting "failed to WebSocket dial: expected handshake response status code 101 but got 500. Retrying" error message in the RHACS Sensor logs
Environment
- StackRox Version - 3.0.52.1.
- Orchestrator - Amazon Elastic Kubernetes Service (Amazon EKS).
- Cloud Provider - Amazon Web Services (AWS).
Issue
If you deploy the secured-cluster-services components (Sensor and Collector) to a remote Kubernetes cluster and use the WebSocket Secure (wss) protocol to connect to the Central endpoint, you may see the following error message in the Sensor logs:
2021-01-04T18:05:29.200609287Z common/sensor: 2021/01/04 18:05:29.200485 sensor.go:273: Info: Check Central status failed: rpc error: code = Unavailable desc = transport: connecting to gRPC server "https://<ROX_ENDPOINT>:443/v1.MetadataService/GetMetadata": failed to WebSocket dial: expected handshake response status code 101 but got 500. Retrying...
Resolution
- If you are using NGINX as an HTTP load balancer behind an Amazon Network Load Balancer (NLB), check that your nginx-config.yaml is explicitly configured to support the wss protocol using the required Upgrade and Connection headers:
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
- Below is an example of an NGINX location block that shows how to set the headers explicitly in your nginx-config.yaml file:
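location / {
    proxy_pass https://central-loadbalancer.stackrox:443/;
    # WebSocket upgrades require HTTP/1.1 on the connection to the upstream.
    proxy_http_version 1.1;
    # Upgrade and Connection are hop-by-hop headers, so forward them explicitly.
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
}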
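As a quick sanity check of these settings, the following is a minimal sketch of a shell function that reports which of the WebSocket-related directives are absent from a configuration file. The function name check_wss_headers and the grep patterns are illustrative assumptions, not part of NGINX or RHACS. The check is purely textual: it matches only the literal form used in this article, it also matches commented-out lines, and it does not confirm that the directives sit in the location block that serves Central.

```shell
# Hypothetical helper (not part of NGINX or RHACS): report which
# WebSocket-related proxy directives are missing from a config file.
# The match is textual only; it does not parse NGINX block structure.
check_wss_headers() {
    conf_file=$1
    missing=0
    # Each entry is "label|extended regex" for one required directive.
    for entry in \
        'proxy_http_version 1.1|proxy_http_version[[:space:]]+1\.1[[:space:]]*;' \
        'Upgrade header|proxy_set_header[[:space:]]+Upgrade[[:space:]]+\$http_upgrade[[:space:]]*;' \
        'Connection header|proxy_set_header[[:space:]]+Connection[[:space:]]+"?[Uu]pgrade"?[[:space:]]*;'
    do
        label=${entry%%|*}     # text before the first "|"
        pattern=${entry#*|}    # text after the first "|"
        if ! grep -Eq "$pattern" "$conf_file"; then
            echo "missing: $label"
            missing=1
        fi
    done
    # Exit status 0 when all directives were found, 1 otherwise.
    return "$missing"
}
```

Calling check_wss_headers nginx-config.yaml prints one "missing: ..." line per absent directive and returns a non-zero status if any are absent, so it can be used in a script before reloading NGINX.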
Root Cause
In an AWS environment, this error message has two main causes:
- The Amazon Classic Load Balancer (CLB) does not support the gRPC and wss protocols. You can find more information on the features supported by the different Amazon load balancers in the official AWS documentation. The recommendation is to use a load balancer that supports gRPC and the wss protocol.
- The NGINX HTTP load balancer has not been explicitly configured to support the WebSocket Secure protocol.
Diagnostic Steps
- Check the Sensor logs using one of the following commands to identify the error message:
# For OpenShift users
$ oc logs <sensor_pod_name> -n stackrox
# For Kubernetes users
$ kubectl logs <sensor_pod_name> -n stackrox
- Check whether you are using a Classic Load Balancer instead of a Network Load Balancer.
- Check whether your NGINX HTTP load balancer configuration file, nginx-config.yaml, has the correct headers to support the WebSocket Secure protocol.
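To make the first diagnostic step easier on a busy Sensor, the following is a minimal sketch of a shell function that reads log text on standard input and counts the handshake failures per returned status code. The function name summarize_handshake_failures is a hypothetical helper, and the pattern assumes the message wording matches the error shown in the Issue section.

```shell
# Hypothetical helper: read Sensor log text on stdin and print each HTTP
# status code that was returned in place of 101, followed by how many
# times it occurred. Lines without the handshake error are ignored.
summarize_handshake_failures() {
    sed -n 's/.*failed to WebSocket dial: expected handshake response status code 101 but got \([0-9][0-9]*\).*/\1/p' \
        | sort \
        | uniq -c \
        | awk '{ print $2 " " $1 }'   # reorder "count code" to "code count"
}
```

Piping the output of the oc logs or kubectl logs command shown above into summarize_handshake_failures gives a quick count per status code. No output means the error message was not found in the logs that were read.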