Troubleshooting "failed to WebSocket dial: expected handshake response status code 101 but got 500. Retrying" error message in the RHACS Sensor logs

Solution Verified - Updated -

Environment

  • StackRox Version - 3.0.52.1.
  • Orchestrator - Amazon Elastic Kubernetes Service (Amazon EKS).
  • Cloud Provider - Amazon Web Services (AWS)

Issue

If you deploy the secured-cluster-services components (Sensor and Collector) to a remote Kubernetes cluster and use the WebSocket Secure protocol (wss) to connect to the Central endpoint, you may see the following error message in the Sensor logs:

2021-01-04T18:05:29.200609287Z common/sensor: 2021/01/04 18:05:29.200485 sensor.go:273: Info: Check Central status failed: rpc error: code = Unavailable desc = transport: connecting to gRPC server "https://<ROX_ENDPOINT>:443/v1.MetadataService/GetMetadata": failed to WebSocket dial: expected handshake response status code 101 but got 500. Retrying...

Resolution

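  • If Central is exposed through an Amazon Classic Load Balancer (CLB), replace it with a load balancer that supports gRPC and the wss protocol, such as a Network Load Balancer.
  • If Central is exposed through an NGINX HTTP load balancer, explicitly set the following headers so that NGINX forwards the WebSocket upgrade request:
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "Upgrade";
  • Below is an example of an NGINX location block demonstrating how to explicitly set the headers in your nginx-config.yaml file:
location / {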
    proxy_pass https://central-loadbalancer.stackrox:443/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
}
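
To confirm that a configuration file actually carries these directives, a small check along the following lines may help. This is a minimal sketch, not part of the product: the function name check_ws_headers is hypothetical, and the patterns only look for the three directives from the location block above (they do not parse NGINX syntax, so a commented-out directive would still match).

```shell
#!/usr/bin/env bash
# Sketch: verify that an NGINX config contains the directives needed for the
# WebSocket upgrade. check_ws_headers is a hypothetical helper name.
check_ws_headers() {
  local config_file="$1"
  local missing=0
  local pattern
  # Each pattern tolerates variable whitespace between the directive's tokens.
  for pattern in \
    'proxy_http_version[[:space:]]+1\.1[[:space:]]*;' \
    'proxy_set_header[[:space:]]+Upgrade[[:space:]]+\$http_upgrade[[:space:]]*;' \
    'proxy_set_header[[:space:]]+Connection[[:space:]]+"?[Uu]pgrade"?[[:space:]]*;'
  do
    if ! grep -Eq -- "$pattern" "$config_file"; then
      # Report the pattern that had no match and remember the failure.
      echo "missing: $pattern"
      missing=1
    fi
  done
  # Exit status 0 when all three directives were found, 1 otherwise.
  return "$missing"
}
```

For example, running check_ws_headers nginx-config.yaml returns 0 when all three directives are present and prints one "missing:" line per absent directive otherwise.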

Root Cause

In an AWS environment, the two main causes of this error message are:

  1. The Amazon Classic Load Balancer (CLB) does not support the gRPC and wss protocols. You can find more information on the features supported by the different Amazon load balancers in the official AWS documentation. The recommendation is to use a load balancer that supports gRPC and the wss protocol.
  2. Your NGINX HTTP load balancer has not been explicitly configured to support the WebSocket Secure protocol.

Diagnostic Steps

  1. Check the Sensor logs using the following command to identify the error message:
# For OpenShift users
$ oc logs <sensor_pod_name> -n stackrox

# For Kubernetes users
$ kubectl logs <sensor_pod_name> -n stackrox
  2. Check if you are using a Classic Load Balancer instead of a Network Load Balancer.
  3. Check if your NGINX HTTP load balancer configuration file nginx-config.yaml has the correct headers to support the WebSocket Secure protocol.
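
The first two checks can be scripted roughly as follows. This is a sketch under stated assumptions: the function names count_ws_dial_errors and lb_type_for_dns are hypothetical, the AWS CLI must already be configured for the account and region that host the load balancer, and the match on DNSName assumes you know the load balancer's public DNS name.

```shell
#!/usr/bin/env bash
# Sketch of the first two diagnostic checks. Function names are hypothetical.

# Step 1: count Sensor log lines that contain the failed WebSocket handshake.
# Reads log text on stdin, so it works with either `oc logs` or `kubectl logs`.
count_ws_dial_errors() {
  # grep -c prints 0 and exits non-zero when nothing matches; keep status 0.
  grep -cF 'failed to WebSocket dial: expected handshake response status code 101' || true
}

# Step 2: report the load balancer type for a given DNS name.
# `aws elb` lists only Classic Load Balancers; `aws elbv2` lists Network and
# Application Load Balancers.
lb_type_for_dns() {
  local dns_name="$1"
  local classic
  classic="$(aws elb describe-load-balancers \
    --query "LoadBalancerDescriptions[?DNSName=='${dns_name}'].LoadBalancerName" \
    --output text)"
  if [ -n "$classic" ]; then
    echo "classic"
    return 0
  fi
  # Prints "network" or "application" when the name matches a v2 load balancer.
  aws elbv2 describe-load-balancers \
    --query "LoadBalancers[?DNSName=='${dns_name}'].Type" \
    --output text
}
```

For example, kubectl logs <sensor_pod_name> -n stackrox | count_ws_dial_errors prints the number of matching log lines, and lb_type_for_dns <load_balancer_dns_name> prints "classic" when the name belongs to a Classic Load Balancer.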

