Troubleshooting Sensor to Central connectivity issues
Most connectivity issues between Sensor and Central originate in the Ingress configuration of the OpenShift cluster. To determine the cause, inspect the logs of the affected Sensor pod:
# oc -n stackrox logs <SENSOR POD NAME>
Sensor exited with error: <BRIEF ERROR MESSAGE> rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: <INTERNAL ERROR>
or
Sensor exited with error: <BRIEF ERROR MESSAGE> rpc error: code = Unavailable desc = <DESCRIPTION>
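Sensor logs can be long, so it can help to reduce them to the gRPC error lines shown above. The following is a minimal sketch, not part of the product: the function name sensor_rpc_errors is hypothetical, and the usage comment assumes the Sensor Deployment is named "sensor" in the stackrox namespace.

```shell
# Hypothetical helper: print only the gRPC error lines from Sensor log
# text supplied on stdin. It matches the two status codes discussed in
# this article (Internal and Unavailable).
sensor_rpc_errors() {
    grep -E 'rpc error: code = (Internal|Unavailable)'
}

# Usage against a live cluster (assumes a Deployment named "sensor"):
#   oc -n stackrox logs deploy/sensor | sensor_rpc_errors
```

The function exits with a non-zero status when no matching line is found, which makes it usable in simple health-check scripts.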
The sections below list the most common error messages and their typical causes.
Double-checking compatibility
First, ensure that the Ingress in use for the cluster is listed in the ACS - Ingress Compatibility Matrix and that its annotation is correct. Ingress controllers can be strict about the annotations they accept, so review the documentation for the Ingress in question thoroughly to confirm that the setup and configuration are correct. If the Ingress in use is not listed, Red Hat still provides commercially reasonable support; however, we strongly recommend engaging the Ingress vendor in parallel.
Kong Ingress
A Sensor component set up behind Kong Ingress without the correct annotation can fail with logs similar to the following:
# oc -n stackrox logs sensor-123456789a-abcde
[...]
common/sensor: 2021/01/26 20:46:15.028509 sensor.go:323: Error: Sensor reported an error: receiving initial metadata: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR
main: 2021/01/26 20:46:15.028603 main.go:63: Fatal: Sensor exited with error: receiving initial metadata: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR
If the above is observed, please review the knowledge-base article Expose Central behind Kong Ingress.
HAProxy Ingress
A Sensor component behind HAProxy can fail to connect if the Ingress controller does not explicitly use HTTP/2. When this happens, the logs look similar to the following:
# oc -n stackrox logs sensor-123456789a-abcde
common/sensor: 2021/06/02 09:44:52.725305 sensor.go:320: Error: Sensor reported an error: opening stream: rpc error: code = Unavailable desc = connection closed
main: 2021/06/02 09:44:52.725347 main.go:66: Fatal: Sensor exited with error: opening stream: rpc error: code = Unavailable desc = connection closed
If the noted symptoms and setup are observed, please review the knowledge-base article Connecting a secured-cluster Sensor to Central behind a HAProxy Ingress.
Nginx Ingress as an HTTP load balancer and WSS
When a Sensor and Collector are deployed to a cluster using the WebSocket Secure (WSS) protocol with Nginx as the Ingress backend, connection failures can occur if the WSS protocol is not enabled in the Nginx configuration. Failures look similar to the following:
# oc -n stackrox logs sensor-123456789a-abcde
2021-01-04T18:05:29.200609287Z common/sensor: 2021/01/04 18:05:29.200485 sensor.go:273: Info: Check Central status failed: rpc error: code = Unavailable desc = transport: connecting to gRPC server "https://<ROX_ENDPOINT>:443/v1.MetadataService/GetMetadata": failed to WebSocket dial: expected handshake response status code 101 but got 500. Retrying...
If this configuration and these logs are observed, please review the knowledge-base article Troubleshooting "failed to WebSocket dial: expected handshake response status code 101 but got 500. Retrying" error message in the RHACS Sensor logs.
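The three log signatures described above can be checked in one pass. The sketch below is illustrative only: the function name classify_sensor_error and its output labels are hypothetical, and a match is a hint about where to look first rather than proof, because other proxies can produce the same gRPC errors.

```shell
# Hypothetical helper: read Sensor log text on stdin and print which of
# the Ingress sections in this article the error signature resembles.
classify_sensor_error() {
    # Read the whole log once so it can be tested against each pattern.
    log=$(cat)
    case "$log" in
        *"RST_STREAM with error code: PROTOCOL_ERROR"*)
            echo "kong" ;;        # see the Kong Ingress section
        *"failed to WebSocket dial: expected handshake response status code 101"*)
            echo "nginx-wss" ;;   # see the Nginx and WSS section
        *"code = Unavailable desc = connection closed"*)
            echo "haproxy" ;;     # see the HAProxy Ingress section
        *)
            echo "unknown" ;;     # no signature from this article matched
    esac
}

# Usage against a live cluster, using the pod name from the examples:
#   oc -n stackrox logs sensor-123456789a-abcde | classify_sensor_error
```

If the result is "unknown", fall back to the compatibility check described earlier and review the full log output.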