Routing problem with Service Mesh Operator and Open Liberty Operator


We're currently on the latest OCP 4.5.z release, with the latest official Service Mesh Operator and the latest community Open Liberty Operator (OLO) installed.

I'm trying to use the sidecar-injection and route-exposure annotations, which work on another Liberty application that wasn't deployed with the OL operator:

```yaml
apiVersion: openliberty.io/v1beta1
kind: OpenLibertyApplication
metadata:
  name: checkfreepay-api
  # Leave blank to set it at `oc` execution time via --namespace, or to use the current project
  namespace:
  labels:
    app: checkfreepay-api
  annotations:
    # for Service Mesh/Istio injection
    sidecar.istio.io/inject: 'true'
    maistra.io/expose-route: 'true'

spec:
  service:
    type: ClusterIP
    port: 9080
    portName: http-9080
  expose: true
...
```

The annotations also appear to be propagated correctly to the objects created by the OLO, and the Envoy sidecar is definitely up and running without logging any errors. The OL app container itself is also up and running with no errors.
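For comparison, here is a minimal sketch of where those annotations sit in a hand-written Deployment. The Istio sidecar injector evaluates `sidecar.istio.io/inject` on the Pod itself, so the pod template (not only the Deployment's top-level metadata) is the place to look when checking the objects the operator generates. The Deployment name, labels, container name, and image below are placeholder assumptions, not the OLO's actual output.

```yaml
# Hypothetical Deployment, for comparison only; names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkfreepay-api            # assumed to match the CR name
  labels:
    app: checkfreepay-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: checkfreepay-api
  template:
    metadata:
      labels:
        app: checkfreepay-api
      annotations:
        # Evaluated by the sidecar injector on the Pod, so it must appear here
        sidecar.istio.io/inject: 'true'
        # Same route-exposure annotation as on the OpenLibertyApplication CR above
        maistra.io/expose-route: 'true'
    spec:
      containers:
        - name: app                 # placeholder container name
          image: example.com/checkfreepay-api:latest   # placeholder image
          ports:
            - name: http-9080
              containerPort: 9080
```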

The expected Service and Route exist and look correct, yet I cannot reach the Route. Instead I get the all-too-familiar OCP "Application is not available" error page: "The application is currently not serving requests at this endpoint. It may not have been started or is still starting."

Oddly, at one point I did reach the route/host, shortly after we had discovered and restarted a hung worker node, so I thought that had been the root cause and that things were working after all. That apparently wasn't the issue, though, because I can no longer reach the host/route. I have no idea how or why it briefly worked. It almost seems like a race condition.

I see no errors anywhere I look in the web UI or in the log files, so I don't know how to troubleshoot further.

I'm also in contact with someone from the OLO development team, who wonders whether the SM operator is conflicting with the OLO's management of these objects. I thought I'd start here as well, at least to see whether I'm missing a technique for getting more detail on where the failure occurs.

I can certainly provide more detail and samples as needed; I just don't want to start by dumping everything I have.

Responses

FWIW, we just updated our OCP cluster to 4.6.26, and my routing currently works. But I suspect it's the same "timing" condition that let it work briefly, once, on 4.5 after some node restarts. I'll update here if I learn anything different or new.

Yep: now, after updating the cluster to 4.7.8, the route is not responding again.