OCP 4 - egress behaviour when node reboots (e.g. during cluster-update)

Comments

Hey folks,

our new ocp-cluster is in production since a couple of weeks and we have now performed our first update (4.7.9 -> 4.7.21). Unfortunately, we had a short service-outage while the workernodes were rebooting.
The failed service (two replicas on different nodes) needs a constant jdbc-connection to an external database, so we have already implemented an according liveness-check which already restarts the pod succesfully.
The problem is, that this connection interrupts and there is still a timeperiod between the first failed connection and the liveness initiated pod restarts.
We use projectbased egressIPs, annotated to our four worker-nodes, for outgoing traffic ( oc patch netnamespace mynamespace--type=merge -p '{"egressIPs": ["A.B.C.D"]}' )

My assumptions:
1. An egressIP can only exist on one node
2. After or while the egressIP floats from one node to another, the clusteroperator "network" is processing.
3. There is a time period while the egressIP can`t be used by the assigned namespaces

My questions:
1. Are my assumptions right?
2. What can we do to prevent this situation? We want to update our cluster without any service downtime.

Maybe the problem is on a total different place, if so, please let me know.
Thank you!

Kind regards,
Sascha

Started 2021-08-10T07:52:08+00:00 by

Sascha Gruen

Active Contributor 195 points

Select Your Language

OCP 4 - egress behaviour when node reboots (e.g. during cluster-update)

Responses

Quick Links

Help

Site Info

Related Sites

About

Red Hat legal and privacy links

Red Hat legal and privacy links

Responses

Quick Links

Help

Site Info

Related Sites

Systems Status

About

Red Hat legal and privacy links

Red Hat legal and privacy links