Upgrade Process Reliability

Hi all, looking for a bit of feedback here. I have been working with OCP 4.x for the last 12 months, and one area I have found frustrating is the upgrade process. Whilst an upgrade is ostensibly trivial to initiate, here are some of the issues we repeatedly run into:

  • Upgrades fail due to lack of compute. We have enough compute to run our clusters day to day, but an in-place upgrade needs more than we have, so we have to add nodes to the cluster or scale the existing nodes vertically. This often means going over our license allotment for an hour or so just to get through the upgrade, which is not a good pattern for obvious reasons.

  • Upgrades fail due to bugs such as this one: https://bugzilla.redhat.com/show_bug.cgi?id=1834194. In my experience there have been many of these.
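
To make the compute point concrete, here is a rough sketch of the headroom check involved. It assumes the upgrade takes one worker out of service at a time, so the remaining workers have to absorb everything that was running on it. The function name and the figures are hypothetical, not taken from our clusters.

```python
def can_drain_one_node(node_capacities, total_requested):
    """Return True if the requested resources still fit on the
    remaining nodes while the largest node is out of service.

    node_capacities: allocatable capacity per worker (e.g. CPU cores).
    total_requested: sum of resource requests across all workloads.
    Assumption: only one node is unavailable at a time.
    """
    if len(node_capacities) < 2:
        # With a single worker there is nowhere to move workloads to.
        return False
    # Worst case: the node being drained is the largest one.
    remaining = sum(node_capacities) - max(node_capacities)
    return total_requested <= remaining


# Hypothetical figures: 16-core workers with 40 cores requested in total.
print(can_drain_one_node([16, 16, 16], 40))      # three workers
print(can_drain_one_node([16, 16, 16, 16], 40))  # four workers
```

With three workers the cluster runs fine in steady state (40 of 48 cores requested), but only 32 cores remain during a drain, so the check fails. A fourth worker is needed just to get through the upgrade.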

Our team's confidence in in-place upgrades is currently low.

Now, we could of course abandon in-place upgrades and build out a new cluster each time we need a later version, but then we again run the risk of going over our license allotment.

I'm wondering what people are doing in this area and whether they have come across the same issues and challenges I have.