Failure to install a private OCP 4 cluster on Azure due to organization policies

Solution In Progress

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Azure

Issue

  • The installation of a private OCP 4 cluster on Azure fails because of Azure policies that are enforced on the Azure subscription.
  • Error creating Azure Storage Account:

    Reasons stated for error 1: Storage account public access should be disallowed, and Azure Storage should have the minimal TLS version of 1.2
    
  • Error Creating/Updating Network Security Rule bootstrap_ssh_in:

    Reason stated for error 2: SSH access from the Internet should be blocked with Deny
    

Resolution

Starting with OCP 4.8.2, the installation uses Azure Storage Accounts with TLS 1.2 (BZ 1943175), and the bootstrap SSH rule is no longer created for private clusters (BZ 1943219).

For other issues or older OCP versions, the workaround is:

  1. Temporarily disable the policies that block the installation while the bootstrap is running, and re-enable them afterwards.
  2. Install the cluster using User Provisioned Infrastructure (UPI) instead of Installer Provisioned Infrastructure (IPI).
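
As a rough aid for deciding whether the workaround applies, the version cutoff above can be expressed as a small check. This is a minimal sketch, not a supported tool: the function names `parse_version` and `has_installer_fixes` are hypothetical, and the 4.8.2 threshold is taken from this article only.

```python
import re

# Assumption: the fixes for BZ 1943175 and BZ 1943219 are available
# starting with 4.8.2, as stated in the Resolution section.
FIXED_IN = (4, 8, 2)


def parse_version(version: str) -> tuple:
    """Parse 'X.Y.Z' into a tuple of ints.

    Any suffix after the patch number (for example '-rc.1') is ignored,
    so a pre-release is treated like its final release.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_installer_fixes(version: str) -> bool:
    """Return True when the version is at or above the assumed cutoff."""
    return parse_version(version) >= FIXED_IN


if __name__ == "__main__":
    for candidate in ("4.7.13", "4.8.2", "4.10.0"):
        print(candidate, has_installer_fixes(candidate))
```

Comparing integer tuples rather than raw strings avoids ordering "4.10.0" before "4.8.2". A version below the cutoff suggests that one of the workarounds above is still needed.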

Root Cause

The resources that conflict with the policies are only needed at bootstrap time.

The bootstrap_ssh_in NSG rule is removed when the bootstrap resources are destroyed.
https://github.com/openshift/installer/blob/5ae96abf0000f35c0f352469ae845e4fdf025115/data/data/azure/bootstrap/main.tf#L218-L231

With a private installation (publish: Internal), the bootstrap does not get a public IP. The reasoning in the policy is therefore inaccurate, because there cannot be any access from the Internet.
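
To confirm that an install-config.yaml is actually set up for a private installation, the top-level `publish` key can be inspected. The following is a minimal sketch that scans the file text without a YAML parser; the function name `is_private_install` is hypothetical, and it assumes `publish` appears as an unindented top-level key.

```python
def is_private_install(install_config_text: str) -> bool:
    """Return True when the top-level 'publish' key is set to 'Internal'.

    Assumption: 'publish' is written as an unindented key on its own line.
    When the key is absent, the install is treated as not private.
    """
    for line in install_config_text.splitlines():
        if not line.startswith("publish:"):
            continue
        # Drop any trailing comment, then surrounding whitespace and quotes.
        value = line.split(":", 1)[1].split("#", 1)[0].strip().strip("\"'")
        return value == "Internal"
    return False


if __name__ == "__main__":
    # Illustrative fragment only; not a complete install-config.yaml.
    sample = "apiVersion: v1\npublish: Internal\n"
    print(is_private_install(sample))
```

A plain line scan is used here only to keep the sketch free of third-party dependencies; a real YAML parser would be more robust.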

We require SSH on the bootstrap node to be able to debug any issues at installation time, and it is not possible to disable it through configuration. We can investigate why this triggers the policy rule for a private cluster and see whether it can be fixed.

At least two storage accounts are involved in the installation:
1. https://github.com/openshift/installer/blob/5ae96abf0000f35c0f352469ae845e4fdf025115/data/data/azure/main.tf#L147-L153
2. https://github.com/openshift/installer/blob/master/data/data/azure/bootstrap/main.tf#L7-L37

The first one hosts the RHCOS image and acts as the serial console target for the bootstrap and master nodes (for debugging). The second one hosts the ignition file for the bootstrap and, like the SSH rule, is destroyed when the bootstrap is done.
