Node time drift in OpenShift 4 on Azure

Solution Verified

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Azure Red Hat OpenShift (ARO)
    • 4
  • OpenShift Managed (Azure)
    • 4
  • Microsoft Azure

Issue

  • After cluster installation, the control plane virtual machines are unable to reach public NTP servers.
  • The chrony daemon running on the RHCOS virtual machines is configured to use the 2.rhel.pool.ntp.org pool as its NTP source.
  • NTP synchronisation fails because no NTP peers are reachable.

Resolution

This was a known issue, addressed in BZ 1765609 and fixed starting with OCP 4.5.
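
To tell whether a cluster is at or above the release that carries the fix, the version can be compared against 4.5. The following is a minimal sketch, not an official procedure: `version_at_least` is a hypothetical helper, the comparison relies on GNU `sort -V`, and the sample value `4.4.17` is only a placeholder used when no cluster is queried.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Relies on GNU sort -V (version sort) to order the two strings.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# On a live cluster the version could be read with a logged-in oc client:
#   current=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
current="${current:-4.4.17}"   # placeholder sample value (assumption)

if version_at_least "$current" 4.5; then
  echo "$current is at or above 4.5"
else
  echo "$current is older than 4.5"
fi
```

If the cluster is already at 4.5 or later and the symptoms persist, the workaround described in this article may still apply.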

Workarounds

If the fix does not take effect, a possible workaround is to allow outgoing traffic to the NTP servers:

  • Disable outbound SNAT on the public load balancer rule, and create an explicit outbound rule that allows all traffic:

    az network lb rule update -g cluster01-example-rg --lb-name cluster01-example-public-lb --name api-internal --disable-outbound-snat true
    az network lb outbound-rule create -g cluster01-example-rg --lb-name cluster01-example-public-lb --frontend-ip-configs public-lb-ip --protocol All --address-pool cluster01-example-public-lb-control-plane --name AllowOutbound
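
Because the resource group, load balancer, and pool names differ per cluster, it may help to parameterise the two commands and review them before running anything. This is a sketch under stated assumptions: the default values are the example names from this article, and `run`, `apply_outbound_workaround`, and the `DRY_RUN` variable are hypothetical names introduced only for illustration.

```shell
# Example names from this article; replace them with your cluster's values.
RG="${RG:-cluster01-example-rg}"
LB="${LB:-cluster01-example-public-lb}"
POOL="${POOL:-${LB}-control-plane}"
FRONTEND="${FRONTEND:-public-lb-ip}"

# Hypothetical wrapper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same two az commands as above, with the names taken from variables.
apply_outbound_workaround() {
  run az network lb rule update -g "$RG" --lb-name "$LB" \
    --name api-internal --disable-outbound-snat true
  run az network lb outbound-rule create -g "$RG" --lb-name "$LB" \
    --frontend-ip-configs "$FRONTEND" --protocol All \
    --address-pool "$POOL" --name AllowOutbound
}

apply_outbound_workaround
```

By default the sketch only prints the commands. Setting `DRY_RUN=0` would execute them with the Azure CLI, which requires an authenticated `az` session.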
    

Root Cause

By default, virtual machines that have been added to a public load balancer cannot send outgoing UDP packets, so both NTP and DNS requests to external servers fail.

Diagnostic Steps

Check chronyc sources:

$ oc debug node/[node_name]
[...]
sh-4.4# chroot /host bash
[root@node_name /]# chronyc sources
210 Number of sources = 8
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^? ntp.example.com     0   9     0     -     +0ns[   +0ns] +/-    0ns
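
In the output above, the Reach column is 0, meaning no recent poll of that source was answered. As a rough sketch, the same check can be scripted by counting sources with a non-zero Reach value. `count_reachable_sources` is a hypothetical helper, and the sample text is modelled on the output shown in this article.

```shell
# Hypothetical helper: reads `chronyc sources` output on stdin and prints
# how many sources have a non-zero Reach register (5th column).
count_reachable_sources() {
  awk 'NF >= 5 && /^[=#^]/ && $5 + 0 > 0 { n++ } END { print n + 0 }'
}

# On a node this could be used as: chronyc sources | count_reachable_sources
sample='210 Number of sources = 8
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? ntp.example.com               0   9     0     -     +0ns[   +0ns] +/-    0ns'

reachable=$(printf '%s\n' "$sample" | count_reachable_sources)
echo "reachable sources: $reachable"
```

A count of zero matches the failure described in this article. Any non-zero count suggests that at least one NTP source has answered recently.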

Check outbound DNS requests:

$ oc debug node/[node_name]
[...]
sh-4.4# chroot /host bash
[root@node_name /]# dig www.redhat.com @8.8.8.8

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-17.P2.el8_0.1 <<>> www.redhat.com @8.8.8.8
;; global options: +cmd
;; connection timed out; no servers could be reached
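
The timeout above can also be detected in a script by looking for the "no servers could be reached" message in the dig output. This is a minimal sketch: `dig_timed_out` is a hypothetical helper, and the `+time` and `+tries` options in the comment are only there to shorten the wait.

```shell
# Hypothetical helper: succeeds when dig output on stdin reports that
# no resolver answered.
dig_timed_out() {
  grep -q 'no servers could be reached'
}

# On a node, something like the following could feed the check:
#   dig +time=3 +tries=1 www.redhat.com @8.8.8.8 | dig_timed_out
# Repeating the query with +tcp may help show whether only UDP is affected
# (an assumption based on the root cause described in this article).
sample=';; connection timed out; no servers could be reached'

if printf '%s\n' "$sample" | dig_timed_out; then
  echo "outbound DNS query was not answered"
else
  echo "outbound DNS query was answered"
fi
```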

