System hangs during boot due to race between NetworkManager and network

Solution In Progress - Updated -

Issue

  • Our undercloud nodes have all network adapters managed by the network templates, so the adapters are configured by os-net-config and started by network.service.

  • However, the nodes also start NetworkManager (possibly because of cloud-init, or as a default from the image). NetworkManager starts before network.service but, as far as we know, has nothing to configure: every interface has NM_CONTROLLED=no in its ifcfg file.

  • At the very least, this always slows the boot, because it pulls in NetworkManager services that can only time out (e.g. NetworkManager-wait-online).

  • Sometimes, however, network.service is apparently never reached and the system hangs. The console prints 'Starting firewall ..' for both IPv4 and IPv6, and those units appear to have no timeout (the missing timeout itself is not the issue).

  • The console might look similar to this:

[OK] Started NTP client/server.
[OK] Started Initial cloud-init job (pre-networking).
[  158.099842] cloud-init[16913]: Cloud-init v. 18.5 running 'init-local' at Thu, 02 Apr 2020 12:34:45 +0000. Up 157.41 seconds.
[OK] Reached target Network (Pre).
         Starting Network Manager...
[OK] Started Network Manager.
         Starting Hostname Service...
[*] (1 of 2) A start job is running for…all with iptables (32s / no limit)
[*] (1 of 2) A start job is running for…all with iptables (33s / no limit)
[*] (1 of 2) A start job is running for…all with iptables (33s / no limit)
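
The conditions described above can be checked on a node that is still reachable. The following is only a diagnostic sketch, not a fix: the function name nm_managed_ifcfg is hypothetical, and the default directory /etc/sysconfig/network-scripts is an assumption based on the usual RHEL layout. It lists the ifcfg files that do not set NM_CONTROLLED=no, which would contradict the assumption that NetworkManager has nothing to configure.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Red Hat tooling): print every
# ifcfg-* file in a directory that does NOT contain NM_CONTROLLED=no.
# The default directory is an assumption (usual RHEL location).
nm_managed_ifcfg() {
    dir="${1:-/etc/sysconfig/network-scripts}"
    for f in "$dir"/ifcfg-*; do
        # Skip the unexpanded glob when the directory has no ifcfg files.
        [ -f "$f" ] || continue
        # Accept NM_CONTROLLED=no with or without double quotes, any case.
        if ! grep -Eqi '^NM_CONTROLLED="?no"?[[:space:]]*$' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Related checks that could be run by hand on an affected node:
#   systemctl is-enabled NetworkManager.service NetworkManager-wait-online.service
#   systemctl list-jobs                              # start jobs still waiting or running
#   systemd-analyze critical-chain network.service   # what network.service waited on
```

Calling nm_managed_ifcfg with no argument scans the default directory; an empty result is consistent with the statement that all interfaces carry NM_CONTROLLED=no. Note that ifcfg-lo, if present, usually does not set the key and may therefore be listed.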

Environment

  • Red Hat OpenStack Platform 16.0 (RHOSP)
