OpenShift AWS private cluster installation fails with 'connect: connection refused'
Issue
- When the openshift-install tool is run with debug logging enabled, the installation fails with the following error:
# openshift-install create cluster --log-level=debug
...
DEBUG Built from commit 310205b3cee9e166544b882e8ea8b321af198b6f
INFO Waiting up to 20m0s for the Kubernetes API at https://api.ocp4.example.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.1.12:6443: i/o timeout
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.1.12:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.1.12:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.2.4:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.1.12:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.2.4:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.1.12:6443: connect: connection refused
INFO Pulling debug logs from the bootstrap machine
DEBUG Added /tmp/bootstrap-ssh976485374 to installer's internal agent
DEBUG Added /home/user/.ssh/id_rsa to installer's internal agent
ERROR Attempted to gather debug logs after installation failure: failed to create SSH client: dial tcp 192.168.1.4:22: connect: connection refused
FATAL Bootstrap failed to complete: failed waiting for Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.1.12:6443: i/o timeout
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.2.4:6443: i/o timeout
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.1.11:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.2.4:6443: connect: connection refused
DEBUG Still waiting for the Kubernetes API: Get https://api.ocp4.example.com:6443/version?timeout=32s: dial tcp 192.168.1.11:6443: connect: connection refused
- The bootstrap machine boot logs show repeated attempts to download the bootstrap.ign file from an AWS S3 resource, and the system enters an infinite reboot loop. To retrieve the instance logs, select AWS Console > EC2 > EC2 Instance > Instance Settings > Get System Log:
[OK] Reached target Remote File Systems (Pre).
[OK] Reached target Remote File Systems.
[10.820527] systemd[1]: Starting dracut pre-mount hook...
[10.824952] systemd[1]: Reached target Remote File Systems (Pre).
[OK] Started dracut pre-mount hook.
[10.839883] systemd[1]: Reached target Remote File Systems.
[10.845620] systemd[1]: Started dracut pre-mount hook.
[10.974167] ignition[820]: GET http://169.254.169.254/2009-04-04/user-data: attempt #5
[10.982529] ignition[820]: GET result: OK
[*] A start job is running for Ignition (fetch) (11s / no limit)
[**] A start job is running for Ignition (fetch) (11s / no limit)
[**] A start job is running for Ignition (fetch) (12s / no limit)
[**] A start job is running for Ignition (fetch) (13s / no limit)
[**] A start job is running for Ignition (fetch) (13s / no limit)
...
[** ] A start job is running fo
Use the ^ and v keys to change the selection.
Press 'e' to edit the selected item, or 'c' for a command prompt.
Red Hat Enterprise Linux CoreOS 45.82.202008010929-0 (Ootpa) (ostree:0)
The selected entry will be started automatically in 1s.
The selected entry will be started automatically in 0s.
[ 0.000000] Linux version 4.18.0-193.14.3.el8_2.x86_64 (mockbuild@x86-vm-07.build.eng.bos.redhat.com) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Mon Jul 20 15:02:29 UTC 2020
[ 0.000000] Command line: BOOT_IMAGE=(hd0,gpt1)/ostree/rhcos-cfdd324ec49fa0c33c9f0391807b6a35cdac3221f5a2f95c99598b9f3974b9c4/vmlinuz-4.18.0-193.14.3.el8_2.x86_64 rhcos.root=crypt_rootfs random.trust_cpu=on console=tty0 console=ttyS0,115200n8 rd.luks.options=discard ignition.firstboot rd.neednet=1 ip=dhcp,dhcp6 ostree=/ostree/boot.1/rhcos/cfdd324ec49fa0c33c9f0391807b6a35cdac3221f5a2f95c99598b9f3974b9c4/0 ignition.platform.id=aws
...
Rebooting.
[ 432.383227] systemd[1]: Shutting down.
[ 432.394910] printk: systemd-shutdow: 28 output lines suppressed due to ratelimiting
[ 432.455090] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 432.459690] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[ 432.465990] systemd-journald[924]: Received SIGTERM from PID 1 (systemd-shutdow).
[ 432.475134] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[ 432.481650] systemd-shutdown[1]: Unmounting file systems.
[ 432.486458] [943]: Remounting '/' read-only in with options 'size=4039748k,nr_inodes=1009937'.
[ 432.493551] systemd-shutdown[1]: All filesystems unmounted.
[ 432.497606] systemd-shutdown[1]: Deactivating swaps.
[ 432.501373] systemd-shutdown[1]: All swaps deactivated.
[ 432.505116] systemd-shutdown[1]: Detaching loop devices.
[ 432.509011] systemd-shutdown[1]: All loop devices detached.
[ 432.512954] systemd-shutdown[1]: Detaching DM devices.
[ 432.516688] systemd-shutdown[1]: All DM devices detached.
[ 432.520546] systemd-shutdown[1]: All filesystems, swaps, loop devices and DM devices detached.
[ 432.527191] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 432.531799] systemd-shutdown[1]: Rebooting.
[ 432.535922] xenbus: xenbus_dev_shutdown: device/pci/0: Initialising != Connected, skipping
[ 432.578103] reboot: Restarting system
[ 432.581430] reboot: machine restart
- The bootstrap machine cannot be reached over SSH; the connection attempt fails with the following error:
# ssh ip-xxx-xxx-xx-xx.example.internal (Private IPv4 DNS)
ssh: connect to host ip-xxx-xxx-xx-xx.example.internal port 22: Connection refused
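To see at a glance which endpoints refused connections and which timed out, the "Still waiting for the Kubernetes API" lines from the installer debug output can be tallied per address. The following is a minimal sketch, assuming the log lines have the same shape as the ones shown above. The function name summarize_api_failures is hypothetical and is not part of openshift-install.

```shell
# Hypothetical helper (not part of openshift-install): reads installer
# log lines on stdin and prints "<count> <ip>:<port>: <error>" for each
# distinct endpoint/error pair found in the API wait messages.
summarize_api_failures() {
  awk '
    /Still waiting for the Kubernetes API/ {
      # Capture everything after "dial tcp ", e.g.
      # "192.168.1.12:6443: connect: connection refused"
      if (match($0, /dial tcp [0-9.]+:[0-9]+: .*/)) {
        key = substr($0, RSTART + 9, RLENGTH - 9)
        count[key]++
      }
    }
    END {
      for (key in count) print count[key], key
    }
  ' | LC_ALL=C sort -k2
}
```

For example, assuming the installer log was saved in the installation directory as .openshift_install.log, the helper could be run as: summarize_api_failures < .openshift_install.log. Addresses that only ever report "connection refused" are reachable on the network but have nothing listening on port 6443, while "i/o timeout" indicates that packets are not getting through at all.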
Environment
- Red Hat OpenShift Container Platform (OCP) 4.5