Why do OpenShift worker nodes based on Red Hat CoreOS 8.6 hang, get disconnected from the network, and then later recover?
Issue
- Why do OpenShift worker nodes based on Red Hat CoreOS 8.6 hang, get disconnected from the network, and then later recover?
After recovery, the journal log shows systemd-journald crashing. During the hang, no I/O takes place, and collectl and other tools log nothing:
[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: Stack trace of thread 1592821:
[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: #0 0x00007f0608f74a13 journal_file_append_data (libsystemd-shared-239.so)
[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: #1 0x00007f0608f76c61 journal_file_append_entry (libsystemd-shared-239.so)
[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: #2 0x0000563afe280d6c dispatch_message_real (systemd-journald)
[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: #3 0x0000563afe2863ed stdout_stream_log (systemd-journald)
[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: #4 0x0000563afe2866a4 stdout_stream_line (systemd-journald)
[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: #5 0x0000563afe286e3f stdout_stream_scan (systemd-journald)
[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: #6 0x0000563afe28734e stdout_stream_process (systemd-journald)
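When the same symptom appears on several nodes, reducing each dump to its list of function names can make the traces easier to compare. The following is a minimal sketch, assuming the frame lines keep the format shown above. The names `FRAME_RE` and `parse_frames` are hypothetical helpers introduced for illustration and are not part of any systemd or OpenShift tooling.

```python
import re

# Assumed frame format, taken from the log lines above:
#   "... systemd-coredump[PID]: #N 0xADDRESS function_name (module)"
FRAME_RE = re.compile(
    r"systemd-coredump\[(?P<pid>\d+)\]:\s+#(?P<index>\d+)\s+"
    r"(?P<addr>0x[0-9a-fA-F]+)\s+(?P<func>\S+)\s+\((?P<module>[^)]+)\)"
)


def parse_frames(lines):
    """Return (frame index, function, module) for every stack frame line.

    Lines that are not stack frames, such as the "Stack trace of thread"
    header, are skipped.
    """
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (int(match.group("index")), match.group("func"), match.group("module"))
            )
    return frames


# Sample input copied from the journal excerpt above.
sample = [
    "[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: Stack trace of thread 1592821:",
    "[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: #0 0x00007f0608f74a13 journal_file_append_data (libsystemd-shared-239.so)",
    "[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: #1 0x00007f0608f76c61 journal_file_append_entry (libsystemd-shared-239.so)",
    "[Fri Sep 15 07:50:30 UTC 2023] systemd-coredump[61362]: #2 0x0000563afe280d6c dispatch_message_real (systemd-journald)",
]

frames = parse_frames(sample)
for index, func, module in frames:
    print(f"#{index} {func} ({module})")
```

In this excerpt the top frames are in `journal_file_append_data` and `journal_file_append_entry`, so the crash occurred while journald was appending an entry to a journal file.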
Environment
- Red Hat OpenShift Container Platform 4.11
- Red Hat CoreOS 8.6