Why does my Docker container suddenly crash?

I'm running an ESB container on a RHEL 8 machine. When I put a mild load of 10 parallel threads on it, it suddenly goes down without any errors in the console log of the Docker container.

Only this appears in the /var/log/messages of the host system:

Aug  9 13:59:17 cliq-eiwrk1 containerd[1164]: time="2024-08-09T13:59:17.196644333+02:00" level=info msg="shim disconnected" id=984afe313ef609d0b46fa716ae9cfcb4953682cd90ec8bb7aeedc747274c7f8b
Aug  9 13:59:17 cliq-eiwrk1 containerd[1164]: time="2024-08-09T13:59:17.196832101+02:00" level=warning msg="cleaning up after shim disconnected" id=984afe313ef609d0b46fa716ae9cfcb4953682cd90ec8bb7aeedc747274c7f8b namespace=moby
Aug  9 13:59:17 cliq-eiwrk1 containerd[1164]: time="2024-08-09T13:59:17.196841251+02:00" level=info msg="cleaning up dead shim"
Aug  9 13:59:17 cliq-eiwrk1 dockerd[1532]: time="2024-08-09T13:59:17.197054276+02:00" level=info msg="ignoring event" container=984afe313ef609d0b46fa716ae9cfcb4953682cd90ec8bb7aeedc747274c7f8b module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Aug  9 13:59:17 cliq-eiwrk1 containerd[1164]: time="2024-08-09T13:59:17.204396039+02:00" level=warning msg="cleanup warnings time=\"2024-08-09T13:59:17+02:00\" level=info msg=\"starting signal loop\" namespace=moby pid=66620 runtime=io.containerd.runc.v2\n"
Aug  9 13:59:17 cliq-eiwrk1 kernel: veth0b87eaa: renamed from eth0
Aug  9 13:59:17 cliq-eiwrk1 kernel: br-1d2b5768be6a: port 1(veth32c7414) entered disabled state
Aug  9 13:59:17 cliq-eiwrk1 NetworkManager[1129]: <info>  [1723204757.3967] manager: (veth0b87eaa): new Veth device (/org/freedesktop/NetworkManager/Devices/70)
Aug  9 13:59:17 cliq-eiwrk1 systemd-udevd[66671]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Aug  9 13:59:17 cliq-eiwrk1 kernel: br-1d2b5768be6a: port 1(veth32c7414) entered disabled state
Aug  9 13:59:17 cliq-eiwrk1 kernel: device veth32c7414 left promiscuous mode
Aug  9 13:59:17 cliq-eiwrk1 kernel: br-1d2b5768be6a: port 1(veth32c7414) entered disabled state
Aug  9 13:59:17 cliq-eiwrk1 systemd[1]: run-docker-netns-7a121b6f7353.mount: Succeeded.
Aug  9 13:59:17 cliq-eiwrk1 systemd[1]: var-lib-docker-overlay2-8fa722d107bd4d96a870a1d18219dab1951de7f49e0fb96cc3949977b33afc69-merged.mount: Succeeded.
Aug  9 13:59:17 cliq-eiwrk1 kernel: br-1d2b5768be6a: port 1(veth0c4dd1e) entered blocking state
Aug  9 13:59:17 cliq-eiwrk1 kernel: br-1d2b5768be6a: port 1(veth0c4dd1e) entered disabled state
Aug  9 13:59:17 cliq-eiwrk1 kernel: device veth0c4dd1e entered promiscuous mode
Aug  9 13:59:17 cliq-eiwrk1 kernel: br-1d2b5768be6a: port 1(veth0c4dd1e) entered blocking state
Aug  9 13:59:17 cliq-eiwrk1 kernel: br-1d2b5768be6a: port 1(veth0c4dd1e) entered forwarding state
Aug  9 13:59:17 cliq-eiwrk1 systemd-udevd[66679]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Aug  9 13:59:17 cliq-eiwrk1 systemd-udevd[66679]: Could not generate persistent MAC address for veth6fe3685: No such file or directory
Aug  9 13:59:17 cliq-eiwrk1 NetworkManager[1129]: <info>  [1723204757.4857] manager: (veth6fe3685): new Veth device (/org/freedesktop/NetworkManager/Devices/71)
Aug  9 13:59:17 cliq-eiwrk1 NetworkManager[1129]: <info>  [1723204757.4863] manager: (veth0c4dd1e): new Veth device (/org/freedesktop/NetworkManager/Devices/72)
Aug  9 13:59:17 cliq-eiwrk1 systemd-udevd[66680]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Aug  9 13:59:17 cliq-eiwrk1 systemd-udevd[66680]: Could not generate persistent MAC address for veth0c4dd1e: No such file or directory
Aug  9 13:59:17 cliq-eiwrk1 containerd[1164]: time="2024-08-09T13:59:17.725562558+02:00" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
Aug  9 13:59:17 cliq-eiwrk1 containerd[1164]: time="2024-08-09T13:59:17.725649141+02:00" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
Aug  9 13:59:17 cliq-eiwrk1 containerd[1164]: time="2024-08-09T13:59:17.725664923+02:00" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
Aug  9 13:59:17 cliq-eiwrk1 containerd[1164]: time="2024-08-09T13:59:17.725909283+02:00" level=info msg="starting signal loop" namespace=moby path=/run/containerd/io.containerd.runtime.v2.task/moby/984afe313ef609d0b46fa716ae9cfcb4953682cd90ec8bb7aeedc747274c7f8b pid=66788 runtime=io.containerd.runc.v2
Aug  9 13:59:17 cliq-eiwrk1 systemd[1]: run-docker-runtime\x2drunc-moby-984afe313ef609d0b46fa716ae9cfcb4953682cd90ec8bb7aeedc747274c7f8b-runc.8f69Np.mount: Succeeded.
Aug  9 13:59:17 cliq-eiwrk1 kernel: eth0: renamed from veth6fe3685
Aug  9 13:59:17 cliq-eiwrk1 NetworkManager[1129]: <info>  [1723204757.8242] device (veth0c4dd1e): carrier: link connected
Aug  9 14:00:00 cliq-eiwrk1 systemd[1]: Starting system activity accounting tool...
Aug  9 14:00:00 cliq-eiwrk1 systemd[1]: sysstat-collect.service: Succeeded.
Aug  9 14:00:00 cliq-eiwrk1 systemd[1]: Started system activity accounting tool.
Aug  9 14:00:01 cliq-eiwrk1 systemd[1]: Created slice User Slice of UID 0.
Aug  9 14:00:01 cliq-eiwrk1 systemd[1]: Starting User runtime directory /run/user/0...
Aug  9 14:00:01 cliq-eiwrk1 systemd[1]: Started User runtime directory /run/user/0.
Aug  9 14:00:01 cliq-eiwrk1 systemd[1]: Starting User Manager for UID 0...
Aug  9 14:00:01 cliq-eiwrk1 systemd[67444]: Starting D-Bus User Message Bus Socket.
Aug  9 14:00:01 cliq-eiwrk1 systemd[67444]: Reached target Timers.
Aug  9 14:00:01 cliq-eiwrk1 systemd[67444]: Reached target Paths.
Aug  9 14:00:01 cliq-eiwrk1 systemd[67444]: Listening on D-Bus User Message Bus Socket.
Aug  9 14:00:01 cliq-eiwrk1 systemd[67444]: Reached target Sockets.
Aug  9 14:00:01 cliq-eiwrk1 systemd[67444]: Reached target Basic System.
Aug  9 14:00:01 cliq-eiwrk1 systemd[67444]: Reached target Default.
Aug  9 14:00:01 cliq-eiwrk1 systemd[67444]: Startup finished in 52ms.
Aug  9 14:00:01 cliq-eiwrk1 systemd[1]: Started User Manager for UID 0.
Aug  9 14:00:01 cliq-eiwrk1 systemd[1]: Started Session 39 of user root.
Aug  9 14:00:01 cliq-eiwrk1 systemd[1]: session-39.scope: Succeeded.
Aug  9 14:00:11 cliq-eiwrk1 systemd[1]: Stopping User Manager for UID 0...
Aug  9 14:00:11 cliq-eiwrk1 systemd[67444]: Stopped target Default.
Aug  9 14:00:11 cliq-eiwrk1 systemd[67444]: Stopped target Basic System.
Aug  9 14:00:11 cliq-eiwrk1 systemd[67444]: Stopped target Sockets.
Aug  9 14:00:11 cliq-eiwrk1 systemd[67444]: Closed D-Bus User Message Bus Socket.
Aug  9 14:00:11 cliq-eiwrk1 systemd[67444]: Stopped target Paths.
Aug  9 14:00:11 cliq-eiwrk1 systemd[67444]: Reached target Shutdown.
Aug  9 14:00:11 cliq-eiwrk1 systemd[67444]: Stopped target Timers.
Aug  9 14:00:11 cliq-eiwrk1 systemd[67444]: Started Exit the Session.
Aug  9 14:00:11 cliq-eiwrk1 systemd[67444]: Reached target Exit the Session.
Aug  9 14:00:11 cliq-eiwrk1 systemd[1]: user@0.service: Succeeded.
Aug  9 14:00:11 cliq-eiwrk1 systemd[1]: Stopped User Manager for UID 0.
Aug  9 14:00:11 cliq-eiwrk1 systemd[1]: Stopping User runtime directory /run/user/0...
Aug  9 14:00:11 cliq-eiwrk1 systemd[1]: run-user-0.mount: Succeeded.
Aug  9 14:00:11 cliq-eiwrk1 systemd[1]: user-runtime-dir@0.service: Succeeded.
Aug  9 14:00:11 cliq-eiwrk1 systemd[1]: Stopped User runtime directory /run/user/0.
Aug  9 14:00:11 cliq-eiwrk1 systemd[1]: Removed slice User Slice of UID 0.

And a single line in the journalctl -u docker output:

Aug 09 13:59:17 cliq-eiwrk1.LOCGOV.NL dockerd[1532]: time="2024-08-09T13:59:17.197054276+02:00" level=info msg="ignoring event" container=984afe313ef609d0b46fa716ae9cfcb4953682

Some details about the installed packages:

containerd.io-1.6.32-3.1.el8.x86_64
container-selinux-2.229.0-2.module+el8.10.0+21962+8143777b.noarch
docker-buildx-plugin-0.14.0-1.el8.x86_64
docker-ce-26.1.3-1.el8.x86_64
docker-ce-cli-26.1.3-1.el8.x86_64
docker-ce-rootless-extras-26.1.3-1.el8.x86_64
docker-compose-plugin-2.27.0-1.el8.x86_64
docker-scan-plugin-0.23.0-3.el8.x86_64
kernel-4.18.0-553.16.1.el8_10.x86_64
kernel-4.18.0-553.5.1.el8_10.x86_64
kernel-4.18.0-553.8.1.el8_10.x86_64
kernel-core-4.18.0-553.16.1.el8_10.x86_64
kernel-core-4.18.0-553.5.1.el8_10.x86_64
kernel-core-4.18.0-553.8.1.el8_10.x86_64
kernel-modules-4.18.0-553.16.1.el8_10.x86_64
kernel-modules-4.18.0-553.5.1.el8_10.x86_64
kernel-modules-4.18.0-553.8.1.el8_10.x86_64
kernel-tools-4.18.0-553.16.1.el8_10.x86_64
kernel-tools-libs-4.18.0-553.16.1.el8_10.x86_64

Of course, I expect the container to stay running.

I tried increasing the Java memory for the process inside the container, but it didn't help, and there are no indications that the system needs extra memory anyway.
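One way to put the memory question on firmer ground would be to look at the container's recorded state. The following is only a minimal sketch, assuming the JSON printed by `docker inspect --format '{{json .State}}' <container>` is available; the function name `summarize_state` and the sample input are made up for illustration and are not taken from this system.

```python
import json


def summarize_state(state_json: str) -> str:
    """Describe how a container ended, given its `.State` object as JSON.

    Hypothetical helper; the field names (Running, OOMKilled, ExitCode)
    are the ones `docker inspect` reports under `.State`.
    """
    state = json.loads(state_json)

    # A restarted container reports its current run, not the crash before it.
    if state.get("Running"):
        return "running (State describes the current run only)"

    # Docker sets OOMKilled when an out-of-memory kill was recorded.
    if state.get("OOMKilled"):
        return "OOM-killed"

    exit_code = state.get("ExitCode")
    # By convention, exit codes above 128 mean "terminated by signal N - 128".
    if isinstance(exit_code, int) and exit_code > 128:
        return f"terminated by signal {exit_code - 128}"
    return f"exited with code {exit_code}"


# Made-up sample, not output from the host above.
sample = '{"Running": false, "OOMKilled": false, "ExitCode": 137}'
print(summarize_state(sample))  # terminated by signal 9
```

If the container is restarted automatically, the state has to be captured before the restart, since the fields then describe the new run.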

Since it's so vague, it's hard to say what to try next.

It looks like something odd is happening with the container's network devices, but I can't tell whether that is the cause or a consequence of whatever is going wrong.

Does anyone have any ideas?

Responses