CRI-O service constantly crashes in an endless loop in Red Hat OpenShift 4
Issue
- The Infra/Worker node is not booting and is not part of the cluster.
- The worker node disables scheduling and stays in the NotReady state.
- The kubelet service is restarting continuously on a worker node.
- CRI-O is continuously killed by SIGABRT, generating the following stack trace:
May 24 07:22:01 odf-01.qa.ocp.example.com systemd[1]: crio.service: Main process exited, code=killed, status=6/ABRT
May 24 07:22:01 odf-01.qa.ocp.example.com systemd[1]: crio.service: Failed with result 'signal'.
May 24 07:22:01 odf-01.qa.ocp.example.com systemd[1]: crio.service: Consumed 861ms CPU time
May 24 07:22:01 odf-01.qa.ocp.example.com systemd-coredump[1276211]: Process 1276062 (crio) of user 0 dumped core.
    Stack trace of thread 1276205:
    #0 0x000055f0c63e7961 runtime.raise (crio)
    #1 0x000055f0c63c35f1 runtime.sigfwdgo (crio)
    #2 0x000055f0c63c1df4 runtime.sigtrampgo (crio)
    #3 0x000055f0c63e7ce3 runtime.sigtramp (crio)
    #4 0x00007fc0a0899b20 __restore_rt (libpthread.so.0)
    #5 0x000055f0c63e7961 runtime.raise (crio)
    #6 0x000055f0c63ab62e runtime.fatalpanic (crio)
    #7 0x000055f0c63aaf65 runtime.gopanic (crio)
    #8 0x000055f0c6ea0e4b github.com/cri-o/cri-o/vendor/go.etcd.io/bbolt.(*freelist).read (crio)
    #9 0x000055f0c6eab597 github.com/cri-o/cri-o/vendor/go.etcd.io/bbolt.(*DB).loadFreelist.func1 (crio)
    #10 0x000055f0c63ff1ce sync.(*Once).doSlow (crio)
    #11 0x000055f0c6e9b68c github.com/cri-o/cri-o/vendor/go.etcd.io/bbolt.(*DB).loadFreelist (crio)
    #12 0x000055f0c6e9b12f github.com/cri-o/cri-o/vendor/go.etcd.io/bbolt.Open (crio)
    #13 0x000055f0c6eadf95 github.com/cri-o/cri-o/vendor/github.com/containers/image/v5/pkg/blobinfocache/boltdb.(*cache).update (crio)
    #14 0x000055f0c6eae68f github.com/cri-o/cri-o/vendor/github.com/containers/image/v5/pkg/blobinfocache/boltdb.(*cache).RecordKnownLocation (crio)
    #15 0x000055f0c7195849 github.com/cri-o/cri-o/vendor/github.com/containers/image/v5/docker.(*dockerImageSource).GetBlob (crio)
    #16 0x000055f0c70e354a github.com/cri-o/cri-o/vendor/github.com/containers/image/v5/copy.(*imageCopier).copyLayer (crio)
    #17 0x000055f0c70ebda5 github.com/cri-o/cri-o/vendor/github.com/containers/image/v5/copy.(*imageCopier).copyLayers.func1 (crio)
    #18 0x000055f0c63e6141 runtime.goexit (crio)
May 24 07:22:01 odf-01.qa.ocp.example.com systemd[1]: crio.service: Service RestartSec=100ms expired, scheduling restart.
May 24 07:22:01 odf-01.qa.ocp.example.com systemd[1]: crio.service: Scheduled restart job, restart counter is at 54727.
May 24 07:22:01 odf-01.qa.ocp.example.com systemd[1]: Stopping Kubernetes Kubelet...
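To check whether a node's journal shows the same signature as the excerpt above, the text can be scanned for three indicators: the ABRT exit of crio.service, the bbolt freelist frame in the stack trace, and the systemd restart counter. The following is a minimal sketch only; the function name `summarize_crio_journal` and the shape of the returned dictionary are hypothetical, and the patterns are copied from the log lines in this article rather than from any documented CRI-O or systemd interface.

```python
import re

# Patterns taken verbatim from the log excerpt in this article (assumption:
# other CRI-O or systemd versions may word these messages differently).
ABRT_RE = re.compile(r"crio\.service: Main process exited, code=killed, status=6/ABRT")
RESTART_RE = re.compile(r"crio\.service: Scheduled restart job, restart counter is at (\d+)")
BBOLT_RE = re.compile(r"go\.etcd\.io/bbolt\.\(\*freelist\)\.read")


def summarize_crio_journal(text):
    """Summarize CRI-O crash indicators found in journal text (hypothetical helper)."""
    # Collect every restart counter value reported by systemd.
    counters = [int(m.group(1)) for m in RESTART_RE.finditer(text)]
    return {
        # Number of times crio.service was killed by SIGABRT.
        "abrt_exits": len(ABRT_RE.findall(text)),
        # True if the panic originates while bbolt reads its freelist.
        "bbolt_freelist_panic": bool(BBOLT_RE.search(text)),
        # Highest restart counter seen, or None if no restart was logged.
        "max_restart_counter": max(counters) if counters else None,
    }
```

The function expects the journal as a single string, for example the contents of a file saved from `journalctl -u crio --no-pager`. A high restart counter combined with the bbolt frame matches the crash loop described in this article; it does not by itself confirm the root cause.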
Environment
- Red Hat OpenShift Container Platform (RHOCP)
  - 4.9
- Container runtime
  - CRI-O