Nodes in a cluster become "NotReady" and an etcd panic shows up in openshift-oauth-apiserver pods
Environment
- Red Hat OpenShift Container Platform 4.x.
Issue
- All the nodes in the cluster show up as NotReady.
- openshift-oauth-apiserver pods return the following etcd panic messages:
I1004 00:21:11.961861 1 healthz.go:244] etcd check failed: healthz
[-]etcd failed: error getting data from etcd: context deadline exceeded
E1004 00:21:11.962290 1 timeout.go:128] Header called after Handler finished
goroutine 132445950 [running]:
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1(0xc0065e8fc0)
k8s.io/apiserver@v0.21.0/pkg/server/filters/timeout.go:102 +0x125
panic(0x2210a60, 0x2909660)
runtime/panic.go:965 +0x1b9
golang.org/x/net/http2.(*responseWriter).Header(0xc0064b7b60, 0x3943100)
golang.org/x/net@v0.0.0-20210224082022-3d97a244fca7/http2/server.go:2601 +0x99
net/http.Error(0x296b000, 0xc0064b7b60, 0xc001a42820, 0x18b, 0x1f4)
net/http/server.go:2059 +0x3b
k8s.io/apiserver/pkg/server/healthz.handleRootHealth.func1(0x2975830, 0xc006333d58, 0xc0053ad500)
k8s.io/apiserver@v0.21.0/pkg/server/healthz/healthz.go:245 +0x6c5
k8s.io/apiserver/pkg/endpoints/metrics.InstrumentHandlerFunc.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
k8s.io/apiserver@v0.21.0/pkg/endpoints/metrics/metrics.go:453 +0x2be
net/http.HandlerFunc.ServeHTTP(0xc0003d2820, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/server/mux.(*pathHandler).ServeHTTP(0xc0013f0000, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
k8s.io/apiserver@v0.21.0/pkg/server/mux/pathrecorder.go:241 +0x77a
k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).ServeHTTP(0xc00062b2d0, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
k8s.io/apiserver@v0.21.0/pkg/server/mux/pathrecorder.go:234 +0x8c
k8s.io/apiserver/pkg/server.director.ServeHTTP(0x261d5f1, 0xf, 0xc0017c0ab0, 0xc00062b2d0, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
k8s.io/apiserver@v0.21.0/pkg/server/handler.go:154 +0x914
k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filterlatency/filterlatency.go:95 +0x193
net/http.HandlerFunc.ServeHTTP(0xc0019774d0, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filters.WithAuthorization.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filters/authorization.go:64 +0x603
net/http.HandlerFunc.ServeHTTP(0xc0019574c0, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filterlatency/filterlatency.go:71 +0x186
net/http.HandlerFunc.ServeHTTP(0xc001957500, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filterlatency/filterlatency.go:95 +0x193
net/http.HandlerFunc.ServeHTTP(0xc001977500, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad500)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/server/filters.WithPriorityAndFairness.func1.4()
k8s.io/apiserver@v0.21.0/pkg/server/filters/priority-and-fairness.go:127 +0x1ba
k8s.io/apiserver/pkg/util/flowcontrol.(*configController).Handle.func2()
k8s.io/apiserver@v0.21.0/pkg/util/flowcontrol/apf_filter.go:176 +0x222
k8s.io/apiserver/pkg/util/flowcontrol.immediateRequest.Finish(...)
k8s.io/apiserver@v0.21.0/pkg/util/flowcontrol/apf_controller.go:752
k8s.io/apiserver/pkg/util/flowcontrol.(*configController).Handle(0xc00027c100, 0x297b338, 0xc005337830, 0xc00178cc60, 0x297b920, 0xc006144000, 0xc002a46120, 0xc002a46130, 0xc0020ca9c0)
k8s.io/apiserver@v0.21.0/pkg/util/flowcontrol/apf_filter.go:166 +0x907
k8s.io/apiserver/pkg/server/filters.WithPriorityAndFairness.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
k8s.io/apiserver@v0.21.0/pkg/server/filters/priority-and-fairness.go:130 +0x606
net/http.HandlerFunc.ServeHTTP(0xc001977530, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filterlatency/filterlatency.go:71 +0x186
net/http.HandlerFunc.ServeHTTP(0xc001957540, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filterlatency/filterlatency.go:95 +0x193
net/http.HandlerFunc.ServeHTTP(0xc001977560, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filters.WithImpersonation.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filters/impersonation.go:50 +0x240d
net/http.HandlerFunc.ServeHTTP(0xc001957580, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filterlatency/filterlatency.go:71 +0x186
net/http.HandlerFunc.ServeHTTP(0xc0019575c0, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filterlatency/filterlatency.go:95 +0x193
net/http.HandlerFunc.ServeHTTP(0xc001977590, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filters/audit.go:55 +0x7d5
net/http.HandlerFunc.ServeHTTP(0xc001957600, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filterlatency/filterlatency.go:71 +0x186
net/http.HandlerFunc.ServeHTTP(0xc001957640, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filterlatency/filterlatency.go:95 +0x193
net/http.HandlerFunc.ServeHTTP(0xc0019775f0, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filters.withAuthentication.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad400)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filters/authentication.go:80 +0x75c
net/http.HandlerFunc.ServeHTTP(0xc00197c8a0, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad300)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1(0x7f9c1c09b140, 0xc006333d48, 0xc0053ad200)
k8s.io/apiserver@v0.21.0/pkg/endpoints/filterlatency/filterlatency.go:80 +0x38c
net/http.HandlerFunc.ServeHTTP(0xc0019576c0, 0x7f9c1c09b140, 0xc006333d48, 0xc0053ad200)
net/http/server.go:2049 +0x44
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1(0xc0065e8fc0, 0xc000506648, 0x297b9c8, 0xc006333d48, 0xc0053ad200)
k8s.io/apiserver@v0.21.0/pkg/server/filters/timeout.go:107 +0xb8
created by k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP
k8s.io/apiserver@v0.21.0/pkg/server/filters/timeout.go:93 +0x1f4
I1004 00:21:15.224470 1 healthz.go:244] etcd check failed: readyz
[-]etcd failed: error getting data from etcd: rpc error: code = Unknown desc = context deadline exceeded
E1004 00:21:15.224910 1 timeout.go:128] Header called after Handler finished
- After some minutes (around half an hour in the clusters where this issue was found), the problem stops and everything starts working normally again.
- The master nodes of both clusters where the problem has been detected run on Red Hat Enterprise Virtualization.
- The versions where this problem has been detected are 4.8 and 4.9.
- The latter two observations do not necessarily imply that the problem cannot occur in other versions or on other platforms.
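The first symptom above can be confirmed quickly from the CLI. The following is a minimal sketch: in a live cluster the input would come from `oc get nodes --no-headers`, but since that requires cluster access, a hypothetical sample resembling the reported state is used here instead (the node names and versions are illustrative, not from the affected clusters).

```shell
# Stand-in for: oc get nodes --no-headers
# (sample data; in a real cluster, pipe the oc output into the same checks)
cat > /tmp/nodes-sample.txt <<'EOF'
master-0   NotReady   master   120d   v1.22.8
master-1   NotReady   master   120d   v1.22.8
worker-0   NotReady   worker   120d   v1.22.8
EOF

# Count total nodes and how many report NotReady
total=$(grep -c '' /tmp/nodes-sample.txt)
notready=$(grep -c 'NotReady' /tmp/nodes-sample.txt)
echo "$notready of $total nodes NotReady"
```

If every node is NotReady at the same time, that matches the pattern described in this article; isolated NotReady nodes usually point to a different problem.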
Resolution
- At the time of writing, the problem is still being investigated in OCPBUGS-2136. Feel free to track that bug, and open a support case with Red Hat Support if you are experiencing the issue.
Diagnostic Steps
- At the time of writing, no reproduction steps for this problem have been found.
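Although the problem cannot yet be reproduced on demand, affected clusters can be identified by filtering the openshift-oauth-apiserver pod logs for the two signature lines shown in the trace above. This is a minimal sketch: in a live cluster the input would come from `oc logs -n openshift-oauth-apiserver <pod>` (pod name hypothetical), but a small sample of the log lines from this article is used here so the filter itself can be demonstrated.

```shell
# Stand-in for: oc logs -n openshift-oauth-apiserver <pod>
# (sample log lines taken from the trace in this article)
cat > /tmp/oauth-apiserver-sample.log <<'EOF'
I1004 00:21:11.961861       1 healthz.go:244] etcd check failed: healthz
[-]etcd failed: error getting data from etcd: context deadline exceeded
E1004 00:21:11.962290       1 timeout.go:128] Header called after Handler finished
EOF

# Show only the two signature lines of this issue: the failed etcd health
# check and the follow-up panic trigger.
grep -E 'etcd check failed|Header called after Handler finished' \
    /tmp/oauth-apiserver-sample.log
```

Hits on both patterns within the same time window, combined with all nodes going NotReady, suggest the behavior described here rather than a generic etcd outage.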