ovnkube-master pod in CrashLoopBackOff, failing to bring up SBDB container - OpenShift 4.x
Issue
- One or more ovnkube-master pods are in CrashLoopBackOff state after a cluster restart.
- The cluster may have been powered off, and all nodes are re-initializing after being powered back on.
- The OpenShift network cluster operator is degraded, possibly after an update.
- The control file /var/run/ovn/ovnsb_db.ctl does not exist on the affected node.
$ oc -n openshift-ovn-kubernetes get pod
NAME                   READY   STATUS             RESTARTS   AGE
ovnkube-master-5lzdm   6/6     Running            0          133m
ovnkube-master-mgjdn   4/6     CrashLoopBackOff   30         133m
ovnkube-master-nqbbl   6/6     Running            0          133m
ovnkube-node-2qk6z     3/3     Running            0          189d
ovnkube-node-gzd95     3/3     Running            0          189d
ovnkube-node-pccwk     3/3     Running            0          189d
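On clusters with many nodes, the affected pod can be hard to spot in the listing above. The following is a minimal sketch, not part of the original procedure: the helper name `not_fully_ready` is hypothetical, and it only filters the table text that `oc get pod` prints, assuming the default column layout (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `oc get pod` table output on stdin and print
# the name, READY value, and status of every pod whose READY column shows
# fewer ready containers than the pod defines (for example 4/6).
not_fully_ready() {
  awk 'NR > 1 {
         # $2 is the READY column, formatted as <ready>/<total>
         split($2, r, "/")
         if (r[1] != r[2]) print $1, $2, $3
       }'
}

# Example usage (requires cluster access):
#   oc -n openshift-ovn-kubernetes get pod | not_fully_ready
```

Run against the listing above, this would print only the ovnkube-master-mgjdn line.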
$ oc -n openshift-ovn-kubernetes logs ovnkube-master-mgjdn -p -c ovn-dbchecker
+ [[ -f /env/_master ]]
++ date '+%m%d %H:%M:%S.%N'
+ echo 'I0816 20:27:33.016391157 - ovn-dbchecker - start ovn-dbchecker'
I0816 20:27:33.016391157 - ovn-dbchecker - start ovn-dbchecker
+ exec /usr/bin/ovndbchecker --config-file=/run/ovnkube-config/ovnkube.conf --loglevel 4 --sb-address ssl:10.xxx.xxx.xxx:9642,ssl:10.xxx.xxx.xxx:9642,ssl:10.xxx.xxx.xxx:9642 --sb-client-privkey /ovn-cert/tls.key --sb-client-cert /ovn-cert/tls.crt --sb-client-cacert /ovn-ca/ca-bundle.crt --sb-cert-common-name ovn --nb-address ssl:10.xxx.xxx.xxx:9641,ssl:10.xxx.xxx.xxx:9641,ssl:10.xxx.xxx.xxx:9641 --nb-client-privkey /ovn-cert/tls.key --nb-client-cert /ovn-cert/tls.crt --nb-client-cacert /ovn-ca/ca-bundle.crt --nb-cert-common-name ovn
I0816 20:27:33.027709 1 config.go:1321] Parsed config file /run/ovnkube-config/ovnkube.conf
I0816 20:27:33.027795 1 config.go:1322] Parsed config: {Default:{MTU:1400 ConntrackZone:64000 EncapType:geneve EncapIP: EncapPort:6081 InactivityProbe:100000 OpenFlowProbe:180 RawClusterSubnets:10.128.0.0/14/23 ClusterSubnets:[]} Logging:{File: CNIFile: Level:4 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:5} CNI:{ConfDir:/etc/cni/net.d Plugin:ovn-k8s-cni-overlay} OVNKubernetesFeature:{EnableEgressIP:true} Kubernetes:{Kubeconfig: CACert: CAData:[] APIServer:https://api-int.mycluster.com:6443 Token: CompatServiceCIDR: RawServiceCIDRs:172.30.0.0/16 ServiceCIDRs:[] OVNConfigNamespace:openshift-ovn-kubernetes MetricsBindAddress: OVNMetricsBindAddress: MetricsEnablePprof:false OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes: NoHostSubnetNodes:nil HostNetworkNamespace:openshift-host-network} OvnNorth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: northbound:false externalID: exec:<nil>} OvnSouth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: northbound:false externalID: exec:<nil>} Gateway:{Mode:local Interface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:10.xxx.xxx.xxx/16 V6JoinSubnet:fd98::/64} MasterHA:{ElectionLeaseDuration:60 ElectionRenewDeadline:30 ElectionRetryPeriod:20} HybridOverlay:{Enabled:false RawClusterSubnets: ClusterSubnets:[] VXLANPort:4789}}
F0816 20:27:33.119117 1 ovndbchecker.go:118] unable to turn on memory trimming for SB DB, stderr: 2022-08-16T20:27:33Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnsb_db.ctl
ovn-appctl: cannot connect to "/var/run/ovn/ovnsb_db.ctl" (No such file or directory)
, error: OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl ovsdb-server/memory-trim-on-compaction on' failed: exit status 1
Logs from SBDB container on failing node:
$ oc logs pod/ovnkube-master-mgjdn -c sbdb -n openshift-ovn-kubernetes
2022-08-16T14:02:15.413881339Z + [[ -f /env/_master ]]
2022-08-16T14:02:15.413881339Z + ovn_kubernetes_namespace=openshift-ovn-kubernetes
2022-08-16T14:02:15.413881339Z + ovndb_ctl_ssl_opts='-p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt'
2022-08-16T14:02:15.413881339Z + transport=ssl
2022-08-16T14:02:15.413881339Z + ovn_raft_conn_ip_url_suffix=
2022-08-16T14:02:15.413881339Z + [[ 10.xxx.xxx.xxx == *\:* ]]
2022-08-16T14:02:15.413881339Z + db=sb
2022-08-16T14:02:15.413881339Z + db_port=9642
2022-08-16T14:02:15.413881339Z + ovn_db_file=/etc/ovn/ovnsb_db.db
2022-08-16T14:02:15.413881339Z + CLUSTER_INITIATOR_IP=10.xxx.xxx.xxx
2022-08-16T14:02:15.414910749Z ++ date -Iseconds
2022-08-16T14:02:15.416112836Z + echo '2022-08-16T14:02:15+00:00 - starting sbdb CLUSTER_INITIATOR_IP=10.xxx.xxx.xxx'
2022-08-16T14:02:15.416162331Z 2022-08-16T14:02:15+00:00 - starting sbdb CLUSTER_INITIATOR_IP=10.xxx.xxx.xxx
2022-08-16T14:02:15.416208357Z + initial_raft_create=true
2022-08-16T14:02:15.416245131Z + initialize=false
2022-08-16T14:02:15.416291140Z + [[ ! -e /etc/ovn/ovnsb_db.db ]]
2022-08-16T14:02:15.416631103Z + [[ false == \t\r\u\e ]]
2022-08-16T14:02:15.417067006Z ++ bracketify 10.xxx.xxx.xxx
2022-08-16T14:02:15.417087515Z ++ case "$1" in
2022-08-16T14:02:15.417099947Z ++ echo 10.xxx.xxx.xxx
2022-08-16T14:02:15.417296178Z + exec /usr/share/ovn/scripts/ovn-ctl --db-sb-cluster-local-port=9644 --db-sb-cluster-local-addr=10.xxx.xxx.xxx --no-monitor --db-sb-cluster-local-proto=ssl --ovn-sb-db-ssl-key=/ovn-cert/tls.key --ovn-sb-db-ssl-cert=/ovn-cert/tls.crt --ovn-sb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '--ovn-sb-log=-vconsole:info -vfile:off' run_sb_ovsdb
2022-08-16T14:02:15.556716029Z 2022-08-16T14:02:15Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-sb.log
2022-08-16T14:02:15.790204390Z ovsdb-server: 2022-08-16T14:02:15.790260613Z ovsdb error: error reading record 13565 from OVN_Southbound log: record 13565 with index 4503382 skips past expected index 4478022
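The ovsdb error above reports that a record in the OVN_Southbound database log carries an index later than the one ovsdb-server expected. As a rough aid when comparing logs from several nodes, the sketch below pulls the record number and both indexes out of such a line and prints their difference. The function name `raft_index_gap` is hypothetical, the pattern is based only on the message shown in this article, and the computed gap is plain arithmetic rather than an official diagnostic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log text on stdin and, for each line containing
# "record N with index A skips past expected index B", print N, A, B, and A - B.
raft_index_gap() {
  sed -n 's/.*record \([0-9][0-9]*\) with index \([0-9][0-9]*\) skips past expected index \([0-9][0-9]*\).*/\1 \2 \3/p' |
  while read -r record found expected; do
    echo "record=$record found=$found expected=$expected gap=$((found - expected))"
  done
}

# Example usage (requires cluster access):
#   oc logs pod/ovnkube-master-mgjdn -c sbdb -n openshift-ovn-kubernetes | raft_index_gap
```

For the error line shown above, this would print `record=13565 found=4503382 expected=4478022 gap=25360`.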
Environment
- OpenShift Container Platform (OCP) 4.7+
- OVN-Kubernetes used as the cluster network plugin (CNI)