OVN-master pod in crashloop, failing to bring up SBDB container - OpenShift 4.x

Solution Verified - Updated -

Issue

  • One or more ovnkube-master pods are in CrashLoopBackOff state after a cluster restart.
  • The cluster may have been powered off, with all nodes re-initializing after being powered back on.
  • The OpenShift Networking Cluster Operator is degraded, possibly after an update.
  • The control file /var/run/ovn/ovnsb_db.ctl does not exist on the host node.
$ oc -n openshift-ovn-kubernetes get pod
NAME                   READY   STATUS             RESTARTS   AGE
ovnkube-master-5lzdm   6/6     Running            0          133m
ovnkube-master-mgjdn   4/6     CrashLoopBackOff   30         133m
ovnkube-master-nqbbl   6/6     Running            0          133m
ovnkube-node-2qk6z     3/3     Running            0          189d
ovnkube-node-gzd95     3/3     Running            0          189d
ovnkube-node-pccwk     3/3     Running            0          189d
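
On a larger cluster the failing pod is easier to spot by filtering the listing. The following is a minimal sketch, assuming the default column layout shown above (NAME, READY, STATUS); the function name not_ready_pods is hypothetical and not part of oc.

```shell
# Print NAME, READY and STATUS for every pod whose ready-container count
# is lower than its total container count.
# Reads `oc get pod` style output on stdin; lines whose second column is
# not of the form N/M (for example the header) are ignored.
not_ready_pods() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
    split($2, ready, "/")
    if (ready[1] + 0 < ready[2] + 0) print $1, $2, $3
  }'
}
```

It could be used as: oc -n openshift-ovn-kubernetes get pod | not_ready_pods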
$ oc -n openshift-ovn-kubernetes logs ovnkube-master-mgjdn -p -c ovn-dbchecker
+ [[ -f /env/_master ]]
++ date '+%m%d %H:%M:%S.%N'
+ echo 'I0816 20:27:33.016391157 - ovn-dbchecker - start ovn-dbchecker'
I0816 20:27:33.016391157 - ovn-dbchecker - start ovn-dbchecker
+ exec /usr/bin/ovndbchecker --config-file=/run/ovnkube-config/ovnkube.conf --loglevel 4 --sb-address ssl:10.xxx.xxx.xxx:9642,ssl:10.xxx.xxx.xxx:9642,ssl:10.xxx.xxx.xxx:9642 --sb-client-privkey /ovn-cert/tls.key --sb-client-cert /ovn-cert/tls.crt --sb-client-cacert /ovn-ca/ca-bundle.crt --sb-cert-common-name ovn --nb-address ssl:10.xxx.xxx.xxx:9641,ssl:10.xxx.xxx.xxx:9641,ssl:10.xxx.xxx.xxx:9641 --nb-client-privkey /ovn-cert/tls.key --nb-client-cert /ovn-cert/tls.crt --nb-client-cacert /ovn-ca/ca-bundle.crt --nb-cert-common-name ovn
I0816 20:27:33.027709       1 config.go:1321] Parsed config file /run/ovnkube-config/ovnkube.conf
I0816 20:27:33.027795       1 config.go:1322] Parsed config: {Default:{MTU:1400 ConntrackZone:64000 EncapType:geneve EncapIP: EncapPort:6081 InactivityProbe:100000 OpenFlowProbe:180 RawClusterSubnets:10.128.0.0/14/23 ClusterSubnets:[]} Logging:{File: CNIFile: Level:4 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:5} CNI:{ConfDir:/etc/cni/net.d Plugin:ovn-k8s-cni-overlay} OVNKubernetesFeature:{EnableEgressIP:true} Kubernetes:{Kubeconfig: CACert: CAData:[] APIServer:https://api-int.mycluster.com:6443 Token: CompatServiceCIDR: RawServiceCIDRs:172.30.0.0/16 ServiceCIDRs:[] OVNConfigNamespace:openshift-ovn-kubernetes MetricsBindAddress: OVNMetricsBindAddress: MetricsEnablePprof:false OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes: NoHostSubnetNodes:nil HostNetworkNamespace:openshift-host-network} OvnNorth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: northbound:false externalID: exec:<nil>} OvnSouth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: northbound:false externalID: exec:<nil>} Gateway:{Mode:local Interface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:10.xxx.xxx.xxx/16 V6JoinSubnet:fd98::/64} MasterHA:{ElectionLeaseDuration:60 ElectionRenewDeadline:30 ElectionRetryPeriod:20} HybridOverlay:{Enabled:false RawClusterSubnets: ClusterSubnets:[] VXLANPort:4789}}
F0816 20:27:33.119117       1 ovndbchecker.go:118] unable to turn on memory trimming for SB DB, stderr: 2022-08-16T20:27:33Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnsb_db.ctl
ovn-appctl: cannot connect to "/var/run/ovn/ovnsb_db.ctl" (No such file or directory)
, error: OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl ovsdb-server/memory-trim-on-compaction on' failed: exit status 1

Logs from the sbdb container on the failing node:
$ oc logs pod/ovnkube-master-mgjdn -c sbdb -n openshift-ovn-kubernetes

2022-08-16T14:02:15.413881339Z + [[ -f /env/_master ]]
2022-08-16T14:02:15.413881339Z + ovn_kubernetes_namespace=openshift-ovn-kubernetes
2022-08-16T14:02:15.413881339Z + ovndb_ctl_ssl_opts='-p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt'
2022-08-16T14:02:15.413881339Z + transport=ssl
2022-08-16T14:02:15.413881339Z + ovn_raft_conn_ip_url_suffix=
2022-08-16T14:02:15.413881339Z + [[ 10.xxx.xxx.xxx == *\:* ]]
2022-08-16T14:02:15.413881339Z + db=sb
2022-08-16T14:02:15.413881339Z + db_port=9642
2022-08-16T14:02:15.413881339Z + ovn_db_file=/etc/ovn/ovnsb_db.db
2022-08-16T14:02:15.413881339Z + CLUSTER_INITIATOR_IP=10.xxx.xxx.xxx
2022-08-16T14:02:15.414910749Z ++ date -Iseconds
2022-08-16T14:02:15.416112836Z + echo '2022-08-16T14:02:15+00:00 - starting sbdb  CLUSTER_INITIATOR_IP=10.xxx.xxx.xxx'
2022-08-16T14:02:15.416162331Z 2022-08-16T14:02:15+00:00 - starting sbdb  CLUSTER_INITIATOR_IP=10.xxx.xxx.xxx
2022-08-16T14:02:15.416208357Z + initial_raft_create=true
2022-08-16T14:02:15.416245131Z + initialize=false
2022-08-16T14:02:15.416291140Z + [[ ! -e /etc/ovn/ovnsb_db.db ]]
2022-08-16T14:02:15.416631103Z + [[ false == \t\r\u\e ]]
2022-08-16T14:02:15.417067006Z ++ bracketify 10.xxx.xxx.xxx
2022-08-16T14:02:15.417087515Z ++ case "$1" in
2022-08-16T14:02:15.417099947Z ++ echo 10.xxx.xxx.xxx
2022-08-16T14:02:15.417296178Z + exec /usr/share/ovn/scripts/ovn-ctl --db-sb-cluster-local-port=9644 --db-sb-cluster-local-addr=10.xxx.xxx.xxx --no-monitor --db-sb-cluster-local-proto=ssl --ovn-sb-db-ssl-key=/ovn-cert/tls.key --ovn-sb-db-ssl-cert=/ovn-cert/tls.crt --ovn-sb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '--ovn-sb-log=-vconsole:info -vfile:off' run_sb_ovsdb
2022-08-16T14:02:15.556716029Z 2022-08-16T14:02:15Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-sb.log
2022-08-16T14:02:15.790204390Z ovsdb-server: 2022-08-16T14:02:15.790260613Z ovsdb error: error reading record 13565 from OVN_Southbound log: record 13565 with index 4503382 skips past expected index 4478022
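
The record-index error above can be pulled out of a long sbdb log with a small filter. This is a minimal sketch that assumes the message wording is exactly as shown in this log; the function name raft_index_gaps is hypothetical, and the last output field is simply the arithmetic difference between the index found and the index expected.

```shell
# Read sbdb container log lines on stdin and print one line per distinct
# "skips past expected index" error, in the form:
#   <database> <record> <found_index> <expected_index> <difference>
raft_index_gaps() {
  sed -n 's/.*error reading record \([0-9][0-9]*\) from \([A-Za-z_]*\) log: record [0-9][0-9]* with index \([0-9][0-9]*\) skips past expected index \([0-9][0-9]*\).*/\2 \1 \3 \4/p' |
    sort -u |
    awk '{ print $0, $3 - $4 }'
}
```

It could be used as: oc logs pod/ovnkube-master-mgjdn -c sbdb -n openshift-ovn-kubernetes | raft_index_gaps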


Environment

  • OpenShift Container Platform (OCP) 4.7+
  • OVN-Kubernetes in use as the cluster network provider (CNI)
