[OSP 17.1] OVN DB Fails to Start - Remove and Recreate OVN Cluster Membership for Southbound or Northbound DB
Issue
- The following errors appear in the OVN Northbound or Southbound DB logs:
stderr F ovsdb-server: ovsdb error: error reading record XXXX from OVN_Northbound log: record XXXX with index XXXXXX skips past expected index XXXXXX
stderr F ovsdb-server: ovsdb error: error reading record XXXX from OVN_Southbound log: record XXXX advances commit index to XXXXXX but last log index is XXXXXX
- During the stack update, the following error is reported:
2024-10-10 23:00:46,954 p=945413 u=stack n=ansible | 2024-10-10 23:00:46.954209 | 525400ca-0bde-4685-bcf9-00000009b61b | FATAL | Set connection | controller-0 | error={"changed": true, "cmd": "podman exec ovn_cluster_north_db_server bash -c \"ovn-nbctl --no-leader-only --inactivity-probe=60000 set-connection ptcp:6641:0.0.0.0\"\npodman exec ovn_cluster_south_db_server bash -c \"ovn-sbctl --no-leader-only --inactivity-probe=60000 set-connection ptcp:6642:0.0.0.0\"\n", "delta": "0:00:00.182219", "end": "2024-10-10 17:30:45.535521", "msg": "non-zero return code", "rc": 1, "start": "2024-10-10 17:30:45.353302", "stderr": "ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (Connection refused)\novn-sbctl: unix:/var/run/ovn/ovnsb_db.sock: database connection failed ()", "stderr_lines": ["ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (Connection refused)", "ovn-sbctl: unix:/var/run/ovn/ovnsb_db.sock: database connection failed ()"], "stdout": "", "stdout_lines": []}
- tripleo_ovn_cluster_north_db_server never stays up more than a few seconds.
- In /var/log/containers/stdouts/tripleo_ovn_cluster_north_db_server.log we can see the following:
2024-10-15T04:18:33.089432313+00:00 stderr F + echo 'Running command: '\''bash -c $* -- eval source /etc/sysconfig/ovn_cluster; exec /usr/local/bin/start-nb-db-server ${OVN_NB_DB_OPTS}'\'''
2024-10-15T04:18:33.089438084+00:00 stdout F Running command: 'bash -c $* -- eval source /etc/sysconfig/ovn_cluster; exec /usr/local/bin/start-nb-db-server ${OVN_NB_DB_OPTS}'
2024-10-15T04:18:33.089448068+00:00 stderr F + umask 0022
2024-10-15T04:18:33.089510548+00:00 stderr F + exec bash -c '$*' -- eval source '/etc/sysconfig/ovn_cluster;' exec /usr/local/bin/start-nb-db-server '${OVN_NB_DB_OPTS}'
2024-10-15T04:18:33.167340642+00:00 stderr F ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (Connection refused)
2024-10-15T04:18:33.171213567+00:00 stdout P Waiting for OVN_Northbound to come up
2024-10-15T04:18:33.173736404+00:00 stderr F 2024-10-15T04:18:33Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2024-10-15T04:18:33.173736404+00:00 stderr F 2024-10-15T04:18:33Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (Connection refused)
2024-10-15T04:18:33.274104446+00:00 stderr F ovsdb-server: ovsdb error: error reading record 10938 from OVN_Northbound log: record 10938 with index 127132 skips past expected index 122968
2024-10-15T04:18:34.175018743+00:00 stderr F 2024-10-15T04:18:34Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2024-10-15T04:18:34.175049146+00:00 stderr F 2024-10-15T04:18:34Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (Connection refused)
2024-10-15T04:18:34.175049146+00:00 stderr F 2024-10-15T04:18:34Z|00005|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: waiting 2 seconds before reconnect
2024-10-15T04:18:36.177312004+00:00 stderr F 2024-10-15T04:18:36Z|00006|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2024-10-15T04:18:36.177312004+00:00 stderr F 2024-10-15T04:18:36Z|00007|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (Connection refused)
2024-10-15T04:18:36.177312004+00:00 stderr F 2024-10-15T04:18:36Z|00008|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: waiting 4 seconds before reconnect
2024-10-15T04:18:40.181610183+00:00 stderr F 2024-10-15T04:18:40Z|00009|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2024-10-15T04:18:40.181610183+00:00 stderr F 2024-10-15T04:18:40Z|00010|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (Connection refused)
2024-10-15T04:18:40.181610183+00:00 stderr F 2024-10-15T04:18:40Z|00011|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: continuing to reconnect in the background but suppressing further logging
2024-10-15T04:19:03.181682156+00:00 stderr F 2024-10-15T04:19:03Z|00012|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
2024-10-15T04:19:03.182277770+00:00 stderr F /etc/init.d/functions: line 589: 87 Alarm clock "$@"
2024-10-15T04:19:03.182386622+00:00 stdout P [
2024-10-15T04:19:03.182393994+00:00 stdout P FAILED
2024-10-15T04:19:03.182398243+00:00 stdout P ]
2024-10-15T04:19:03.182486324+00:00 stdout F
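To check quickly whether a container log contains either of the two ovsdb-server error signatures quoted in this article, the log lines can be scanned with a short script. The following is a minimal sketch, not a supported tool: the function name `find_ovsdb_log_errors` and the result field names are hypothetical, and the regular expressions are derived only from the two messages shown above, so differently worded ovsdb-server errors will not be matched.

```python
import re

# Patterns derived from the two error messages quoted in this article.
# Assumption: the record numbers and indexes are plain decimal integers.
_PREFIX = r"error reading record (\d+) from (OVN_Northbound|OVN_Southbound) log: record \d+ "
SKIP_RE = re.compile(_PREFIX + r"with index (\d+) skips past expected index (\d+)")
COMMIT_RE = re.compile(_PREFIX + r"advances commit index to (\d+) but last log index is (\d+)")


def find_ovsdb_log_errors(lines):
    """Return one dict per log line that matches either error signature.

    Hypothetical helper for illustration. 'reported_index' is the index the
    record carries; 'bound_index' is the expected index (skip error) or the
    last log index (commit error).
    """
    findings = []
    for line in lines:
        for kind, pattern in (("index_skip", SKIP_RE), ("commit_ahead", COMMIT_RE)):
            match = pattern.search(line)
            if match:
                findings.append({
                    "kind": kind,
                    "database": match.group(2),
                    "record": int(match.group(1)),
                    "reported_index": int(match.group(3)),
                    "bound_index": int(match.group(4)),
                })
    return findings
```

The function accepts any iterable of strings, so it can be fed the open file object for /var/log/containers/stdouts/tripleo_ovn_cluster_north_db_server.log directly. An empty result only means that neither of these two signatures was found, not that the database is healthy.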
Environment
- Red Hat OpenStack Platform 17.1 (RHOSP)