[OSP 17.1] OVN DB Fails to Start - Remove and Recreate OVN Cluster Membership for Southbound or Northbound DB

Solution In Progress

Issue

  • The following errors appear in the OVN Northbound or Southbound DB logs:
stderr F ovsdb-server: ovsdb error: error reading record XXXX from OVN_Northbound log: record XXXX with index XXXXXX skips past expected index XXXXXX
stderr F ovsdb-server: ovsdb error: error reading record XXXX from OVN_Southbound log: record XXXX advances commit index to XXXXXX but last log index is XXXXXX
  • During a stack update, the "Set connection" task fails with the following error:
2024-10-10 23:00:46,954 p=945413 u=stack n=ansible | 2024-10-10 23:00:46.954209 | 525400ca-0bde-4685-bcf9-00000009b61b |      FATAL | Set connection | controller-0 | error={"changed": true, "cmd": "podman exec ovn_cluster_north_db_server bash -c \"ovn-nbctl --no-leader-only --inactivity-probe=60000 set-connection ptcp:6641:0.0.0.0\"\npodman exec ovn_cluster_south_db_server bash -c \"ovn-sbctl --no-leader-only --inactivity-probe=60000 set-connection ptcp:6642:0.0.0.0\"\n", "delta": "0:00:00.182219", "end": "2024-10-10 17:30:45.535521", "msg": "non-zero return code", "rc": 1, "start": "2024-10-10 17:30:45.353302", "stderr": "ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (Connection refused)\novn-sbctl: unix:/var/run/ovn/ovnsb_db.sock: database connection failed ()", "stderr_lines": ["ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (Connection refused)", "ovn-sbctl: unix:/var/run/ovn/ovnsb_db.sock: database connection failed ()"], "stdout": "", "stdout_lines": []}
  • The tripleo_ovn_cluster_north_db_server service never stays up for more than a few seconds.

  • /var/log/containers/stdouts/tripleo_ovn_cluster_north_db_server.log shows the following:

2024-10-15T04:18:33.089432313+00:00 stderr F + echo 'Running command: '\''bash -c $* -- eval source /etc/sysconfig/ovn_cluster; exec /usr/local/bin/start-nb-db-server ${OVN_NB_DB_OPTS}'\'''
2024-10-15T04:18:33.089438084+00:00 stdout F Running command: 'bash -c $* -- eval source /etc/sysconfig/ovn_cluster; exec /usr/local/bin/start-nb-db-server ${OVN_NB_DB_OPTS}'
2024-10-15T04:18:33.089448068+00:00 stderr F + umask 0022
2024-10-15T04:18:33.089510548+00:00 stderr F + exec bash -c '$*' -- eval source '/etc/sysconfig/ovn_cluster;' exec /usr/local/bin/start-nb-db-server '${OVN_NB_DB_OPTS}'
2024-10-15T04:18:33.167340642+00:00 stderr F ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (Connection refused)
2024-10-15T04:18:33.171213567+00:00 stdout P Waiting for OVN_Northbound to come up 
2024-10-15T04:18:33.173736404+00:00 stderr F 2024-10-15T04:18:33Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2024-10-15T04:18:33.173736404+00:00 stderr F 2024-10-15T04:18:33Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (Connection refused)
2024-10-15T04:18:33.274104446+00:00 stderr F ovsdb-server: ovsdb error: error reading record 10938 from OVN_Northbound log: record 10938 with index 127132 skips past expected index 122968
2024-10-15T04:18:34.175018743+00:00 stderr F 2024-10-15T04:18:34Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2024-10-15T04:18:34.175049146+00:00 stderr F 2024-10-15T04:18:34Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (Connection refused)
2024-10-15T04:18:34.175049146+00:00 stderr F 2024-10-15T04:18:34Z|00005|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: waiting 2 seconds before reconnect
2024-10-15T04:18:36.177312004+00:00 stderr F 2024-10-15T04:18:36Z|00006|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2024-10-15T04:18:36.177312004+00:00 stderr F 2024-10-15T04:18:36Z|00007|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (Connection refused)
2024-10-15T04:18:36.177312004+00:00 stderr F 2024-10-15T04:18:36Z|00008|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: waiting 4 seconds before reconnect
2024-10-15T04:18:40.181610183+00:00 stderr F 2024-10-15T04:18:40Z|00009|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2024-10-15T04:18:40.181610183+00:00 stderr F 2024-10-15T04:18:40Z|00010|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (Connection refused)
2024-10-15T04:18:40.181610183+00:00 stderr F 2024-10-15T04:18:40Z|00011|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: continuing to reconnect in the background but suppressing further logging
2024-10-15T04:19:03.181682156+00:00 stderr F 2024-10-15T04:19:03Z|00012|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
2024-10-15T04:19:03.182277770+00:00 stderr F /etc/init.d/functions: line 589:    87 Alarm clock             "$@"
2024-10-15T04:19:03.182386622+00:00 stdout P [
2024-10-15T04:19:03.182393994+00:00 stdout P FAILED
2024-10-15T04:19:03.182398243+00:00 stdout P ]
2024-10-15T04:19:03.182486324+00:00 stdout F
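
To check whether a controller is hitting this issue, the log can be searched for the two ovsdb-server error signatures listed above. The following is a minimal sketch, assuming bash and a grep that supports -E are available on the node. The function name find_ovsdb_log_errors is hypothetical and not part of any Red Hat tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line of a log file that matches one of
# the two ovsdb-server log-read errors quoted in the Issue section.
find_ovsdb_log_errors() {
    local logfile="$1"
    # Pattern covers both variants:
    #   "record N with index N skips past expected index N"
    #   "record N advances commit index to N but last log index is N"
    local pattern='ovsdb error: error reading record [0-9]+ from OVN_(Northbound|Southbound) log: record [0-9]+ (with index [0-9]+ skips past expected index [0-9]+|advances commit index to [0-9]+ but last log index is [0-9]+)'
    # grep exits non-zero when nothing matches, so callers can test the status.
    grep -E "$pattern" "$logfile"
}
```

For example, running find_ovsdb_log_errors /var/log/containers/stdouts/tripleo_ovn_cluster_north_db_server.log prints the matching lines and returns a non-zero status if none are found.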

Environment

  • Red Hat OpenStack Platform 17.1 (RHOSP)
