Performing Maintenance on a Growing PostgreSQL Database
Environment
- Red Hat OpenShift Container Platform (RHOCP) v4.x
- Red Hat OpenShift Data Foundations (ODF) v4.18 and below
- Red Hat OpenShift Data Foundations (ODF) v4.19 and higher (see minor change in commands)
- Red Hat Quay (RHQ) v3.x
Issue
Some users report that their db-noobaa-db-pg-0 PVC has grown too large, faster than expected, and that they have had to expand the PVC multiple times.
A rapidly growing database can be the result of the wrong collation. Please execute the following commands:
$ oc exec -n openshift-storage -it noobaa-db-pg-0 -- bash -c 'psql'
postgres=# \l+
postgres=# \q
If the Collate column does not have C for the nbcore database row, please stop here and review the steps in the Change the Multi-Cloud Object Gateway Database's Collation Locale to C solution.
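For a quick scripted check, the collation can also be queried directly instead of reading the \l+ output. This is a hedged sketch: the pod and namespace match this article, and the small helper simply compares the reported collation against the literal C.

```shell
# collation_is_c VALUE: succeeds only when the reported collation is exactly "C".
collation_is_c() {
  [ "$1" = "C" ]
}

# Capture nbcore's collation from the pod (uncomment on a live cluster):
# COLLATE=$(oc exec -n openshift-storage noobaa-db-pg-0 -- \
#   psql -t -A -c "SELECT datcollate FROM pg_database WHERE datname = 'nbcore';")
# collation_is_c "$COLLATE" && echo "nbcore is Collate C" || echo "wrong collation: $COLLATE"
```

If the helper fails, stop here and apply the collation solution referenced above before continuing.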
If you're reviewing this article after approaching or passing the 50Gi mark, it may be sufficient to expand the db-noobaa-db-pg-0 PVC by following the Expanding the db-noobaa-db-pg-0 PVC solution. This solution is written for users seeing ~100Gi or higher where the nbcore database is already Collate C.
The resolution steps can also be applied in ODF 4.19 or higher, where noobaa-db runs in HA mode with CNPG and two noobaa-db pods are running. In this example, noobaa-db-pg-cluster-1 is the primary, so commands are performed only on the primary pod:

% oc get cluster
NAME                   AGE     INSTANCES   READY   STATUS                     PRIMARY
noobaa-db-pg-cluster   2d10h   2           2       Cluster in healthy state   noobaa-db-pg-cluster-1

noobaa-db-pg-cluster-1   1/1   Running   0   2d10h
noobaa-db-pg-cluster-2   1/1   Running   0   2d10h

ODF 4.19 or higher: in the Resolution section below, some commands need a minor change:
- Where it says noobaa-db-pg-0, use the primary pod instead; in this example, noobaa-db-pg-cluster-1.
- Where it says /var/lib/pgsql/data/, use /var/lib/postgres/data/.
Related Articles:
Investigating and Fixing Duplicates inside NooBaa-DB PostgreSQL DB Causing Backingstore Flapping or Performance Degradation in OpenShift Data Foundation (ODF)
Configure Lifecycle Policies on Noobaa/RGW buckets
Defining retention for logs in LokiStack on RHOCP 4
Resolution
Warning: Performing maintenance on a database carries risk. It is crucial not only to back up the database inside the pod, but also to follow the step that copies the database outside of the pod. This creates two backups: one inside the pod for a restore if needed, and one outside of the pod in the event the pod enters a degraded state and can no longer be accessed. Optional steps such as an OADP backup or a PVC snapshot, taken once the NooBaa services have been stopped, are additional ways to increase redundancy.
Note: If you're executing the steps in this article, most likely the database is ~100GB or higher. Please note that the database dump and $ oc cp process to copy the database outside of the pod can take a significant amount of time, ~1 hour or more. Please plan maintenance windows if necessary.
Procedure:
- The less aggressive way to accomplish this process is to implement the following two articles and monitor the database for data reduction.
After days/weeks, if there is a significant drop in data usage on the PVC, stop here.
If a significant usage drop is not observed, duplicates may be at play; continue to step 2.
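To decide whether usage has dropped "significantly" between two checks taken days or weeks apart, it helps to express the change as a percentage. A trivial hedged helper; the samples can be any consistent unit (bytes, GiB) as long as both use the same one:

```shell
# pct_drop BEFORE AFTER: integer percent reduction between two usage samples.
# BEFORE must be greater than zero and both values must share the same unit.
pct_drop() {
  echo $(( ( $1 - $2 ) * 100 / $1 ))
}

# Example: PVC usage fell from 120Gi to 90Gi between checks:
# pct_drop 120 90   # prints 25
```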
-
Capture the output of the database size and the table sizes by following the steps outlined in the How to Check the Size/Consumption of the PostgreSQL Database in the db-noobaa-db-pg-0 PVC article, and save those outputs in the event Red Hat Support requests them. Additionally, use the outputs to ensure enough free space exists in the db-noobaa-db-pg-0 PVC for the database dump (backup). If there isn't enough free space, expand the PVC by following the steps outlined in the Expanding the db-noobaa-db-pg-0 PVC solution.
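As a rough headroom check, the database size can be compared against the free bytes on the PVC. This is a hedged sketch: the 1:1 assumption (a plain-SQL dump roughly the size of the database) is illustrative only, and the commented commands use the standard pg_database_size() function and GNU df.

```shell
# has_headroom DB_BYTES FREE_BYTES: succeeds when the free space could hold
# a dump roughly the size of the database (illustrative 1:1 assumption).
has_headroom() {
  [ "$2" -ge "$1" ]
}

# Inputs can be gathered with (uncomment on a live cluster):
# DB_BYTES=$(oc exec -n openshift-storage noobaa-db-pg-0 -- \
#   psql -t -A -c "SELECT pg_database_size('nbcore');")
# FREE_BYTES=$(oc exec -n openshift-storage noobaa-db-pg-0 -- \
#   df --output=avail -B1 /var/lib/pgsql/data | tail -n1)
# has_headroom "$DB_BYTES" "$FREE_BYTES" || echo "expand the PVC first"
```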
Stop all NooBaa services except for the noobaa-db-pg-0 pod:

$ oc scale deployment -n openshift-storage noobaa-operator noobaa-endpoint --replicas=0
$ oc scale sts -n openshift-storage noobaa-core --replicas=0
Perform a backup of the database and copy it out of the pod onto your bastion for safe-keeping (change the target directory name to match your bastion).

NOTE: Larger databases will take a significant amount of time.

$ oc exec -n openshift-storage -it noobaa-db-pg-0 -- bash -c 'pg_dumpall -U postgres > /var/lib/pgsql/data/dump.sql'
Validate the dump was successful:

$ oc exec -n openshift-storage -it noobaa-db-pg-0 -- bash -c 'tail -n100 /var/lib/pgsql/data/dump.sql'
--
-- PostgreSQL database dump complete   <--------------- SHOULD SEE THIS
--
-- PostgreSQL database cluster dump complete
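The tail check above can also be scripted. A hedged sketch that looks for the standard completion marker pg_dumpall writes at the end of a successful dump; run it against a locally copied dump.sql:

```shell
# dump_complete FILE: succeeds if the dump ends with the cluster-dump
# completion marker that pg_dumpall writes on success.
dump_complete() {
  tail -n 100 "$1" | grep -q "PostgreSQL database cluster dump complete"
}

# Example, after copying the dump locally with oc cp:
# dump_complete ./dump.sql && echo "dump looks complete"
```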
Copy the dump outside of the pod:

$ oc cp -n openshift-storage noobaa-db-pg-0:/var/lib/pgsql/data/dump.sql /<target-directory-name>/dump.sql
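Before proceeding, it's worth confirming the local copy matches the in-pod file. A hedged sketch comparing md5 checksums; the in-pod checksum would be captured with oc exec as shown in the comments:

```shell
# same_checksum FILE EXPECTED_MD5: succeeds when the local file's md5
# matches the checksum captured inside the pod.
same_checksum() {
  [ "$(md5sum "$1" | awk '{print $1}')" = "$2" ]
}

# Capture the in-pod checksum (uncomment on a live cluster):
# POD_SUM=$(oc exec -n openshift-storage noobaa-db-pg-0 -- \
#   md5sum /var/lib/pgsql/data/dump.sql | awk '{print $1}')
# same_checksum /<target-directory-name>/dump.sql "$POD_SUM" || echo "re-copy the dump"
```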
Download both scripts attached to this solution, and copy them into the database pod:

$ oc cp -n openshift-storage /<source-directory-name>/fix_id_dups_0.sql noobaa-db-pg-0:/var/lib/pgsql/data/fix_id_dups_0.sql
$ oc cp -n openshift-storage /<source-directory-name>/fix_objmd_dups.sql noobaa-db-pg-0:/var/lib/pgsql/data/fix_objmd_dups.sql

Note: To validate data integrity, the checksums are below (the fix_id_dups_0.sql script is identical to fix_id_dups.sql in KCS 7110605):

$ md5sum fix_id_dups_0.sql
99839229a035dd01027feab1e1773fd2  fix_id_dups_0.sql
$ md5sum fix_objmd_dups.sql
59482ce7274248cb4931cb4abbb98d41  fix_objmd_dups.sql
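Both checksums can be verified in one step with md5sum's check mode. A hedged sketch: it writes the expected values from this article to a manifest, and the actual verification (commented) must be run from the directory containing the downloaded scripts:

```shell
# Write the expected checksums from this article to a manifest file.
cat > /tmp/dup-scripts.md5 <<'EOF'
99839229a035dd01027feab1e1773fd2  fix_id_dups_0.sql
59482ce7274248cb4931cb4abbb98d41  fix_objmd_dups.sql
EOF

# From the directory holding the downloaded scripts, verify both at once:
# md5sum -c /tmp/dup-scripts.md5
```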
rsh into the noobaa-db-pg-0 pod and execute the two scripts in this order:

$ oc -n openshift-storage rsh noobaa-db-pg-0
$ cd /var/lib/pgsql/data/
$ psql -d nbcore -f fix_id_dups_0.sql
$ psql -d nbcore -f fix_objmd_dups.sql
Log in as postgres connected to the nbcore database and run a vacuum:

$ psql -U postgres nbcore
nbcore=# VACUUM FULL VERBOSE;
Log out and reconnect to nbcore to set the table owner back to user=noobaa, and rebuild the indexes:

a. Log out:

$ \q

b. Connect:

$ psql -d nbcore

c. Set the table owner back to noobaa (copy/paste):

DO $$
DECLARE
    row record;
BEGIN
    FOR row IN SELECT schemaname, tablename FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    LOOP
        EXECUTE format('ALTER TABLE %I.%I OWNER TO noobaa', row.schemaname, row.tablename);
    END LOOP;
END;
$$;

d. Reindex the nbcore database:

REINDEX (VERBOSE) DATABASE nbcore;
Disconnect from nbcore, exit the db pod, then scale the NooBaa services back up to replicas=1:

$ \q
$ exit
$ oc scale deployment -n openshift-storage noobaa-operator noobaa-endpoint --replicas=1
$ oc scale sts -n openshift-storage noobaa-core --replicas=1
Allow a few minutes for the pods to come up. Once all of the NooBaa pods have been fully running for ~5 minutes, run the following commands; both should show a Ready phase:

$ oc get noobaa -n openshift-storage
$ oc get backingstore -n openshift-storage
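Readiness can also be polled in a script rather than read from the table output. A hedged sketch: the jsonpath fields and the default backingstore name in the comments are assumptions based on how these CRs typically report status, so verify them on your cluster first:

```shell
# all_ready PHASE...: succeeds only when every supplied phase is "Ready".
all_ready() {
  for phase in "$@"; do
    [ "$phase" = "Ready" ] || return 1
  done
}

# Example (uncomment on a live cluster; .status.phase and the backingstore
# name are assumptions to verify):
# NB_PHASE=$(oc get noobaa noobaa -n openshift-storage -o jsonpath='{.status.phase}')
# BS_PHASE=$(oc get backingstore noobaa-default-backing-store -n openshift-storage -o jsonpath='{.status.phase}')
# all_ready "$NB_PHASE" "$BS_PHASE" && echo "NooBaa is Ready"
```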
Optional step if step 9 does not yield a "Ready" phase: if NooBaa is not "Ready", perform one final restart of ALL NooBaa pods in accordance with the 12.1. Restoring the Multicloud Object Gateway product documentation.
Validate NooBaa is healthy with the following commands:
$ oc get noobaa -n openshift-storage $ oc get backingstore -n openshift-storage
Root Cause
In many cases, a growing database is the result of duplicates and failed uploads, usually caused by insufficient resources on a NooBaa component, which leads to pod restarts in the NooBaa stack and failed uploads to the bucket. See the following article if performance tuning has not been applied:
Performance tuning guide for Multicloud Object Gateway (NooBaa)
Standalone NooBaa Users ONLY, see below:
Standalone NooBaa Multicloud Object Gateway (MCG) Performance Tuning Guide (without StorageCluster CR) - Quay
Diagnostic Steps
$ oc logs noobaa-db-pg-0
waiting for server to start....2022-08-25 19:48:38.185 UTC [22] FATAL: could not write lock file "postmaster.pid": No space left on device
stopped waiting
pg_ctl: could not start server
The PersistentVolume claimed by db-noobaa-db-pg-0 in Namespace openshift-storage is only 2.431% free.
or
May cause NooBaa to enter a "Connecting" phase:
$ oc logs -n openshift-storage noobaa-core-0
detail: "Key ((data ->> '_id'::text))=(673ffc6d2d66e1002c08548f) is duplicated."
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.