Performing Maintenance on a Growing PostgreSQL Database

Environment

  • Red Hat OpenShift Container Platform (RHOCP) v4.x
  • Red Hat OpenShift Data Foundation (ODF) v4.18 and below
  • Red Hat OpenShift Data Foundation (ODF) v4.19 and higher (see minor change in commands)
  • Red Hat Quay (RHQ) v3.x

Issue

Some users report that their db-noobaa-db-pg-0 PVC has grown too large, faster than expected, and that they have had to expand the PVC multiple times.

A rapidly growing database can be the result of the wrong collation. Execute the following commands to check:

$ oc exec -n openshift-storage -it noobaa-db-pg-0 -- bash -c 'psql'
postgres=# \l+
postgres=# \q

If the Collate column does not have C for the nbcore database row, please stop here and review the steps in the Change the Multi-Cloud Object Gateway Database's Collation Locale to C solution.

If you are reviewing this article after approaching or passing the 50Gi mark, it may be sufficient to expand the db-noobaa-db-pg-0 PVC by following the Expanding the db-noobaa-db-pg-0 PVC solution. This solution is written for users seeing ~100Gi or higher where the nbcore database collation is already C.

  • The resolution steps can also be applied in ODF 4.19 or higher, where noobaa-db runs in HA mode with CNPG and two NooBaa DB pods are present. In this example, noobaa-db-pg-cluster-1 is the primary, so commands are performed only on the primary pod.

    % oc get cluster
    NAME                   AGE     INSTANCES   READY   STATUS                     PRIMARY
    noobaa-db-pg-cluster   2d10h   2           2       Cluster in healthy state   noobaa-db-pg-cluster-1
    
    noobaa-db-pg-cluster-1                                            1/1     Running     0               2d10h
    noobaa-db-pg-cluster-2                                            1/1     Running     0               2d10h
    

    ODF 4.19 or higher: In the Resolution section below, some commands need a minor change:

    Replace noobaa-db-pg-0 with the primary pod name (in this example, noobaa-db-pg-cluster-1).

    Replace /var/lib/pgsql/data/ with /var/lib/postgres/data/.

Related Articles:

Investigating and Fixing Duplicates inside NooBaa-DB PostgreSQL DB Causing Backingstore Flapping or Performance Degradation in OpenShift Data Foundation (ODF)
Configure Lifecycle Policies on Noobaa/RGW buckets
Defining retention for logs in LokiStack on RHOCP 4

Resolution

Warning: Performing maintenance on a database carries risk. It is crucial not only to back up the database inside the pod, but also to follow the step that copies the database outside of the pod. This creates two backups: one inside the pod for a restore if needed, and one outside of the pod in the event the pod enters a degraded state and entering the pod is not possible. Optional backups taken once the NooBaa services have been stopped, such as an OADP backup or a PVC snapshot, are additional ways to increase redundancy.

Note: If you are executing the steps in this article, the database is most likely ~100GB or larger. Note that the database dump and the $ oc cp process to copy the database outside of the pod can take a significant amount of time (~1 hour or more). Plan maintenance windows if necessary.

Procedure:

  The less aggressive way to accomplish this process is to first implement the following two articles and monitor the database for data reduction.

How to reduce the retention period of deleted metadata in the ODF Multicloud Gateway MCG Noobaa DB database

How to increase the frequency of deleting metadata in the ODF Multicloud Gateway MCG Noobaa DB database when the Noobaa DB is constantly growing

After days/weeks, if there is a significant drop in data usage on the PVC, stop here.

If a significant usage drop is not observed, duplicates may be at play; continue with the numbered steps below.
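A simple way to monitor for the expected drop is to sample the data directory usage over time, for example (a sketch; on ODF 4.19 or higher, substitute the primary pod name and the /var/lib/postgres/data/ path):

```shell
# Sample PVC usage on the database volume; repeat daily and compare.
oc exec -n openshift-storage noobaa-db-pg-0 -- df -h /var/lib/pgsql/data
```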

  1. Capture the output of the database size and the tables size by following the steps outlined in the How to Check the Size/Consumption of the PostgreSQL Database in the db-noobaa-db-pg-0 PVC article and save those outputs in the event Red Hat Support requests them. Additionally, use the outputs to ensure enough free space in the db-noobaa-db-pg-0 PVC exists for the database dump (backup). If there isn't enough free space, please expand the PVC by following the steps outlined in the Expanding the db-noobaa-db-pg-0 PVC solution.
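The sizes can also be captured directly from inside psql on the nbcore database using standard PostgreSQL size functions (a sketch using built-in catalog views):

```sql
-- Overall size of the nbcore database:
SELECT pg_size_pretty(pg_database_size('nbcore'));

-- Ten largest tables by total size (including indexes and TOAST data):
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```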

  2. Stop all NooBaa services except for the noobaa-db-pg-0 pod.

     $ oc scale deployment -n openshift-storage noobaa-operator noobaa-endpoint --replicas=0
     $ oc scale sts -n openshift-storage noobaa-core --replicas=0
    
  3. Perform a backup of the database inside the pod (it is copied onto your bastion in step 5; change the target directory name there to match your bastion):
    NOTE: Larger databases will take an excessive amount of time.

    $ oc exec -n openshift-storage -it noobaa-db-pg-0 -- bash -c 'pg_dumpall -U postgres > /var/lib/pgsql/data/dump.sql'
    
  4. Validate the dump was successful:

    $ oc exec -n openshift-storage -it noobaa-db-pg-0 -- bash -c 'tail -n100 /var/lib/pgsql/data/dump.sql'
    -- PostgreSQL database dump complete <--------------- SHOULD SEE THIS
    --
    -- PostgreSQL database cluster dump complete
    
  5. Copy the dump outside of the pod:

    $ oc cp -n openshift-storage noobaa-db-pg-0:/var/lib/pgsql/data/dump.sql /<target-directory-name>/dump.sql
    
  6. Download both scripts attached to this solution, and copy the scripts into the db pod:

    $ oc cp -n openshift-storage /<source-directory-name>/fix_id_dups_0.sql noobaa-db-pg-0:/var/lib/pgsql/data/fix_id_dups_0.sql
    $ oc cp -n openshift-storage /<source-directory-name>/fix_objmd_dups.sql noobaa-db-pg-0:/var/lib/pgsql/data/fix_objmd_dups.sql
    

    Note: To validate data integrity, verify the checksum values below (the script fix_id_dups_0.sql is identical to fix_id_dups.sql in KCS 7110605):

    md5sum fix_id_dups_0.sql     
    99839229a035dd01027feab1e1773fd2  fix_id_dups_0.sql
    
    md5sum fix_objmd_dups.sql
    59482ce7274248cb4931cb4abbb98d41  fix_objmd_dups.sql
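    After copying the scripts, the checksums can be re-verified inside the pod to confirm they arrived intact, for example:

```shell
# Compare the output against the md5sum values listed above.
oc exec -n openshift-storage noobaa-db-pg-0 -- \
  md5sum /var/lib/pgsql/data/fix_id_dups_0.sql /var/lib/pgsql/data/fix_objmd_dups.sql
```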
    
  7. rsh into the noobaa-db-pg-0 pod and execute the two scripts in this order:

    $ oc -n openshift-storage rsh noobaa-db-pg-0
    $ cd /var/lib/pgsql/data/
    $ psql -d nbcore -f fix_id_dups_0.sql
    $ psql -d nbcore -f fix_objmd_dups.sql
    
  8. Login as postgres connected to the nbcore database and run a vacuum:

    $ psql -U postgres nbcore
    nbcore=# VACUUM FULL VERBOSE;
    
  9. Logout and reconnect to nbcore to set the table owner back to user=noobaa, and rebuild the indexes:

    a. Logout:

    nbcore=# \q
    

    b. Connect:

    $ psql -d nbcore
    

    c. Set the table owner back to noobaa (copy/paste the following):

    DO $$
    DECLARE
        row record;
    BEGIN
        FOR row IN 
            SELECT schemaname, tablename 
            FROM pg_tables 
            WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
        LOOP
            EXECUTE format('ALTER TABLE %I.%I OWNER TO noobaa', row.schemaname, row.tablename);
        END LOOP;
    END;
    $$;
    

    d. Reindex the nbcore database:

    REINDEX (VERBOSE) DATABASE nbcore;
    
  10. Disconnect from nbcore, exit the db pod, then scale up noobaa services to replicas=1:

    nbcore=# \q
    $ exit
    $ oc scale deployment -n openshift-storage noobaa-operator noobaa-endpoint --replicas=1
    $ oc scale sts -n openshift-storage noobaa-core --replicas=1
    
  11. Allow a few minutes for the pods to start. Once all of the NooBaa pods have been fully running for ~5 minutes, run the following commands; both should show a Ready phase:

    $ oc get noobaa -n openshift-storage
    $ oc get backingstore -n openshift-storage
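
    While waiting, pod status can be watched until everything is Running, for example:

```shell
# Watch the NooBaa pods come back up; interrupt with Ctrl-C once all are Running.
oc get pods -n openshift-storage -w | grep noobaa
```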
    
  12. Optional step if step 11 does not yield a "Ready" phase: If NooBaa is not "Ready", give all NooBaa pods one final restart in accordance with the 12.1. Restoring the Multicloud Object Gateway product documentation.

  13. Validate NooBaa is healthy with the following commands:

    $ oc get noobaa -n openshift-storage
    $ oc get backingstore -n openshift-storage
    

Root Cause

In many cases, growing databases are the result of duplicates and failed uploads, usually due to insufficient resources on a NooBaa component causing a pod in the NooBaa stack to restart and uploads to the bucket to fail. See the following article if performance tuning has not been applied:

Performance tuning guide for Multicloud Object Gateway (NooBaa)

Standalone NooBaa Users ONLY, see below:
Standalone NooBaa Multicloud Object Gateway (MCG) Performance Tuning Guide (without StorageCluster CR) - Quay

Diagnostic Steps

$ oc logs noobaa-db-pg-0
waiting for server to start....2022-08-25 19:48:38.185 UTC [22] FATAL:  could not write lock file "postmaster.pid": No space left on device
 stopped waiting
pg_ctl: could not start server


The PersistentVolume claimed by db-noobaa-db-pg-0 in Namespace openshift-storage is only 2.431% free.

or

May cause NooBaa to enter a "Connecting" phase:

$ oc logs -n openshift-storage noobaa-core-0

detail: "Key ((data ->> '_id'::text))=(673ffc6d2d66e1002c08548f) is duplicated."
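
Before running the fix scripts, the presence of duplicates can be confirmed with a read-only query matching the key shown in the log above (a sketch; the objectmds table name is an assumption based on typical NooBaa schemas, verify it with \dt inside psql first):

```sql
-- Read-only duplicate check on nbcore; objectmds is an assumed table name.
SELECT data ->> '_id' AS object_id, count(*) AS copies
FROM objectmds
GROUP BY 1
HAVING count(*) > 1
LIMIT 10;
```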

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
