RHN Satellite Backup strategies?

Latest response

A previous employee implemented a bash script which stops the satellite, performs a database backup, starts the satellite, and runs statistics. This script has various drawbacks, but rather than go into detail I thought I would check with the community: what are your backup strategies? How often do you run database backups? Do you have scripts you are willing/able to share?

Thanks!

Responses

It's not exactly complicated. It's right there in the manual.

/usr/sbin/rhn-satellite stop

su oracle -c "/usr/bin/db-control backup <backup target dir>"

/usr/sbin/rhn-satellite start

 

I fire the cron job every 3 days.
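The cron entry is along these lines (a rough sketch of my setup; the backup target directory is just an example path, and "every 3 days" is really day-of-month */3):

# /etc/cron.d entry -- illustrative only
0 2 */3 * * root /usr/sbin/rhn-satellite stop && su oracle -c "/usr/bin/db-control backup /rhnsat/backup"; /usr/sbin/rhn-satellite start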

 

Personally I would put the DB in Archive-Log mode and do a live backup or even an Export from time to time. But you should have that conversation with your DBA.
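For reference, switching the embedded instance into archive-log mode is roughly the following (a sketch only; it assumes the rhnsat SID and the oracle user's environment, and as noted elsewhere it is not officially supported on the embedded database, so definitely clear it with your DBA first):

/usr/sbin/rhn-satellite stop
su - oracle -c "ORACLE_SID=rhnsat sqlplus / as sysdba" <<'EOF'
startup mount
alter database archivelog;
shutdown immediate
EOF
/usr/sbin/rhn-satellite start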

Yep, and those three steps are more or less what's in the script the coworker wrote. Ideally this Oracle database would be handled by an external DBA group, but it's not in our case.

I'm looking for things people have done beyond those basics in the manual. If you're handling it (the database, backups) in-house, how are you handling rotation? How are you doing sanity checking on each of the above steps? Our script routinely fails, and while I'm fairly confident I know what I need to change, I want to see what others are doing.

Cheers!

We've been doing "those three steps" daily -- with a little more overhead -- for years and have had very few failures. Two as I recall. Those weren't script failures, but bad copies of the database. The script itself has never failed.

While I don't think it should result in a failure, I have changed from the su syntax to

/sbin/runuser oracle -c "db-control backup ${BackupDir}/$DayOfRun" # see next para for $DayOfRun explanation

Rotation: I back up to disk to directories with day names (Sunday, Monday, ...). I think this is overkill for ready-access backups, but I've got the disk space so am not concerned. I rsync this to a host in a different city, plus a monthly tape goes off site; so I think I am well covered for DR.
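Stripped down, the backup end of it looks roughly like this (the backup path and remote host name are made up for illustration):

DayOfRun=$(date +%A)          # Sunday, Monday, ...
BackupDir=/backup/rhnsat      # example location

/usr/sbin/rhn-satellite stop
# mkdir -p is a no-op once the day directories exist
/sbin/runuser oracle -c "mkdir -p ${BackupDir}/${DayOfRun}; db-control backup ${BackupDir}/${DayOfRun}"
/usr/sbin/rhn-satellite start

# push the whole rotation to the off-site box (hostname is a placeholder)
rsync -a --delete ${BackupDir}/ backuphost.example.com:/backup/rhnsat/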

Each Sunday a cron restores the database rsync'd to the remote location. So I have a reasonably current check on whether I will be able to restore from backup. Over the past 5 years one of those restores has failed.
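The restore check on the remote box is basically db-control restore pointed at the most recent copy, along these lines (sketch; it assumes a Satellite install on the DR host and the same example path as above):

# on the DR host, from a Sunday cron
latest=$(ls -t /backup/rhnsat | head -1)    # most recently synced day directory
/usr/sbin/rhn-satellite stop
/sbin/runuser oracle -c "db-control restore /backup/rhnsat/${latest}"
/usr/sbin/rhn-satellite start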

If you are pre-5.4 (going on recall) you may want to insert some code to rotate listener logs on occasion (/opt/apps/oracle/web/product/10.2.0/db_1/network/log/). Same for /var/log/rhn/rhn_web_api.log (again, pre-5.4) if you have that feature turned on.
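For those logs a logrotate stanza with copytruncate is enough, something like the following dropped into /etc/logrotate.d/ (the listener.log filename is the usual default, so check yours):

/var/log/rhn/rhn_web_api.log /opt/apps/oracle/web/product/10.2.0/db_1/network/log/listener.log {
    weekly
    rotate 8
    compress
    copytruncate
    missingok
    notifempty
}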

My favorite pastime: Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=914863

Hi Cristóbal,

When we had the satellite server first set up, the Red Hat guy who did it gave us this:

#!/bin/sh

# This script performs a hot backup of an embedded RHN Oracle database. It takes
# a single argument which is either 'full' or 'incremental' which determines
# what kind of backup will be performed. An incremental backup is incremental to
# the most recent full backup.

# The script cleans up the catalog in the db control file if files are deleted
# from the disk outside of rman.

# The script also deletes obsolete archive logs and backups. The retention
# policy for configuring this was set in the prep script.

# Matthew Booth <mbooth@redhat.com> 22/09/2006

function errormsg() {
    echo $* > /dev/stderr
}

# Make sure we're root
# Could just be Oracle, but root is simpler for administrators
if [ `id -u` -ne 0 ]; then
    errormsg This script must be executed as root
    exit 1
fi

# Check whether this is a full or incremental backup
case $1 in
    full)
        inc_level=0
        ;;
    incremental)
        inc_level=1
        ;;
    
    *)
        errormsg Usage: $0 '(full|incremental)'
        exit 1
        ;;
esac

# Check the rhn database is running
#if ! /sbin/service rhn-database status > /dev/null; then
if ! service oracle status|grep rhnsat > /dev/null; then
    errormsg The rhn-database service must be running to perform a hot backup
    exit 1
fi

# Become the oracle user
su - oracle -c "/bin/bash -s $1 $inc_level" <<-'ORACLE' >> /var/log/rhn/db_backup.log
    export ORACLE_SID=rhnsat
    inc_label=$1
    inc_level=$2

    echo ===================================================================
    echo Starting $inc_label backup at `date`
    echo ===================================================================

    # Check for archivelogs and backups which were manually deleted from disk
    # Remove manually deleted archivelogs and backups from the catalog
    # Perform an rman backup
    # Remove backups and archivelogs which are not required by the retention
    # policy
    rman target / nocatalog <<-EOF
        crosscheck archivelog all;
        crosscheck backup;

        delete noprompt expired archivelog all;
        delete noprompt expired backup;

        run {
            allocate channel d1 device type disk
                format '/rhnsat/backup/data/%U';

            backup incremental level = $inc_level database
                include current controlfile plus archivelog delete all input;
        }

        delete noprompt obsolete;
    EOF

    # Verify that the backup is valid
    nvalid=`rman target / nocatalog <<-EOF | grep "^Finished restore" | wc -l
        restore database validate;
        restore controlfile to '/tmp/rhndb-cf' validate;
        restore spfile to '/tmp/rhndb-spfile' validate;
    EOF
    `

    # If we couldn't validate database, controlfile and spfile, write an error
    # message to stderr
    if [ $nvalid -ne 3 ]; then
        echo "DATABASE BACKUP IS INVALID!"
        echo
        exit 1
    else
        echo "Backup is valid"
        echo
    fi
ORACLE

if [ $? -ne 0 ]; then
    errormsg An error occurred while backing up the database. Check /var/log/rhn/db_backup.log for details.
    exit 1
fi

# vim: ts=4 sw=4 smartindent noexpandtab


This can do a hot backup without stopping the DB. It appears to work fine for us, and on the rare occasions we have needed to, backups restore OK. We call it from a cron job like this:

0 19 * * 0 root /opt/tivoli/tsm/client/hot-backup.sh full
0 19 * * 1-6 root /opt/tivoli/tsm/client/hot-backup.sh incremental

Best regards,

Tris

Tristan, this is great; thanks! It brings up several questions for me. I suppose one is for Red Hat (given that an @redhat.com person wrote it), and that's whether we might see this included and better documented in a future point release. Tristan, would you be willing to attach the script to an RFE in Bugzilla?

Another comes up reading this bit:

The retention policy for configuring this was set in the prep script.

What prep script? Do you have that to share? Also, how do clients behave while the backup is running? I notice you run it outside business hours to avoid any major impact, but perhaps you've tried various interactions (setting actions via the web interface, running yum updates, etc.) during a backup as a test? Relatedly, approximately how long does your backup take? I know that won't be terribly meaningful for comparison without architecture details (size of your satellite database, hardware details such as disk latency, etc.), but I'd still like reference points from other installations. I may get the opportunity to make changes here to improve our satellite, and the more I hear from folks like you the better my chances. :)

Thanks for posting this, Tristan. This should certainly be more widely available. I'll look into it.

Hi Cristóbal,

I'd totally forgotten about the prep script! I've had a look through my documentation and found the tar file we were given. I'm not sure where I can host a copy of it, as it also covers restores. This was from before Oracle 10, but it is easy to work out how to make it relevant to the current versions of Satellite.

With regard to server performance, we have not done any quantitative testing, but I have not noticed it to be any slower for things like Yum, and looking at our satellite-sync logs the impact is not huge (10-20 minutes to sync vs. 5-10 minutes the rest of the time).

This is our DB size:

-bash-3.2$ db-control report
Tablespace Size  Used   Avail  Use%
DATA_TBS   13.6G 11.9G  1.6G   88%
SYSAUX     500M  204M   295.9M 41%
SYSTEM     400M  255.5M 144.4M 64%
TEMP_TBS   1000M 0B     1000M  0%
UNDO_TBS   1000M 481.2M 518.7M 48%
USERS      128M  64K    127.9M 0%

The backups are very quick, as an example:

time /opt/tivoli/tsm/client/hot-backup.sh full
real    5m9.406s

This is on a VM with 4 vCPUs and 8 GB of RAM, using an XIV SAN.

Regards,

Tris

pastebin is your friend.

Monthly I run one of these: "/usr/bin/db-control shrink-segments".

I also altered my database files so that they auto-grow. I was amazed that this is turned off by default, and that there is absolutely no script paying attention to embedded database health, tablespace usage in particular. I filed a couple of Bugzillas to that effect.
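In the meantime, a dumb check against db-control report from root's cron covers the tablespace side; something like this (the 90% threshold and mail address are made up):

#!/bin/bash
# Warn when any tablespace in the embedded DB goes over 90% used (illustrative threshold)
ADMIN_MAIL=dba@example.com    # placeholder address
WARN=$(su - oracle -c "db-control report" | awk 'NR>1 && $5+0 > 90')
if [ -n "$WARN" ]; then
    echo "$WARN" | mailx -s "Satellite tablespace usage over 90% on $(hostname)" $ADMIN_MAIL
fi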

OK, all the documents are now on pastebin. You can find them here:

http://pastebin.com/bUf99NHU - hotbackup.sh
http://pastebin.com/R9Th5e4H - prep.sh
http://pastebin.com/MmdUdKRe - RESTORE.txt
http://pastebin.com/PF4UTJrg - README.txt

Tris

Hello everyone, great topic! Thanks for sharing your experiences and feedback.

Regarding hot database backups, please note that this is currently not officially supported with embedded database Satellites (nor is archivelogging), and the script above is likely a custom solution developed for a specific customer. However, I'm happy to share the news that we (Red Hat) do *plan to provide a solution for online Satellite database backups in the future* due to popular demand.

You may also find the following Knowledgebase solutions on overall Satellite backup helpful:

https://access.redhat.com/knowledge/solutions/11069 "What files and directories are important during migration/backup of my Red Hat Network (RHN) Satellite Server?"

https://access.redhat.com/knowledge/solutions/227743 "How to back up and verify RHN Satellite database?"


This script still uses the service start/stop as per Red Hat support but also grabs important Satellite files not in the database.  I've successfully restored from the backups as well :)

#!/bin/bash
DATE=`date +%F`
PDIR=/var/satellite/backup
DIR=$PDIR/$DATE
SERVICE=/usr/sbin/rhn-satellite
OLDEST=`ls -tr $PDIR | head -1`

FILES="/etc/sudoers
/etc/tnsnames.ora
/etc/sysconfig/rhn/
/etc/rhn/
/root/.gnupg/
/root/ssl-build/
/tftpboot/
/var/lib/cobbler/
/var/lib/nocpulse/
/var/lib/rhn/kickstarts/
/var/www/html/pub/
/var/www/cobbler"

su - oracle -c "mkdir $DIR"

$SERVICE stop
tar -cjf $DIR/$DATE-satfiles.tar.bz2 $FILES
su - oracle -c "db-control backup $DIR"
su - oracle -c "db-control verify $DIR"
$SERVICE start

# Skip the cleanup on the first run, when $PDIR starts out empty
# (otherwise we would remove the backup we just made)
[ -n "$OLDEST" ] && rm -r $PDIR/$OLDEST

This provides a very simple rotation and cleanup in a target disk location.

Here's mine. It does a cold db backup and also grabs rendered versions of the KS profiles so we have a history of what they used to look like (since there is no revision history for profiles).  It maintains a rolling window of the last three backups.

 

#!/bin/bash
#
# This script manages the automated backups of the Satellite DB and
# the export of the rendered KS profiles from cobbler.  The cold DB
# backup is to recover satellite, and the KS profiles are so that
# we can have an historical view of the profiles over time.
#

DATE=$(date +%Y%m%d_%H%M%S)
DBDIR=/san-storage/sat-backups/db
KSDIR=/san-storage/sat-backups/ks
COBBLER=/usr/bin/cobbler
RHNSAT=/usr/sbin/rhn-satellite

echo "---------------------------------------------------"
echo "Backup started at $(date "+%Y-%m-%d %H:%M:%S")"
echo
su - oracle -c "db-control report"

# Satellite DB cold backup
$RHNSAT stop
if [ "$?" -eq "0" ]
then
    echo "Successfully stopped satellite, starting DB backup"
    echo "Creating $DBDIR/$DATE"
    echo
    su - oracle -c "mkdir $DBDIR/$DATE"
    if [ "$?" -eq "0" ]
    then
        su - oracle -c "db-control backup $DBDIR/$DATE"
        if [ "$?" -eq "0" ]
        then
            echo "DB Backup completed.  Restarting Satellite"
        else
            echo "There was a problem running db-control. Aborting and restarting Satellite"
        fi
    else
        echo "There was a problem creating the backup directory. Aborting and restarting Satellite"
    fi
else
    echo "There was a problem shutting down Satellite. Trying to restart Satellite now"
fi
$RHNSAT start


# Oracle embedded DB cold backup file cleanup (these get big fast)
cd $DBDIR
DIRS=($(ls -rt | tr '\n' ' '))
NUM_OF_DIRS=${#DIRS[*]}
if [ "$NUM_OF_DIRS" -gt "3" ]
then
    echo "Cleaning up old DB backups.  We keep the 3 most recent"
    echo "by creation date, NOT the directory name."
    echo
    for (( i=0; i<$((${NUM_OF_DIRS} - 3)); i++))
    do
        echo "Removing ${DIRS[${i}]}"
        rm -rf ${DIRS[${i}]}
    done
else
    echo "There's nothing to remove right now."
    echo
fi


# Kickstart Profile backups
if [ ! -d "${KSDIR}/${DATE}" ]
then
    mkdir ${KSDIR}/${DATE}
    if [ "$?" -eq "0" ]
    then
        cd ${KSDIR}/${DATE}
        if [ "$?" -eq "0" ]
        then
            echo "Collecting KS profile data $(date "+%Y-%m-%d %H:%M:%S")"
            echo
            for i in $(${COBBLER} profile list)
            do
                ${COBBLER} profile getks --name=$i > $i
            done
        else
            echo "There was a problem changing to ${KSDIR}/${DATE}"
        fi
    else
        echo "There was a problem creating ${KSDIR}/${DATE}"
    fi
else
    # This should never happen
    echo ${KSDIR}/${DATE} already exists, something is wrong.
fi

echo
echo "Backup completed at $(date "+%Y-%m-%d %H:%M:%S")"
echo

Additionally, these are the rsyncs I used as backups for a system migration to new server hardware. I'm planning on modifying this to create a backup of all the config files listed in the "what files are important" KB; a rough sketch of that follows the list below.

rsync -avzi /home/ root@newsatserver:/home/
rsync -avzi --delete /etc/dhcpd.conf root@newsatserver:/etc/dhcpd.conf
rsync -avzi --delete /etc/sysconfig/rhn/ root@newsatserver:/etc/sysconfig/rhn/
rsync -avzi --delete /etc/rhn/ root@newsatserver:/etc/rhn/
rsync -avzi --delete /etc/tnsnames.ora root@newsatserver:/etc/tnsnames.ora
rsync -avzi --delete /etc/cobbler/ root@newsatserver:/etc/cobbler/
rsync -avzi --delete /etc/httpd/ root@newsatserver:/etc/httpd/
rsync -avzi --delete /etc/tomcat5/ root@newsatserver:/etc/tomcat5/
rsync -avzi --delete /etc/jabberd/ root@newsatserver:/etc/jabberd/
rsync -avzi --delete /etc/sudoers root@newsatserver:/etc/sudoers
rsync -avzi --delete /var/www/ root@newsatserver:/var/www/
rsync -avzi --delete /root/ssl-build/ root@newsatserver:/root/ssl-build/
rsync -avzi --delete /root/.gnupg/ root@newsatserver:/root/.gnupg/
rsync -avzi --delete /var/satellite/ root@newsatserver:/var/satellite/
rsync -avzi --delete /opt/apps/oracle/config/10.2.0/ root@newsatserver:/opt/apps/oracle/config/10.2.0/
rsync -avzi --delete /nocpulse/ root@newsatserver:/nocpulse/
rsync -avzi --delete /etc/nocpulse/ root@newsatserver:/etc/nocpulse/
rsync -avzi --delete /tftpboot/ root@newsatserver:/tftpboot/
rsync -avzi --delete /var/lib/cobbler/ root@newsatserver:/var/lib/cobbler/
rsync -avzi --delete /var/lib/rhn/kickstarts/ root@newsatserver:/var/lib/rhn/kickstarts/
rsync -avzi --delete /var/lib/nocpulse/ root@newsatserver:/var/lib/nocpulse/
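For the config-file backup variant, I'll probably just loop over a path list, roughly like this (same placeholder destination host; the list is a subset to be extended from the KB):

# paths taken from the "what files are important" KB -- extend as needed
PATHS="/etc/rhn/ /etc/sysconfig/rhn/ /etc/tnsnames.ora /root/ssl-build/ /root/.gnupg/ /var/lib/cobbler/"
for p in $PATHS; do
    rsync -avzi --delete "$p" "root@newsatserver:$p"
done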

Here's our backup script: it stops satellite, does the backup, removes old backups, and starts satellite. We run this from cron daily and it works well. We've never had a problem and have tested restores.

 

#!/bin/bash
# Author - KWC 9/19/2012
# Purpose
# Stops all Satellite services and performs a cold backup of embedded Oracle DB rhnsat
# Removes old backups
# Starts all Satellite services
# Notifies people in mailto list in the event of failure

hostname=`hostname`
logdir=/logs/techserv
backupdir=/software/backups
d=db-backup-$(date "+%F")
log=$logdir/$d.log
backuplist="/tmp/backup_db_clean.lst.tmp"
backupkeep="2"
mailto="myemail@mycompany.com"

/bin/date > $log

# Stop Satellite server
/usr/sbin/rhn-satellite stop >> $log 2>&1

/bin/ps -ef |grep rhn |grep -v rhnsd  |grep -v grep
if [ $? -eq 1 ];then

# Backup embedded Oracle DB
su - oracle -c 'logdir=/logs/techserv;
                backupdir=/software/backups;
                d=db-backup-$(date "+%F");
                log=$logdir/$d.log;
                mkdir -p $backupdir/$d;
                db-control backup $backupdir/$d;
                db-control verify $backupdir/$d' >> $log 2>&1

else
   errtxt="RHN Satellite did not shut down completely, backup aborted"
   echo $errtxt >> $log
   echo $errtxt | mailx -s "RHNSAT DB Backup failed on $hostname" $mailto
   exit 1
fi

#check backup
backupfiles=`ls $backupdir/$d/*.gz |wc -l`
verfiles=`grep verified $log |wc -l`

if [ $backupfiles -ne $verfiles ];then
   cat $log | mailx -s "RHNSAT DB Backup failed on $hostname" $mailto
else
   echo "Files backed up = $backupfiles, Files verified = $verfiles, all is well" >> $log
fi

#Clean up old backup directories

if [ -d $backupdir ]
   then
      ls -ldt ${backupdir}/db-backup-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] |awk '{print $9}'>$backuplist
      count=`wc -l $backuplist | awk '{print $1}'`
      if [ $count -gt $backupkeep ]
      then
         let rmcount=${count}-${backupkeep}
         echo -e "\nRemoving $rmcount old backup directories:" >> $log 2>&1
         for i in `tail -$rmcount $backuplist`
         do
            echo "$i" >>$log 2>&1
            rm -rf $i >>$log 2>&1
         done
      fi
else
   echo -e "\nCould not find $backupdir\n" >> $log 2>&1
fi

if [ -f $backuplist ]
   then
      rm $backuplist
fi

# Start Satellite server
/usr/sbin/rhn-satellite start >> $log 2>&1
 

Lots of helpful scripts here, keep 'em coming!