RHN Satellite Backup strategies?
A previous employee implemented a bash script which stops the satellite, performs a database backup, starts the satellite, and runs statistics. This script has various drawbacks, but rather than go into detail I thought I would check with the community: what are your backup strategies? How often do you run database backups? Do you have scripts you are willing/able to share?
Thanks!
Responses
It's not exactly complicated. It's right there in the manual.
/usr/sbin/rhn-satellite stop
su oracle -c "/usr/bin/db-control backup <backup target dir>"
/usr/sbin/rhn-satellite start
I fire the cron job every 3 days.
Personally I would put the DB in Archive-Log mode and do a live backup or even an Export from time to time. But you should have that conversation with your DBA.
We've been doing "those three steps" daily -- with a little more overhead -- for years and have had very few failures. Two as I recall. Those weren't script failures, but bad copies of the database. The script itself has never failed.
While I don't think it should result in a failure, I have changed from the su syntax to
/sbin/runuser oracle -c "db-control backup ${BackupDir}/$DayOfRun" # see next para for $DayOfRun explanation
Rotation: I back up to disk to directories with day names (Sunday, Monday, ...). I think this is overkill for ready-access backups, but I've got the disk space so am not concerned. I rsync this to a host in a different city, plus a monthly tape goes off site; so I think I am well covered for DR.
Each Sunday a cron restores the database rsync'd to the remote location. So I have a reasonably current check on whether I will be able to restore from backup. Over the past 5 years one of those restores has failed.
If you are pre-5.4 (going on recall) you may want to insert some code to rotate listener logs on occasion (/opt/apps/oracle/web/product/10.2.0/db_1/network/log/). Same for /var/log/rhn/rhn_web_api.log (again, pre-5.4) if you have that feature turned on.
My favorite pass-time: BugZilla https://bugzilla.redhat.com/show_bug.cgi?id=914863
Hi Cristóbal,
When we had the satellite server first setup the Red Hat guy who did it gave us this:
#!/bin/sh # This script performs a hot backup of an embedded RHN Oracle database. It takes # a single argument which is either 'full' or 'incremental' which determines # what kind of backup will be performed. An incremental backup is incremental to # the most recent full backup. # The script cleans up the catalog in the db control file if files are deleted # from the disk outside of rman. # The script also deletes obsolete archive logs and backups. The retention # policy for configuring this was set in the prep script. # Matthew Booth <mbooth@redhat.com> 22/09/2006 function errormsg() { echo $* > /dev/stderr } # Make sure we're root # Could just be Oracle, but root is simpler for administrators if [ `id -u` -ne 0 ]; then errormsg This script must be executed as root exit 1 fi # Check whether this is a full or incremental backup case $1 in full) inc_level=0 ;; incremental) inc_level=1 ;; *) errormsg Usage: $0 '(full|incremental)' exit 1 ;; esac # Check the rhn database is running #if ! /sbin/service rhn-database status > /dev/null; then if ! service oracle status|grep rhnsat > /dev/null; then errormsg The rhn-database service must be running to perform a hot backup exit 1 fi # Become the oracle user su - oracle -c "/bin/bash -s $1 $inc_level" <<-'ORACLE' >> /var/log/rhn/db_backup.log export ORACLE_SID=rhnsat inc_label=$1 inc_level=$2 echo =================================================================== echo Starting $inc_label backup at `date` echo =================================================================== # Check for archivelogs and backups which were manually deleted from disk # Remove manually deleted archivelogs and backups from the catalog # Perform an rman backup # Remove backups and archivelogs which are not required by the retention # policy rman target / nocatalog <<-EOF crosscheck archivelog all; crosscheck backup; delete noprompt expired archivelog all; delete noprompt expired backup; run { allocate channel d1 device type disk format '/rhnsat/backup/data/%U'; backup incremental level = $inc_level database include current controlfile plus archivelog delete all input; } delete noprompt obsolete; EOF # Verify that the backup is valid nvalid=`rman target / nocatalog <<-EOF | grep "^Finished restore" | wc -l restore database validate; restore controlfile to '/tmp/rhndb-cf' validate; restore spfile to '/tmp/rhndb-spfile' validate; EOF ` # If we couldn't validate database, controlfile and spfile, write an error # message to stderr if [ $nvalid -ne 3 ]; then echo "DATABASE BACKUP IS INVALID!" echo exit 1 else echo "Backup is valid" echo fi ORACLE if [ $? -ne 0 ]; then errormsg An error occurred while backing up the database. Check /var/log/rhn/db_backup.log for details. exit 1 fi # vim: ts=4 sw=4 smartindent noexpandtab
This can do a hot backup with out stopping the DB, it appears to work fine for us, and one the rare occasions we have needed to, backups restore ok. We call it in a cron job like this:
0 19 * * 0 root /opt/tivoli/tsm/client/hot-backup.sh full 0 19 * * 1-6 root /opt/tivoli/tsm/client/hot-backup.sh incremental
Best regards,
Tris
Hi Cristóbal,
I'd totally forgoten about the prep script! I've had a look through my documentation and found the tar file we were given. I'm not sure where I can host a copy of it, as it also covers restores. This was from before Oracle 10, but it is easy to work out how to make it relevent to the current versions of Satellite.
With regard to server performance, we have not done any quantative testing, but I have not noticed it to be any slower for things like Yum, and looking at our satellite-sync logs the impact is not huge. (10-20 minutes to sync vs 5-10 munites the rest of the time)
This is our DB size:
-bash-3.2$ db-control report Tablespace Size Used Avail Use% DATA_TBS 13.6G 11.9G 1.6G 88% SYSAUX 500M 204M 295.9M 41% SYSTEM 400M 255.5M 144.4M 64% TEMP_TBS 1000M 0B 1000M 0% UNDO_TBS 1000M 481.2M 518.7M 48% USERS 128M 64K 127.9M 0%
The backups are very quick, as an example:
time /opt/tivoli/tsm/client/hot-backup.sh full real 5m9.406s
This is on a VM, with 4 vCPU's 8GB of ram, using a an XIV SAN.
Regards,
Tris
pastebin is your friend.
Monthly I do one of these "/usr/bin/db-control shrink-segments"
I also altered my database files so that they auto-grow. I was amazed that by default that is turned off and that there is absolutely no script paying attention to embedded database health; tablespace usage in particular. I filed a couple of BugZillas to that effect.
Ok, all the documents are now on pastbin. You can find them here:
http://pastebin.com/bUf99NHU - hotbackup.sh
http://pastebin.com/R9Th5e4H - prep.sh
http://pastebin.com/MmdUdKRe - RESTORE.txt
http://pastebin.com/PF4UTJrg - README.txt
Tris
Hello everyone, great topic! Thanks for sharing your experiences and feedback.
Regarding hot database backups, please note that this is currently not officially supported with embedded database Satellites (nor is archivelogging), and the script above is likely a custom solution developed for specific customer. However, I'm happy to share the news that we (Red Hat) do *plan to provide a solution for online Satellite database backups in the future* due to popular demand.
You may also find the following Knowledgebase solutions helpful about overall Satellite backup:
https://access.redhat.com/knowledge/solutions/11069 "What files and directories are important during migration/backup of my Red Hat Network (RHN) Satellite Server?"
https://access.redhat.com/knowledge/solutions/227743 "How to back up and verify RHN Satellite database?"
This script still uses the service start/stop as per Red Hat support but also grabs important Satellite files not in the database. I've successfully restored from the backups as well :)
#!/bin/bash DATE=`date +%F` PDIR=/var/satellite/backup DIR=$PDIR/$DATE SERVICE=/usr/sbin/rhn-satellite OLDEST=`ls -tr $PDIR | head -1` FILES="/etc/sudoers /etc/tnsnames.ora /etc/sysconfig/rhn/ /etc/rhn/ /root/.gnupg/ /root/ssl-build/ /tftpboot/ /var/lib/cobbler/ /var/lib/nocpulse/ /var/lib/rhn/kickstarts/ /var/www/html/pub/ /var/www/cobbler" su - oracle -c "mkdir $DIR" $SERVICE stop tar -cjf $DIR/$DATE-satfiles.tar.bz2 $FILES su - oracle -c "db-control backup $DIR" su - oracle -c "db-control verify $DIR" $SERVICE start rm -r $PDIR/$OLDEST
This provides a very simple rotation and cleanup in a target disk location.
Here's mine. It does a cold db backup and also grabs rendered versions of the KS profiles so we have a history of what they used to look like (since there is no revision history for profiles). It maintains a rolling window of the last three backups.
#!/bin/bash # # This script manages the automated backups of the Satellite DB and # the export of the rendered KS profiles from cobbler. The cold DB # backup is to recover satellite, and the KS profiles are so that # we can have an historical view of the profiles over time. # DATE=$(date +%Y%m%d_%H%M%S) DBDIR=/san-storage/sat-backups/db KSDIR=/san-storage/sat-backups/ks COBBLER=/usr/bin/cobbler RHNSAT=/usr/sbin/rhn-satellite echo "---------------------------------------------------" echo "Backup started at $(date "+%Y-%m-%d %H:%M:%S")" echo su - oracle -c "db-control report" # Satellite DB cold backup $RHNSAT stop if [ "$?" -eq "0" ] then echo "Successfully stopped satellite, starting DB backup" echo "Creating $DBDIR/$DATE" echo su - oracle -c "mkdir $DBDIR/$DATE" if [ "$?" -eq "0" ] then su - oracle -c "db-control backup $DBDIR/$DATE" if [ "$?" -eq "0" ] then echo "DB Backup completed. Restarting Satellite" else echo "There was a problem running db-control. Aborting and restarting Satellite" fi else echo "There was a problem creating the backup directory. Aborting and restarting Satellite" fi else echo "There was a problem shutting down Satellite. Trying to restart Satellite now" fi $RHNSAT start # Oracle embedded DB cold backup file cleanup (these get big fast) cd $DBDIR DIRS=($(ls -rt | tr '\n' ' ')) NUM_OF_DIRS=${#DIRS[*]} if [ "$NUM_OF_DIRS" -gt "3" ] then echo "Cleaning up old DB backups. We keep the 3 most recent" echo "by creation date, NOT the directory name." echo for (( i=0; i<$((${NUM_OF_DIRS} - 3)); i++)) do echo "Removing ${DIRS[${i}]}" rm -rf ${DIRS[${i}]} done else echo "There's nothing to remove right now." echo fi # Kickstart Profile backups if [ ! -d "${KSDIR}/${DATE}" ] then mkdir ${KSDIR}/${DATE} if [ "$?" -eq "0" ] then cd ${KSDIR}/${DATE} if [ "$?" -eq "0" ] then echo "Collecting KS profile data $(date "+%Y-%m-%d %H:%M:%S")" echo for i in $(${COBBLER} profile list) do ${COBBLER} profile getks --name=$i > $i done else echo "There was a problem changing to ${KSDIR}/${DATE}" fi else echo "There was a problem creating ${KSDIR}/${DATE}" fi else # This should never happen echo ${KSDIR}/${DATE} already exists, something is wrong. fi echo echo "Backup completed at $(date "+%Y-%m-%d %H:%M:%S")" echo
Additionally, these are the rsyncs I used as backups for a system migration to new server hardware. I'm planning on modifying this to create a backup of all the config files listed in the "what files are important" KB
rsync -avzi /home/ root@newsatserver:/home/ rsync -avzi --delete /etc/dhcpd.conf root@newsatserver:/etc/dhcpd.conf rsync -avzi --delete /etc/sysconfig/rhn/ root@newsatserver:/etc/sysconfig/rhn/ rsync -avzi --delete /etc/rhn/ root@newsatserver:/etc/rhn/ rsync -avzi --delete /etc/tnsnames.ora root@newsatserver:/etc/tnsnames.ora rsync -avzi --delete /etc/cobbler/ root@newsatserver:/etc/cobbler/ rsync -avzi --delete /etc/httpd/ root@newsatserver:/etc/httpd/ rsync -avzi --delete /etc/tomcat5/ root@newsatserver:/etc/tomcat5/ rsync -avzi --delete /etc/jabberd/ root@newsatserver:/etc/jabberd/ rsync -avzi --delete /etc/sudoers root@newsatserver:/etc/sudoers rsync -avzi --delete /var/www/ root@newsatserver:/var/www/ rsync -avzi --delete /root/ssl-build/ root@newsatserver:/root/ssl-build/ rsync -avzi --delete /root/.gnupg/ root@newsatserver:/root/.gnupg/ rsync -avzi --delete /var/satellite/ root@newsatserver:/var/satellite/ rsync -avzi --delete /opt/apps/oracle/config/10.2.0/ root@newsatserver:/opt/apps/oracle/config/10.2.0/ rsync -avzi --delete /nocpulse/ root@newsatserver:/nocpulse/ rsync -avzi --delete /etc/nocpulse/ root@newsatserver:/etc/nocpulse/ rsync -avzi --delete /tftpboot/ root@newsatserver:/tftpboot/ rsync -avzi --delete /var/lib/cobbler/ root@newsatserver:/var/lib/cobbler/ rsync -avzi --delete /var/lib/rhn/kickstarts/ root@newsatserver:/var/lib/rhn/kickstarts/ rsync -avzi --delete /var/lib/nocpulse/ root@newsatserver:/var/lib/rhn/nocpulse/
Here's our backup script, stops satellite, does the backup, removes old backups, starts satellite. We run this from cron daily and it works good. Never had a problem and have tested restore.
#!/bin/bash
# Author - KWC 9/19/2012
# Purpose
# Stops all Satellite services and performs a cold backup of imbedded Oracle DB rhnsat
# Removes old backups
# Starts all Satellite services
# Notifies people in mailto list in the event of failure
hostname=`hostname`
logdir=/logs/techserv
backupdir=/software/backups
d=db-backup-$(date "+%F")
log=$logdir/$d.log
backuplist="/tmp/backup_db_clean.lst.tmp"
backupkeep="2"
mailto="myemail@mycompany.com"
/bin/date > $log
# Stop Satellite server
/usr/sbin/rhn-satellite stop >> $log 2>&1
/bin/ps -ef |grep rhn |grep -v rhnsd |grep -v grep
if [ $? -eq 1 ];then
# Backup imbedded Oracle DB
su - oracle -c 'logdir=/logs/techserv;
backupdir=/software/backups;
d=db-backup-$(date "+%F");
log=$logdir/$d.log;
mkdir -p $backupdir/$d;
db-control backup $backupdir/$d;
db-control verify $backupdir/$d' >> $log 2>&1
else
errtxt="RHN Satellite did not shut down completely, backup aborted"
echo $errtxt >> $log
echo $errtxt | mailx -s "RHNSAT DB Backup failed on $hostname" $mailto
exit 1
fi
#check backup
backupfiles=`ls $backupdir/$d/*.gz |wc -l`
verfiles=`grep verified $log |wc -l`
if [ $backupfiles -ne $verfiles ];then
cat $log | mailx -s "RHNSAT DB Backup failed on $hostname" $mailto
else
echo "Files backed up = $backupfiles, Files verified = $verfiles, all is well " >> $log
fi
#Clean up old backup directories
if [ -d $backupdir ]
then
ls -ldt ${backupdir}/db-backup-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] |awk '{print $9}'>$backuplist
count=`wc -l $backuplist | awk '{print $1}'`
if [ $count -gt $backupkeep ]
then
let rmcount=${count}-${backupkeep}
echo "\nRemoving $rmcount old backup directories:" >> $log 2>&1
for i in `tail -$rmcount $backuplist`
do
echo "$i" >>$log 2>&1
rm -rf $i >>$log 2>&1
done
fi
else
echo "\nCould not find $backupdir\n" >>$logfile 2>&1
fi
if [ -f $backuplist ]
then
rm $backuplist
fi
# Start Satellite server
/usr/sbin/rhn-satellite start >> $log 2>&1