Speeding up Red Hat Satellite 6.1 backup by using LVM snapshots

Hi everyone,

The katello-backup script, the recommended backup procedure for Satellite 6.x, requires the Satellite services to go offline before creating a file-level backup of the relevant directories:

/etc/katello
/etc/elasticsearch
/etc/candlepin
/etc/pulp
/etc/grinder
/etc/tomcat
/etc/pki/katello
/etc/pki/pulp
/etc/qpidd.conf
/etc/sysconfig/katello
/etc/sysconfig/elasticsearch
/root/ssl-build
/var/www/html/pub/*
/var/lib/katello
/usr/share/katello/candlepin-cert.crt
/var/lib/mongodb
/var/lib/pgsql/data/
/var/lib/elasticsearch
/var/lib/pulp
/var/www/pub

With /var/lib/pulp being larger than half a terabyte, this methodology requires a downtime that is rather undesirable.
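
For reference, the stock script is pointed at a target directory when it is run; something like the line below (the exact argument handling is my assumption here, so check the script's usage on your version):

katello-backup /backup/satellite-$(date +%F)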

I am wondering if the outage can be reduced by first snapshotting and then creating a backup of the file system snapshots instead. Consider the following steps (a rough command sketch follows the list):

  1. Stop the application
  2. Snapshot the file systems containing the directories which hold the application state, e.g.
    /var/www/html/pub/*
    /var/lib/pgsql
    /var/lib/mongodb
    /var/lib/pulp
    /var/lib/katello ?
    /var/www/pub ?
  3. Start the Satellite services
  4. Run a system backup using a standard backup tool (we use Networker)
  5. Merge the snapshots
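
For illustration, here is a minimal command-level sketch of those steps, assuming the directories above sit on LVM logical volumes (the volume group and LV names below are hypothetical, and the snapshot sizes need to match the expected change rate):

# 1. Stop the Satellite services
katello-service stop

# 2. Snapshot each file system holding application state (hypothetical VG/LV names)
lvcreate --snapshot --size 20G --name pulp_snap /dev/vg_sat/pulp
lvcreate --snapshot --size 5G --name pgsql_snap /dev/vg_sat/pgsql
lvcreate --snapshot --size 5G --name mongodb_snap /dev/vg_sat/mongodb

# 3. Start the Satellite services again
katello-service start

# 4. Mount the snapshot volumes somewhere convenient and run the standard
#    backup tool (Networker in our case) over them

# 5. Drop the snapshots once the backup completes
#    (lvconvert --merge would roll the origin volumes back to the snapshot
#     state, which is not what we want after a backup, so plain removal is
#     the cleanup step)
lvremove -f /dev/vg_sat/pulp_snap /dev/vg_sat/pgsql_snap /dev/vg_sat/mongodb_snap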

Any comments and/or suggestions are greatly appreciated.

Ivan.

Responses

Ivan, see if you think the following is an advancement. I really like what you are proposing, and we can modify the RH-provided script to automatically determine the LVs to snap. We need a parameter to set the new top-level mount, e.g. /mnt/satelliteSnap_var. A rough command sketch follows the steps below. Ref: http://tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html

  1. Stop Satellite 6
  2. Create read-only snapshot devices for each file system containing the following directories: /var/www/html/pub/* /var/lib/pgsql /var/lib/mongodb /var/lib/pulp /var/lib/katello ? /var/www/pub ?
  3. Start Satellite 6
  4. Run standard backup tool over the read-only snapshot file systems.
  5. Unmount the snapshot file systems
  6. Remove the snapshot devices
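
A sketch of how steps 2 and 4-6 could look under the same assumptions, mounting a read-only snapshot under the /mnt/satelliteSnap_var idea (device and LV names are hypothetical):

# 2. Read-only snapshot of the LV backing /var (repeat for any other affected LVs)
lvcreate --snapshot --permission r --size 20G --name var_snap /dev/vg_sat/var

# 4. Mount it at the alternate top-level mount and point the backup tool there
mkdir -p /mnt/satelliteSnap_var
mount -o ro /dev/vg_sat/var_snap /mnt/satelliteSnap_var
# (on XFS the mount will typically also need nouuid, and norecovery if the log is dirty)

# 5-6. Unmount and remove the snapshot afterwards
umount /mnt/satelliteSnap_var
lvremove -f /dev/vg_sat/var_snap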

I am advised that /var/lib/pulp/content, which holds all the packages/rpms, can be removed from the regular backup. Imagine a weekly backup that does all and a daily backup that excludes "/var/lib/pulp/content". I believe what I am being told is that "/var/lib/pulp/content" will be "fixed" at the next synchronisation event if you restored from last week's backup of that directory. (I wonder if that means it is safe to back up /var/lib/pulp/content without shutting down Satellite 6?)
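
As a rough illustration of the daily/weekly split, the stock tar step could be adjusted along these lines (a sketch, not the shipped script; check how your tar version matches exclude patterns):

# Weekly: full archive, as the stock script already does
tar --selinux -cf pulp_data.tar /var/lib/pulp/ /var/www/pub/

# Daily: the same archive minus the re-syncable package store
tar --selinux --exclude=/var/lib/pulp/content -cf pulp_data_daily.tar /var/lib/pulp/ /var/www/pub/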

Anyhow, it gives options to those not wanting to mess with LVM snapshots, while still reducing the backup duration on most runs.

People are working on verifying that the backup was successful too. Another important aspect.

If I understand this correctly, there are 3 options being considered here to reduce the time taken for the "katello-backup", during which Satellite 6's services are not available.

  1. Use LVM's snapshot feature to take a snapshot of the directories relevant to Satellite 6, then back up the LVM snapshot. Does this mean that there is no downtime for Satellite 6.1, or is there still some downtime?
  2. Exclude the contents of the "/var/lib/pulp/content" directory from the daily backup (as an example), instead backing up this directory once a week. The content of this directory can be recreated since it contains all the RPMs previously obtained from the Red Hat CDN. If they were not backed up, or were out of date following the restoration of Satellite 6, then they would be downloaded as necessary as part of the Satellite synchronisation task. Excluding this content from the backup would be a problem, though, for a disconnected Satellite 6 instance, since it can't download the RPMs directly.
  3. Back up the contents of the "/var/lib/pulp/content" directory without having Satellite 6 shut down for the duration.

From what I have read above, these ideas are being proposed because it is the "/var/lib/pulp/content" directory which occupies the largest percentage of Satellite 6 disk space, therefore contributes most to the backup's duration.

Yes, I agree with a lot of what's been said here. If I recall correctly, the vast majority of the backup duration is spent creating a tar archive of /var/lib/pulp . As Ivan says, this directory can be very large. Given the tar file is re-created with every backup run, the script essentially ends up performing a 'full' (level 0) backup every time.

I think it's fair to assume that most/all of us are subsequently using a third-party backup tool to write this data to tape. In this case, I am not entirely clear on why a tar archive of /var/lib/pulp is necessary at all. It's just a regular directory (though admittedly one filled with many hardlinks) and any third-party backup tool will be capable of performing an online, incremental backup of it directly, without needing to worry about tar archives or LVM snapshots or anything else.

The only arguments I can see for creating a tar file as per the current backup script are:

  1. As the tar archive is created at the same time as the DB dump, this ensures an atomic, consistent, point-in-time backup is written to tape. Otherwise, deltas may exist between when the DB is dumped and the tape backup of the filesystem is run.
  2. tar archives are hard-link aware. I guess it's possible that some third-party backup tools may not be.

Am I missing any other use cases that benefit from the tar file?

Justin, many backup solutions have the smarts on the server side that control when and what to back up, so they can't be integrated into a scripted backup process on the host being backed up; a stable copy of the content to be backed up has to be created ahead of time. They may truck the data over the host's data network.

Some virtual machine backup solutions will back up at the storage or hypervisor level, with the VM's client just co-ordinating when to quiesce a disk and suspend any processes that attempt to write to that disk while it is being backed up. Again, a stable copy of the content to be backed up has to be created ahead of time. This time all the backing up happens on the storage network.

Some backup solutions use a "smart" client that actively tracks "dirty" files that require backing up, and will incorporate de-duplication technology, so if a file has ever been backed up previously by another server then only the hash is sent over the network. It can also be possible to initiate a manual backup of a preset profile on the host. In this case I absolutely agree the tar-ball of the Satellite application data can be avoided if the backup profile traverses the snapshots that are mounted at an alternate location.

I'm using rsync for /var/lib/pulp backup, it's faster than tar -czvf (since rpms are already compressed) and it's faster still on subsequent runs as only new content is transferred. I basically cp'd katello-backup/katello-restore to 'smart-backup/smart-restore', and changed only the lines pertaining to /var/lib/pulp. e.g.

echo "Backing up Pulp data... "
#tar --selinux -cf pulp_data.tar /var/lib/pulp/ /var/www/pub/
rsync -aAXP /var/lib/pulp/ .
rsync -aAXP /var/www/pub/ .
echo "Done."
echo ""

Abe, good idea to use rsync for /var/lib/pulp and /var/www/pub, as that will save a lot of disk writes. Still, it must take a long time for rsync to walk the Pulp directory structure, and having Satellite offline for that long is undesirable.

It looks like Justin's backup product could avoid the spooling stage and back up directly to the backup server, saving 0.5 TB per Satellite/Capsule, if he uses LVM snapshots. In his case that is a few terabytes of managed enterprise storage.

I see https://github.com/RedHatEMEA/satellite6-backup uses LVM snapshots for Pulp, but it does not make the MongoDB backup at exactly the same time. I mean "katello-service stop" has not been performed when the MongoDB backup is taken, and I believe MongoDB and Pulp need to be backed up as an atomic operation, so I will have to quiz the developers.
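
The ordering I have in mind looks roughly like this (a sketch only, with hypothetical LV names; it is not what the linked script currently does):

# Freeze MongoDB and Pulp at the same point in time
katello-service stop

# Capture both while everything is quiet
mongodump --host localhost --out mongo_dump
lvcreate --snapshot --size 20G --name pulp_snap /dev/vg_sat/pulp

# Bring Satellite back as soon as both have been captured;
# the mongodump and the snapshot can then be backed up at leisure
katello-service start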
