How much disk space do I need for reposync?

Hello there,

I would like to create a simple mirror using reposync and createrepo. With these tools I would like to sync the following repos, which are included in a standard subscription:

  • Red Hat Software Collections (for RHEL Server)
  • Oracle Java (for RHEL Server)
  • Red Hat Enterprise Linux Server
  • Red Hat Developer Toolset (for RHEL Server)

Does anyone know how much disk space is required to sync the repos mentioned above? Please let me know if you do.

Best regards,
Joerg

Responses

Hello, it is not possible to give exact sizes, as packages have dependencies and new package versions are released in Errata from time to time, so the repo size keeps growing. The Satellite 6 Installation Guide gives some guidelines in the Storage Requirements and Recommendations section.

I have never used the Oracle Java repo, only repos for testing Satellite Server with Red Hat Enterprise Linux 7 Server, but I suggest you allocate 50 GB if you can, and then do a test sync to see the actual size. Whatever the final size is, remember to leave room for growth. LVM storage is the best way to cope with that growth, since the volume can be extended later.
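A minimal sketch of a pre-flight check for that test sync, assuming a POSIX shell and `df`. The function names `free_gb` and `has_room`, the mirror path, and the volume names in the comments are hypothetical, not anything from the Satellite guide.

```shell
#!/bin/sh
# Free space, in whole GB (rounded down), on the file system holding a path.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeeds if the path has at least the wanted number of GB free.
has_room() {
    path=$1
    want_gb=$2
    [ "$(free_gb "$path")" -ge "$want_gb" ]
}

# Example: check for the suggested 50 GB before a first test sync.
# /var/www/html/repos is a placeholder for your mirror location.
# has_room /var/www/html/repos 50 || echo "less than 50 GB free" >&2

# With LVM the volume can be grown later (placeholder names):
# lvextend -r -L +20G /dev/vg_repos/lv_repos
```

The check only looks at the space available right now; how much headroom to add on top is a judgment call that depends on how long the mirror has to last.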

I have set up a host for testing with an 80 GB partition for the packages/repos. The first run of reposync with the parameter -n (download only the newest packages) resulted in:

  • 180 MB for rhel-7-server-extras-rpms
  • 4,3 GB for rhel-7-server-optional-rpms
  • 3,9 GB for rhel-7-server-rpms
  • 451 MB for rhel-7-server-supplementary-rpms
  • 571 MB for rhel-7-server-thirdparty-oracle-java-rpms
  • 3,6 GB for rhel-7-server-rhscl-rpms

After that I ran reposync a second time without -n. Now the disk space usage looks like this:

  • 1,1 GB for rhel-7-server-extras-rpms
  • 17 GB for rhel-7-server-optional-rpms
  • 18 GB for rhel-7-server-rpms
  • 4,3 GB for rhel-7-server-supplementary-rpms
  • 5,5 GB for rhel-7-server-thirdparty-oracle-java-rpms
  • 6,8 GB for rhel-7-server-rhscl-rpms

This should give anyone an idea of the disk space needed to get started today. Of course Stephen is right: keep in mind that the repos grow over time.
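For anyone who wants to take the same kind of measurement, here is a small sketch. It assumes every repo was synced into its own directory below one mirror root and that `du` supports `-c` and `-h` (GNU coreutils does). The function name `repo_sizes` and the example path are made up for illustration.

```shell
#!/bin/sh
# Print the size of each repository directory below a mirror root,
# one line per directory, followed by a grand total line.
repo_sizes() {
    root=$1
    ( cd "$root" && du -sch -- */ )
}

# Example (placeholder path):
# repo_sizes /var/www/html/repos
```

Running it after each reposync pass gives a per-repo figure and a total that can be compared between the -n run and the full run.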

Hello,
I've rerun my tests today and would like to share the results here. The first run of reposync with the parameter -n (download only the newest packages) resulted in:

  • 256M rhel-7-server-extras-rpms
  • 6,4G rhel-7-server-optional-rpms
  • 4,5G rhel-7-server-rpms
  • 506M rhel-7-server-supplementary-rpms
  • 4,9G rhel-server-rhscl-7-rpms

After that I ran reposync a second time without -n. Now the disk space usage looks like this:

  • 2,6G rhel-7-server-extras-rpms
  • 35G rhel-7-server-optional-rpms
  • 34G rhel-7-server-rpms
  • 7,5G rhel-7-server-supplementary-rpms
  • 11G rhel-server-rhscl-7-rpms

So, my 80 GB partition from two years ago would be too small for that. :-)
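To make the comparison with the 80 GB partition concrete, the full-sync figures from both measurements can simply be added up. This is a sketch using `awk`; the helper name `sum_gb` is made up, and the decimal commas from the lists are written as decimal points.

```shell
#!/bin/sh
# Add up a list of sizes given in GB and print the total with one decimal.
sum_gb() {
    printf '%s\n' "$@" | LC_ALL=C awk '{ s += $1 } END { printf "%.1f\n", s }'
}

# Full sync, first measurement (six repos, including Oracle Java)
sum_gb 1.1 17 18 4.3 5.5 6.8

# Full sync, two years later (five repos, Oracle Java not listed)
sum_gb 2.6 35 34 7.5 11
```

The totals come to roughly 52.7 GB for the first measurement and 90.1 GB for the second, so the same set of repos minus Oracle Java no longer fits on 80 GB.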

I wanted to piggyback off Joerg's comment above with updated sizes. I just created new local repos, and the space listed below is what was used (please note: with no compression of any kind). Also note that I only download 3 of the 5 repos mentioned above, so I do not have sizes for the other two.

reposync --gpgcheck -l --repoid=rhel-7-server-optional-rpms --download_path=[mypath] --downloadcomps --download-metadata

  • 231M ///repodata
  • 2.9G ///rhel-7-server-extras-rpms
  • 46G ///rhel-7-server-optional-rpms
  • 39G ///rhel-7-server-rpms
  • 88G total
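The command above handles a single repo id. Here is a sketch of how it could be wrapped in a loop for several repos, using the same reposync flags and then createrepo as mentioned in the original question. The helper names `run` and `sync_repos`, the `DRY_RUN` switch, and the mirror path are my own additions for illustration. The sketch also assumes comps.xml ends up inside each repo directory when --downloadcomps is used.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sync each repo id into the download path, then rebuild its metadata.
# Usage: sync_repos /path/to/mirror repoid [repoid ...]
sync_repos() {
    dest=$1
    shift
    for repo in "$@"; do
        run reposync --gpgcheck -l --repoid="$repo" \
            --download_path="$dest" --downloadcomps --download-metadata || return 1
        # Pass the group file to createrepo only if reposync saved one.
        if [ -f "$dest/$repo/comps.xml" ]; then
            run createrepo -g comps.xml "$dest/$repo"
        else
            run createrepo "$dest/$repo"
        fi || return 1
    done
}

# Example (placeholder path), printing the commands without running them:
# DRY_RUN=1; sync_repos /var/www/html/repos rhel-7-server-rpms rhel-7-server-extras-rpms
```

Adding -n to the reposync line would limit the download to the newest packages, as in the earlier posts, and keep the sizes much smaller.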

(A note for anyone landing on this discussion who downloads multiple repositories.)

One important thing to remember if you have multiple repositories that happen to be on the same file system: in my case, I do a content-view export courtesy of Rich Jerrido (thanks Rich, I've been relying on your good article for some years now). The content view export I do results in about 1.8 TB of rpms. I then run hardlink -cv /path/to/Default-Content-View-Export_using_the_actual_name, which deduplicates identical rpms by hard-linking them. The result goes from 1.8 TB down to 313 GB. After that I have to run rsync -Hau --progress $source $target, where the -H makes rsync preserve the hard links during the copy.

I do this on my public-facing Satellite, then take the resulting content view export, deduplicated by hard-linking, to my collection of disconnected Satellites.

If you have multiple repositories on the same file system, this might be useful to you. Those repositories contain a lot of duplicate rpms, enough in my case to take the export from 1.8 TB to 313 GB or so. Your mileage may vary, since you may not be pulling down an entire Content View export as I am, but it may still help if you sync more than one repository.
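The two steps described above could be wrapped up roughly like this. It is only a sketch: the helper names `run` and `dedupe_and_copy` and the `DRY_RUN` switch are made up for illustration, while the hardlink and rsync options are the ones quoted above.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hard-link identical files below the source, then copy to the target
# while preserving those hard links (rsync -H).
# Usage: dedupe_and_copy /path/to/export /path/to/target
dedupe_and_copy() {
    src=$1
    target=$2
    run hardlink -cv "$src" || return 1
    # Without a trailing slash on the source, rsync copies the directory
    # itself into the target rather than only its contents.
    run rsync -Hau --progress "$src" "$target"
}

# Example (placeholder paths), printing the commands without running them:
# DRY_RUN=1; dedupe_and_copy /srv/export /mnt/transfer
```

Hard links only work within one file system, so the dedup step helps only when all the repositories live on the same one, and leaving out -H on the rsync would expand every link back into a full copy on the target.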

Regards

RJ