How much disk space do I need for reposync?

Hello there,

I would like to create a simple mirror using reposync and createrepo. With these tools I would like to sync the following repos, which are included in a standard subscription:

  • Red Hat Software Collections (for RHEL Server)
  • Oracle Java (for RHEL Server)
  • Red Hat Enterprise Linux Server
  • Red Hat Developer Toolset (for RHEL Server)

Does anyone know how much disk space is required to sync the repos mentioned above? Please tell me if you do.

Best regards,
Joerg

Responses

Hello, it is not possible to give exact sizes, as packages have dependencies and new package versions are released in Errata from time to time, so the repo size keeps growing. The Satellite 6 Installation Guide gives some guidelines in the Storage Requirements and Recommendations section.

I have never used the Oracle Java repo, only repos for testing Satellite Server with Red Hat Enterprise Linux 7 Server, but I suggest you allocate 50 GB if you can, and then do a test to see the size. Whatever the final size is, remember to leave room for growth. Using LVM storage is best for coping with the growth.

I have set up a host for testing with an 80 GB partition for the packages/repos. The first run of reposync with the parameter -n (download only newest packages) resulted in:

  • 180 MB for rhel-7-server-extras-rpms
  • 4,3 GB for rhel-7-server-optional-rpms
  • 3,9 GB for rhel-7-server-rpms
  • 451 MB for rhel-7-server-supplementary-rpms
  • 571 MB for rhel-7-server-thirdparty-oracle-java-rpms
  • 3,6 GB for rhel-7-server-rhscl-rpms

After that I ran reposync a second time without -n. Now the disk space usage looks like this:

  • 1,1 GB for rhel-7-server-extras-rpms
  • 17 GB for rhel-7-server-optional-rpms
  • 18 GB for rhel-7-server-rpms
  • 4,3 GB for rhel-7-server-supplementary-rpms
  • 5,5 GB for rhel-7-server-thirdparty-oracle-java-rpms
  • 6,8 GB for rhel-7-server-rhscl-rpms

This should give anyone an idea of the disk space needed to start with today. Of course Stephen is right: keep in mind that the repos grow over time.
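Per-repository figures like the ones above can be collected with du. The following is only a minimal sketch: the function name repo_sizes and the mirror path are placeholders, and it assumes each repo was synced into its own subdirectory below one mirror root.

```shell
# Print the disk usage of each repository directory below a mirror root.
# Usage: repo_sizes /srv/mirror   (the path is a hypothetical example)
repo_sizes() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue   # skip if the glob matched nothing
        du -sh "$dir"               # one human-readable total per repo
    done
}
```

Running it after the -n run and again after the full run gives the two sets of numbers to compare.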

Hello,
I've rerun my tests today and would like to share the results here. The first run of reposync with the parameter -n (download only newest packages) resulted in:

  • 256M rhel-7-server-extras-rpms
  • 6,4G rhel-7-server-optional-rpms
  • 4,5G rhel-7-server-rpms
  • 506M rhel-7-server-supplementary-rpms
  • 4,9G rhel-server-rhscl-7-rpms

After that I ran reposync a second time without -n. Now the disk space usage looks like this:

  • 2,6G rhel-7-server-extras-rpms
  • 35G rhel-7-server-optional-rpms
  • 34G rhel-7-server-rpms
  • 7,5G rhel-7-server-supplementary-rpms
  • 11G rhel-server-rhscl-7-rpms

So, my 80 GB partition from two years ago would be too small for that. :-)
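To make the comparison with the 80 GB partition explicit, the full-sync figures from this post can simply be added up. A small sketch, assuming the sizes are entered in GB with a decimal point; the helper name sum_gb is made up for illustration.

```shell
# Sum sizes given in GB and print the total with one decimal place.
sum_gb() {
    printf '%s\n' "$@" |
        LC_ALL=C awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Full-sync figures from the post above: extras, optional, server,
# supplementary and rhscl, in GB.
sum_gb 2.6 35 34 7.5 11   # prints 90.1
```

At roughly 90 GB, the five repos alone exceed the 80 GB partition before any further growth.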

I wanted to piggyback off Jorg's comment above with an updated size. I just created new local repos and the space below is what was used (with no compression of any kind, please note). Also please note that I downloaded only 3 of the 5 repos mentioned above, so my total does not include the other two.

reposync --gpgcheck -l --repoid=rhel-7-server-optional-rpms --download_path=[mypath] --downloadcomps --download-metadata

  • 231M ///repodata
  • 2.9G ///rhel-7-server-extras-rpms
  • 46G ///rhel-7-server-optional-rpms
  • 39G ///rhel-7-server-rpms
  • 88G total
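The reposync command in this post names a single repository. To pull several repos with the same options, it could be wrapped in a loop. This is only a sketch: the function name sync_repos and the example path are hypothetical, and the reposync flags are copied unchanged from the command above.

```shell
# Run reposync once per repo ID, using the options from the post above.
# Usage: sync_repos /srv/mirror rhel-7-server-rpms rhel-7-server-optional-rpms
# (the download path is a placeholder)
sync_repos() {
    download_path=$1
    shift
    for repo in "$@"; do
        reposync --gpgcheck -l --repoid="$repo" \
            --download_path="$download_path" \
            --downloadcomps --download-metadata || return 1   # stop on first failure
    done
}
```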

(A note for anyone landing on this discussion who downloads multiple repositories.)

One important thing to remember if you have multiple repositories on the same file system: in my case, I do a content-view export courtesy of Rich Jerrido (thanks Rich, I've been relying on your good article for some years now). That export results in about 1.8 TB of RPMs. I then run hardlink -cv /path/to/Default-Content-View-Export_using_the_actual_name, which deduplicates identical RPMs by hard-linking them. The result goes from 1.8 TB to 313 GB. After that I run rsync -Hau --progress $source $target; the -H option preserves hard links during the rsync.

I do this from my public-facing Satellite. I then take the resulting content view export, with its duplicate RPMs replaced by hard links, to my collection of disconnected Satellites.

If you have multiple repositories on the same file system, this might be useful to you. Those repositories do have a lot of rpm duplications, enough in my case to take it from 1.8TB to 313GB or so. Your actual mileage may vary since you may not be taking down an entire Content View export such as I'm doing. It may help though if you take more than one repository down.

Regards

RJ
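Before running hardlink as described in the post above, it can help to estimate how many duplicate RPMs a tree contains. A rough sketch, assuming sha256sum from coreutils is available; the function name count_duplicate_rpms is made up, and the count is only an estimate of what deduplication might find.

```shell
# Count RPM files whose content checksum was already seen elsewhere in
# the tree, i.e. the number of redundant copies.
# Note: files that are already hard-linked to each other still count.
count_duplicate_rpms() {
    find "$1" -type f -name '*.rpm' -exec sha256sum {} + |
        awk '{ if (seen[$1]++) dup++ } END { print dup + 0 }'
}
```

A high count on a tree that holds several repositories suggests that hard-linking, followed by rsync with -H, is worth the effort.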

Merry Christmas,
Today I ran reposync for the RHEL 8 repos downloading only the newest packages with the following result:

6.8G  rhel-8-for-x86_64-appstream-rpms
1.4G  rhel-8-for-x86_64-baseos-rpms

Besides that I would like to draw your attention to the RHEL 8 version of the Poor Man's RHEL Mirror hosted on GitHub.com. Information about what it does and for what it could be used you will find in the README.md.

Please feel free to use it and adapt it to your own needs. Feedback is welcome.

Best regards,
Joerg

All the details above refer to version 7. Any idea how much disk space is required for a mirror of version 8.x?

Hi Edward,

What Jörg provided in his output/post from 2019-12-25 above IS the reposync for RHEL 8 ! :)

Regards,
Christian

I ran a reposync just for CentOS and reached 50 GB before I ran out of space. I believe Jörg's summary refers to just "newest packages".

Yes Edward - Jörg said : "Today I ran reposync for the RHEL 8 repos downloading only the newest packages." ... :)
Nothing has changed here; what Stephen said is still valid: "It is not possible to give exact sizes as packages have dependencies and new package versions are released in Errata from time to time. So the repo size keeps growing."

Regards,
Christian

Understood - but grosso modo? 100GB, 1TB? I need to define disk space and don't want it all to fail at 95%.

I'd say 100 GB should be sufficient, Edward - but it's pure guessing. Depends on which repos you wanna sync. :)

Regards,
Christian

I need to build a full repository for a customer who is air-gapped, and it will be used by Foreman to deploy bare metal and VMs. So I need the larger number. If you say 100 GB, then 200 GB should be enough (I hope).
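One way to turn a measured repo size into a partition size is to choose a maximum fill level and work backwards. A sketch only: the 80% limit is an assumed planning value, the helper name min_partition_gb is made up, and the 88 GB input is the RHEL 7 total reported earlier in this thread, since no full-sync figure for RHEL 8 has been posted here.

```shell
# Smallest partition size (whole GB) that keeps the given content size
# at or below the given fill percentage.
# Usage: min_partition_gb CONTENT_GB MAX_FILL_PERCENT
min_partition_gb() {
    LC_ALL=C awk -v content="$1" -v fill="$2" 'BEGIN {
        need = content * 100 / fill
        whole = int(need)
        if (need > whole) whole++    # round up to the next whole GB
        print whole
    }'
}

min_partition_gb 88 80   # prints 110
```

This only covers today's size; growth over time still has to be added on top, which is where LVM helps.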

Hi Edward,

I'm not able to give you a number here, because I don't mirror all packages. But it would be nice if you updated this topic once you have figured out how much disk space you needed.

Regards, Jörg