How much disk space do I need for reposync?

Hello there,

I would like to create a simple mirror using reposync and createrepo. With these tools I like to sync the following repos which are included in a standard subscription:

  • Red Hat Software Collections (for RHEL Server)
  • Oracle Java (for RHEL Server)
  • Red Hat Enterprise Linux Server
  • Red Hat Developer Toolset (for RHEL Server)

Does someone know how much disk space is required to sync the repos mentioned above? Please tell me, if you know.

Best regards,
Joerg

Responses

Hello, it is not possible to give exact sizes, as packages have dependencies and new package versions are released in errata from time to time, so the repo size keeps growing. The Satellite 6 Installation Guide gives some guidelines in the Storage Requirements and Recommendations section.

I have never used the Oracle Java repo, only repos for testing Satellite Server with Red Hat Enterprise Linux 7 Server, but I suggest you allocate 50 GB if you can, and then run a test to see the size. Whatever the final size is, remember to leave room for growth. Using LVM storage is best for coping with the growth.

I have set up a host for testing with an 80 GB partition for the packages/repos. The first run of reposync with the parameter -n (download only newest packages) resulted in:

  • 180 MB for rhel-7-server-extras-rpms
  • 4,3 GB for rhel-7-server-optional-rpms
  • 3,9 GB for rhel-7-server-rpms
  • 451 MB for rhel-7-server-supplementary-rpms
  • 571 MB for rhel-7-server-thirdparty-oracle-java-rpms
  • 3,6 GB for rhel-7-server-rhscl-rpms

After that I ran reposync a second time without -n. Now the disk space usage looks like this:

  • 1,1 GB for rhel-7-server-extras-rpms
  • 17 GB for rhel-7-server-optional-rpms
  • 18 GB for rhel-7-server-rpms
  • 4,3 GB for rhel-7-server-supplementary-rpms
  • 5,5 GB for rhel-7-server-thirdparty-oracle-java-rpms
  • 6,8 GB for rhel-7-server-rhscl-rpms

So anyone can get an idea of the disk space needed to start with today. Of course Stephen is right: you should keep in mind that the repos grow over time.
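
The per-repo figures in this thread look like du -sh output. A minimal sketch of how such a listing could be produced, assuming every repo was synced into its own directory under one mirror root (MIRROR_ROOT and its default path are hypothetical, not taken from the posts above):

```shell
#!/bin/sh
# Hypothetical mirror root; adjust to your own download path.
MIRROR_ROOT="${MIRROR_ROOT:-/var/www/html/repos}"

# Print the disk usage of every repo directory under the given root,
# followed by a grand total for the root itself.
repo_sizes() {
    root="$1"
    du -sh "$root"/*/ 2>/dev/null
    du -sh "$root" | awk '{ print $1 "\ttotal" }'
}

# Only report if the mirror root actually exists on this host.
if [ -d "$MIRROR_ROOT" ]; then
    repo_sizes "$MIRROR_ROOT"
fi
```

Running this once after the -n sync and once after the full sync gives two listings that can be compared directly.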

Hello,
I've rerun my tests today and would like to share the results here. The first run of reposync with the parameter -n (download only newest packages) resulted in:

  • 256M rhel-7-server-extras-rpms
  • 6,4G rhel-7-server-optional-rpms
  • 4,5G rhel-7-server-rpms
  • 506M rhel-7-server-supplementary-rpms
  • 4,9G rhel-server-rhscl-7-rpms

After that I ran reposync a second time without -n. Now the disk space usage looks like this:

  • 2,6G rhel-7-server-extras-rpms
  • 35G rhel-7-server-optional-rpms
  • 34G rhel-7-server-rpms
  • 7,5G rhel-7-server-supplementary-rpms
  • 11G rhel-server-rhscl-7-rpms

So, my 80 GB partition from two years ago would be too small for that. :-)
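
To see why 80 GB no longer fits, the sizes from the full sync above can be added up. A small sketch, assuming a "SIZE NAME" listing with G or M suffixes and either comma or dot decimals (the sum_gb helper is made up for illustration):

```shell
#!/bin/sh
# Sum a "SIZE NAME" listing (G or M suffix, comma or dot decimals)
# and print the total in GB with one decimal place.
sum_gb() {
    LC_ALL=C awk '{
        size = $1
        sub(/,/, ".", size)                 # accept "7,5G" as well as "7.5G"
        unit = substr(size, length(size))   # last character: G or M
        value = substr(size, 1, length(size) - 1) + 0
        if (unit == "M") value /= 1024      # convert MB to GB
        total += value
    } END { printf "%.1f\n", total }'
}

# Figures copied from the full sync (without -n) listed above.
sum_gb <<'EOF'
2,6G rhel-7-server-extras-rpms
35G rhel-7-server-optional-rpms
34G rhel-7-server-rpms
7,5G rhel-7-server-supplementary-rpms
11G rhel-server-rhscl-7-rpms
EOF
```

This prints 90.1, so the five repos together need roughly 90 GB before leaving any room for growth.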

I wanted to piggyback off Jörg's comment above with an updated size. I just created new local repos, and the space below is what was used (with no compression of any kind, please note). Also please note that I only download 3 of the 5 repos mentioned above, so I do not have a size that includes the other two.

reposync --gpgcheck -l --repoid=rhel-7-server-optional-rpms --download_path=[mypath] --downloadcomps --download-metadata

  • 231M ///repodata
  • 2.9G ///rhel-7-server-extras-rpms
  • 46G ///rhel-7-server-optional-rpms
  • 39G ///rhel-7-server-rpms
  • 88G total

(For anyone landing on this discussion who downloads multiple repositories.)

One important thing to remember if you have multiple repositories that happen to be on the same file system: in my case, I do a content-view export courtesy of Rich Jerrido (thanks Rich, I've been relying on your good article for some years now). That content view export results in about 1.8 TB of rpms. I then run hardlink -cv /path/to/Default-Content-View-Export_using_the_actual_name, which deduplicates identical rpms by hard-linking them. The result is going from 1.8 TB to 313 GB. Then I have to do an rsync -Hau --progress $source $target, where the -H retains the hard links during the rsync.

I do this from my public-facing Satellite. I then take the resulting content view, deduplicated by hard-linking, to my collection of disconnected Satellites.

If you have multiple repositories on the same file system, this might be useful to you. Those repositories do have a lot of rpm duplications, enough in my case to take it from 1.8TB to 313GB or so. Your actual mileage may vary since you may not be taking down an entire Content View export such as I'm doing. It may help though if you take more than one repository down.
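
Before running hardlink on a mirror, it can help to see which rpms are actually duplicated. A rough sketch, assuming GNU coreutils sha256sum is available (the list_duplicates helper is made up for illustration, and checksumming a very large tree will take a while):

```shell
#!/bin/sh
# List rpm files under a directory whose content is identical to an
# earlier file (same SHA-256 checksum). These are the files a hard-link
# deduplication pass could replace with links.
list_duplicates() {
    find "$1" -type f -name '*.rpm' -exec sha256sum {} + |
        sort |
        awk 'seen[$1]++ { sub(/^[^ ]+ +/, ""); print }'
}

# Usage (path is a placeholder):
#   list_duplicates /path/to/where/you/downloaded/repos | wc -l
```

An empty result means the repos on that file system share no identical rpms, so hard-linking would save nothing there.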

Regards

RJ

Hello RJ,
I have a question regarding the duplicated rpms when syncing multiple repos. Does this only apply to Satellite content views or all repos in general?

For example, when syncing rhel-8-for-x86_64-baseos-rpms, it should not contain duplicates. And when syncing rhel-8-for-x86_64-appstream-rpms, it should not contain duplicates and should not contain any package that is in rhel-8-for-x86_64-baseos-rpms, right? How come there are duplicates in the first place?

Regards,
Jörg

Merry Christmas,
Today I ran reposync for the RHEL 8 repos downloading only the newest packages with the following result:

6.8G  rhel-8-for-x86_64-appstream-rpms
1.4G  rhel-8-for-x86_64-baseos-rpms

Besides that, I would like to draw your attention to the RHEL 8 version of the Poor Man's RHEL Mirror hosted on GitHub.com. You will find information about what it does and what it can be used for in the README.md.

Please feel free to use it and adapt it to your own needs. Feedback is welcome.

Best regards,
Joerg

All the details above refer to version 7. Any idea how much disk space is required for a mirror of version 8.x?

Hi Edward,

What Jörg provided in his output/post from 2019-12-25 above IS the reposync for RHEL 8 ! :)

Regards,
Christian

I ran a reposync just for CentOS and reached 50 GB before I ran out of space. I believe Jörg's summary refers to just "newest packages".

Yes Edward - Jörg said: "Today I ran reposync for the RHEL 8 repos downloading only the newest packages." ... :)
Nothing has changed here, what Stephen said is still valid: "It is not possible to give exact sizes as packages have dependencies and new package versions are released in Errata from time to time. So the repo size keeps growing."

Regards,
Christian

Edward Resnick,

Please see what I wrote above about reducing space. Of course, this is run after the fact.

Regards,
RJ

Understood - but grosso modo? 100GB, 1TB? I need to define disk space and don't want it all to fail at 95%.
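
One way to avoid a sync dying at 95% is to check free space first. A minimal sketch, assuming POSIX df output and a made-up has_free_gb helper; the 200 GB threshold is only the figure floated in this thread, not a measured requirement:

```shell
#!/bin/sh
# Succeed if the file system holding the given path has at least the
# given number of GB available.
has_free_gb() {
    path="$1"
    needed_kb=$(( $2 * 1024 * 1024 ))
    # With -P and -k, the second output line describes the file system
    # and column 4 is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# DOWNLOADPATH falls back to / here only so the example runs anywhere.
if has_free_gb "${DOWNLOADPATH:-/}" 200; then
    echo "enough free space"
else
    echo "not enough free space"
fi
```

A check like this could run right before reposync in a cron job so the sync is skipped rather than left half finished.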

I'd say 100 GB should be sufficient, Edward - but it's pure guessing. Depends on which repos you wanna sync. :)

Regards,
Christian

I need to make a full repository for a customer who is air-gapped, and it will be used by Foreman to deploy bare metal and VMs. So I need the large number. If you say 100 GB, then 200 GB should be enough (I hope).

Hi Edward,

I'm not able to give you a number here, because I don't mirror all packages. But it would be nice if you updated this topic after you have figured out how much disk space you needed.

Regards, Jörg

This might be useful info for somebody: today I was setting up a local mirror for RHEL 8, and below is the disk space it used.

2.2G    codeready-builder-for-rhel-8-x86_64-rpms
36G     rhel-8-for-x86_64-appstream-rpms
9.5G    rhel-8-for-x86_64-baseos-rpms
1.3G    rhel-8-for-x86_64-supplementary-rpms

Thanks for providing this information, Mike ! :)

Thanks, Mike Brooker!

51G     rhel-7-server-rpms
67G     rhel-7-server-optional-rpms
3.4G    rhel-7-server-extras-rpms

18.5.2021

45G rhel-8-for-x86_64-appstream-rpms

13G rhel-8-for-x86_64-baseos-rpms

17/07/2021

For those who want to reduce the amount of space the entire set of repos uses, I posted something above on that using hardlink -cvv /path/to/where/you/downloaded/repos.

This is even more relevant when you have downloaded multiple repositories and placed them in one directory. I mentioned above for example, when I download the entire Red Hat content view with my satellite server, I can reduce the footprint by hundreds of gigabytes.

See my post above (scroll up) on this.

Regards,
RJ

@7.4G for rhel-8_x86_64-appstream-rpms (newest only)

Hey everyone,
Here is an update from my environment.

RHEL 7

Repos are synced with the following command:

reposync --gpgcheck -l --repoid=$REPO --download_path=$DOWNLOADPATH --downloadcomps --download-metadata -d -n

Newest packages only are downloaded and the ones that are not in upstream anymore are deleted on the local mirror. That leads to:

460M    rhel-7-server-extras-rpms
9,9G    rhel-7-server-optional-rpms
7,6G    rhel-7-server-rpms
658M    rhel-7-server-supplementary-rpms
11G     rhel-server-rhscl-7-rpms

RHEL 8

Command used:

reposync --repoid=$REPO --download-path=$BASEDIR --downloadcomps --download-metadata -n

So only the newest packages are downloaded, but packages already on the local mirror are all kept.

28G rhel-8-for-x86_64-appstream-rpms
7,7G    rhel-8-for-x86_64-baseos-rpms

By the way, I'm glad the number of repos decreased in RHEL 8.
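
For several repos, the RHEL 8 command above can be wrapped in a loop. A sketch under the assumption that the two repo IDs from this post are the ones wanted; the default BASEDIR path and the sync_repos name are hypothetical:

```shell
#!/bin/sh
# Repo IDs as listed in this post; BASEDIR default is a placeholder.
REPOS="rhel-8-for-x86_64-baseos-rpms rhel-8-for-x86_64-appstream-rpms"
BASEDIR="${BASEDIR:-/srv/mirror}"

# Sync each repo with the same options as the command above and stop
# at the first failure so a full disk is noticed immediately.
sync_repos() {
    for repo in $REPOS; do
        reposync --repoid="$repo" --download-path="$BASEDIR" \
            --downloadcomps --download-metadata -n || return 1
    done
}

# Usage: call sync_repos from cron or by hand once BASEDIR exists.
```

Stopping on the first error is a design choice for this sketch; continuing with the remaining repos would work just as well if partial mirrors are acceptable.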

Newest only (rhel-77-server-rpms is RHEL 7.7)

93M codeready-builder-for-rhel-8-x86_64-rpms
5.5G    rhel-77-server-rpms
287M    rhel-7-server-extras-rpms
6.6G    rhel-7-server-rpms
8.7G    rhel-8-for-x86_64-appstream-rpms
2.2G    rhel-8-for-x86_64-baseos-rpms

After using hardlink to dedup:

93M codeready-builder-for-rhel-8-x86_64-rpms
5.5G    rhel-77-server-rpms
287M    rhel-7-server-extras-rpms
3.8G    rhel-7-server-rpms
8.7G    rhel-8-for-x86_64-appstream-rpms
2.2G    rhel-8-for-x86_64-baseos-rpms

14/10/2021

Hi, I am testing reposync with the -n option to save disk space. I have a question: what if I want to exclude *.i686.rpm packages from the download? How can I configure that?

I am a newbie, sorry for my easy question :) May, 15/10/2021

Hi,
In case you are looking for packages of a specific architecture you could use the option -a, e.g. -a i686 to download i686-architecture only.

Please see reposync(1) for more information on this.

I use the following options to exclude 32-bit packages.

--arch x86_64,noarch --newest-only
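
To judge whether excluding 32-bit packages is worth it on an existing mirror, the space taken by *.i686.rpm files can be measured first. A small sketch with a made-up i686_bytes helper:

```shell
#!/bin/sh
# Print the total size in bytes of all *.i686.rpm files under a directory.
i686_bytes() {
    # ls -ln prints the file size in column 5; awk adds the sizes up and
    # prints 0 when no matching file exists.
    find "$1" -type f -name '*.i686.rpm' -exec ls -ln {} + |
        awk '{ sum += $5 } END { print sum + 0 }'
}

# Usage (path is a placeholder):
#   i686_bytes /path/to/mirror
```

If the number is small compared with the repo sizes reported in this thread, the architecture filter will not save much.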

In Jan 2021 the repos were around the sizes below. They will have grown since then, but this will hopefully provide a ballpark figure.

2.2G    codeready-builder-for-rhel-8-x86_64-rpms
36G     rhel-8-for-x86_64-appstream-rpms
9.5G    rhel-8-for-x86_64-baseos-rpms
1.3G    rhel-8-for-x86_64-supplementary-rpms

13 October 2023

reposync --repoid=$REPO --download-path=$BASEDIR --download-metadata -n

12G     rhel-8-for-x86_64-appstream-rpms
1.6G    rhel-8-for-x86_64-baseos-rpms
195M    rhel-8-for-x86_64-supplementary-rpms