rsync is extremely slow when synchronizing a large number of small files
Environment
Red Hat Enterprise Linux (RHEL), all versions
Issue
rsync is used to synchronize files from /home/user/folder_with_subfolders to an NFS-mounted folder, /home/user/mountpoint.
The total size of folder_with_subfolders is about 59GB, but the rsync command took almost 10 days to complete.
According to the rsync output, folder_with_subfolders contains more than 16,000,000 files with an average size of about 4KB each.
Is there any way to speed this up?
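The file count and average size quoted above can be cross-checked without rsync. The following is a minimal sketch, assuming GNU find (for its -printf action); the function name count_and_average is illustrative and not part of any tool.

```shell
# Print the number of regular files under a directory and their average size.
# Assumes GNU find, because -printf is not available in every find implementation.
count_and_average() {
    find "$1" -type f -printf '%s\n' |
        awk '{ n++; total += $1 }
             END {
                 if (n) { printf "%d files, average %.0f bytes\n", n, total / n }
                 else   { print "0 files" }
             }'
}

# Example (path taken from the Issue section):
# count_and_average /home/user/folder_with_subfolders
```

On a tree of this size the scan itself can take a long time, so it is best run on the local source directory rather than on the NFS mount.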
Resolution
Instead of using NFS, there are several other methods (the resulting speed depends on the environment):
- rsync over ssh:
  $ rsync -a folder_with_subfolders/ rsync_server:~/backup
  or, with verbose output and compression:
  $ rsync -avz folder_with_subfolders/ rsync_server:~/backup
- Recursive scp:
  $ scp -r folder_with_subfolders/ rsync_server:~/backup
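- Packed chunks of files over ssh (should be significantly faster than the previous methods). Each top-level subdirectory is sent as one compressed tar stream; the directory name is passed to bash as an argument so that names containing spaces or quotes are handled safely:
  $ find folder_with_subfolders/ -mindepth 1 -maxdepth 1 -type d -exec bash -c 'tar czf - "$1" | ssh rsync_server tar -xzf - -C backup' _ {} \;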
- Backup software, for example AMANDA (Advanced Maryland Automatic Network Disk Archiver), which is available as a package from the RHEL repository.
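As a variation on the packed-chunks method, the whole tree can be sent as a single tar stream, which also picks up files that sit directly in the top-level folder (the find command with -type d only covers subdirectories). This is a sketch, not a tested replacement: send_tree and the REMOTE_SHELL variable are hypothetical names introduced here, and REMOTE_SHELL exists only so the pipeline can be tried locally with 'sh -c' before pointing it at a real host. Paths containing single quotes are not handled.

```shell
# Stream a directory tree as one tar archive and unpack it on the other side.
# REMOTE_SHELL is a hypothetical variable (not an ssh or tar feature); when it
# is unset, the rsync_server host alias used in this article is assumed.
send_tree() {
    src=$1
    dest=$2
    # Word splitting of REMOTE_SHELL is intentional: it holds a command plus arguments.
    tar -C "$src" -cf - . |
        ${REMOTE_SHELL:-ssh rsync_server} "mkdir -p '$dest' && tar -xf - -C '$dest'"
}

# Example (host alias and paths as used in this article):
# send_tree folder_with_subfolders backup
```

With this form the contents of the source folder land directly under the destination, matching the layout produced by the rsync command with a trailing slash. Compression is left out here; adding -z to both tar invocations may help on slow links.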
To avoid a delay each time rsync opens an ssh connection, add a block such as the following to ~/.ssh/config:
Host rsync_server
Hostname server
User user
ControlMaster auto
ControlPath ~/.ssh/sockets/%r@%h-%p
ControlPersist 600
Set the correct permissions and create a folder for sockets:
$ chmod 600 ~/.ssh/config
$ mkdir ~/.ssh/sockets
This will have rsync reuse the ssh connection whenever it can.
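The permission and directory steps above can be wrapped so that they are safe to repeat. The sketch below is one way to do that; prepare_ssh_mux is a hypothetical helper name, and its optional argument (an alternative ssh directory) is an assumption added only so the function can be tried against a scratch directory instead of the real ~/.ssh.

```shell
# Prepare the socket directory and config permissions; safe to run repeatedly.
# The optional first argument is a hypothetical override of the ssh directory.
prepare_ssh_mux() {
    ssh_dir=${1:-"$HOME/.ssh"}
    mkdir -p "$ssh_dir/sockets"      # -p: no error if the directory already exists
    chmod 700 "$ssh_dir/sockets"     # keep control sockets private to the user
    touch "$ssh_dir/config"          # create the config file if it is missing
    chmod 600 "$ssh_dir/config"
}

# After the first connection, the shared master connection can be queried or closed:
# ssh -O check rsync_server
# ssh -O exit rsync_server
```

The function does not write the Host block itself; that is still added by hand as shown earlier.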
Root Cause
NFS is a bottleneck because it has poor performance when synchronizing a large number of small files with rsync.
Even the following basic command is no faster than rsync:
$ cp -a folder_with_subfolders/. mountpoint/
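The figures from the Issue section make the bottleneck easier to see when reduced to per-file numbers. The sketch below does that arithmetic with awk; per_file_cost is an illustrative name, and the 59GB figure is assumed here to mean GiB (with decimal gigabytes the throughput comes out slightly lower).

```shell
# Derive per-file figures from a file count, a total size in GiB, and a duration in days.
per_file_cost() {
    awk -v files="$1" -v gib="$2" -v days="$3" 'BEGIN {
        secs  = days * 86400
        bytes = gib * 1024 ^ 3
        printf "%.1f files/s, %.1f ms/file, %.1f KiB/s\n",
               files / secs, 1000 * secs / files, bytes / secs / 1024
    }'
}

# 16,000,000 files, about 59 GiB, almost 10 days:
per_file_cost 16000000 59 10
# prints: 18.5 files/s, 54.0 ms/file, 71.6 KiB/s
```

A rate of roughly 70 KiB/s is far below what any modern network link can carry, which suggests that the time is spent on the per-file overhead of about 54 ms rather than on moving data.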
Diagnostic Steps
- Create a large number of test files and synchronize them to the NFS mount:
  $ mkdir rsync_test
  $ for (( i = 0; i < 1000000; i++ )); do > rsync_test/file_$i; done
  $ rsync -avz rsync_test mountpoint/
- After more than an hour, the total number of files on the remote system is still low:
  $ ls | wc -l
  60878
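The result above works out to at most about 17 files per second (60878 files in more than 3600 seconds). To compare targets directly, for example a local directory against the NFS mount, a smaller timed test may be enough. The following is a rough sketch with a hypothetical function name, measure_create_rate; it uses whole-second timing from date, so it is only meaningful when the run takes at least several seconds.

```shell
# Create COUNT empty files under TARGET/rate_test and report the creation rate.
# Timing has one-second resolution, so use a COUNT large enough to run for a while.
measure_create_rate() {
    target=$1
    count=${2:-1000}
    mkdir -p "$target/rate_test"
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        : > "$target/rate_test/file_$i"   # create an empty file
        i=$((i + 1))
    done
    end=$(date +%s)
    elapsed=$((end - start))
    if [ "$elapsed" -eq 0 ]; then
        elapsed=1                         # avoid division by zero on very fast runs
    fi
    echo "$count files in ${elapsed}s, about $((count / elapsed)) files/s"
}

# Example: compare a local directory with the NFS mount from this article.
# measure_create_rate /tmp 10000
# measure_create_rate /home/user/mountpoint 10000
```

The test files are left in the rate_test subdirectory and need to be removed afterwards.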