rsync is extremely slow to synchronize a large amount of small files

Environment

Red Hat Enterprise Linux (RHEL) all

Issue

rsync is used to synchronize files from /home/user/folder_with_subfolders to an NFS-mounted folder /home/user/mountpoint.
The total size of folder_with_subfolders is about 59 GB, but the rsync command took almost 10 days to complete.
According to the rsync output, folder_with_subfolders contains more than 16,000,000 files with an average size of about 4 KB each.
Is there any way to speed this up?

Resolution

Instead of using NFS, there are several other methods (the resulting speed depends on the environment):

  • rsync over ssh:

    $ rsync -a folder_with_subfolders/ rsync_server:~/backup
    
    or
    
    $ rsync -avz folder_with_subfolders/ rsync_server:~/backup
    
  • Recursive scp:

    $ scp -r folder_with_subfolders/ rsync_server:~/backup
    
  • Packed chunks of files over ssh (should be significantly faster than the previous methods):

    $ find folder_with_subfolders/ -mindepth 1 -maxdepth 1 -type d -exec bash -c 'tar czf - "$1" | ssh rsync_server "tar xzf - -C backup"' _ {} \;
    
  • Backup software, for example AMANDA (Advanced Maryland Automatic Network Disk Archiver), which is available as a package in the RHEL repositories.
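
The tar-pipe method above can also be parallelized across top-level subdirectories. Below is a local sketch of the idea (the src/ and dest/ directories are hypothetical stand-ins created here for illustration; over the network, the extracting tar would run behind ssh rsync_server as in the command above):

```shell
# Local sketch of parallelizing the per-subdirectory tar pipe.
# src/ and dest/ are hypothetical; in the real case the second tar
# would be: ssh rsync_server "tar xzf - -C backup"
set -e
mkdir -p src/a src/b dest
echo one > src/a/f1
echo two > src/b/f2
# Pack and unpack each top-level subdirectory, up to 4 jobs at a time.
find src/ -mindepth 1 -maxdepth 1 -type d -print0 |
  xargs -0 -P4 -I{} \
    sh -c 'tar czf - -C src "$(basename "$1")" | tar xzf - -C dest' _ {}
```

Running several pipes in parallel helps hide per-connection and per-file latency, at the cost of more CPU for compression.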


To avoid repeated ssh connection-setup delays during the rsync transfers, create a block like the following in ~/.ssh/config:

Host rsync_server
    Hostname server
    User user
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    ControlPersist 600

Set the correct permissions and create a folder for the sockets:

$ chmod 600 ~/.ssh/config
$ mkdir -p ~/.ssh/sockets

With this in place, rsync reuses an existing ssh connection whenever possible.

Root Cause

NFS is the bottleneck: each file synchronized over NFS requires several synchronous round trips to the server (lookup, create, write, attribute updates), so with millions of small files the transfer is dominated by latency rather than bandwidth.

Even a plain copy is no faster than rsync here:

$ cp -a folder_with_subfolders/. mountpoint/
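
The per-file overhead dominates because the files themselves are tiny; a quick check of the numbers from the issue (59 GB across 16,000,000 files, taking 59 GiB as an assumption) confirms the ~4 KB average:

```shell
# Average file size from the figures in the issue (assuming 59 GiB total).
total_bytes=$(( 59 * 1024 * 1024 * 1024 ))
files=16000000
echo "average ≈ $(( total_bytes / files )) bytes"   # ≈ 3959 bytes, about 4 KB
```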

Diagnostic Steps

  • Create a large number of test files:

    $ mkdir rsync_test
    $ for (( i = 0; i < 1000000; i++ )); do > rsync_test/file_$i; done
    $ rsync -avz rsync_test mountpoint/
    
  • After more than an hour, the number of files created on the remote system is still low:

    $ ls | wc -l
    60878
    
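Extrapolating from this observed rate shows why the full run takes so long (assuming the rate stays roughly constant over the whole transfer):

```shell
# At ~60,878 files per hour, moving 16,000,000 files takes:
files_total=16000000
files_per_hour=60878
hours=$(( files_total / files_per_hour ))
echo "≈ ${hours} hours (≈ $(( hours / 24 )) days)"   # roughly 262 hours, 10-11 days
```

This matches the "almost 10 days" reported in the issue.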

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
