Efficient remote copy of sparse files on RHEL 6
Environment
- Red Hat Enterprise Linux 6
Issue
We have a set of large sparse files that we need to copy over the network to another storage system. How can we do this efficiently?
Resolution
IMPORTANT NOTE: the procedure described below uses unsupported tools. Red Hat provides this article for information only and cannot provide technical support for the tools mentioned below.
- get libarchive, which includes bsdtar
$ wget http://www.libarchive.org/downloads/libarchive-3.1.2.tar.gz
- extract and build the source
$ tar xvzf libarchive-3.1.2.tar.gz
$ cd libarchive-3.1.2
$ ./configure
$ make
- install it (as root)
# make install
- create a dummy sparse file
$ dd if=/dev/urandom of=sparse.bin bs=1 count=1 seek=999999999
- copy it to the destination server, preserving holes
$ bsdtar -Scvf - sparse.bin | ssh destination_server "tar -xSvf -"
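After the transfer, one way to confirm that the copy is still sparse is to compare a file's apparent size with the disk space actually allocated to it. The following is a minimal sketch, assuming GNU coreutils du (for --apparent-size); is_sparse is a hypothetical helper name, not part of libarchive or RHEL.

```shell
# is_sparse FILE (hypothetical helper): print FILE's apparent and allocated
# size in KiB, and succeed only if fewer blocks are allocated than the
# apparent size implies, i.e. the file still contains holes.
# Assumes GNU du (for --apparent-size).
is_sparse() {
    apparent=$(du --apparent-size -k -- "$1" | cut -f1)
    allocated=$(du -k -- "$1" | cut -f1)
    echo "$1: apparent ${apparent}K, allocated ${allocated}K"
    [ "$allocated" -lt "$apparent" ]
}
```

Run is_sparse sparse.bin on the source host; on the destination, the same comparison can be made with ssh destination_server 'du -k --apparent-size sparse.bin; du -k sparse.bin'.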
Root Cause
Sparse files can be copied to a remote destination using common tools such as rsync and tar. For instance, you may use rsync as follows:
$ dd if=/dev/urandom of=sparse.bin bs=1 count=1 seek=999999999
$ rsync -Sv sparse.bin remote-server:/tmp/
You could also use tar as follows:
$ dd if=/dev/urandom of=sparse.bin bs=1 count=1 seek=999999999
$ tar cSvf - sparse.bin | ssh remote-server "tar xSvf -"
Either command produces a destination file that occupies little disk space and should not take long to copy over the network. However, both tools still have to read the entire source file, holes included, to find where its data lies, so scanning a large sparse file remains slow on Red Hat Enterprise Linux up to version 6.
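Before sending a large file across the network, the tar pipeline can be rehearsed locally, with a subshell standing in for ssh, to check that holes survive the round trip. This is a minimal sketch assuming GNU tar on both ends; sparse_roundtrip is a hypothetical helper name. Prefixing a call with time on a large file makes the scanning cost visible, since a tar without FIEMAP or SEEK_DATA support has to read every byte of the source, holes included.

```shell
# sparse_roundtrip SRC DESTDIR (hypothetical helper): archive SRC with
# GNU tar's -S (sparse) option and extract it into DESTDIR through a pipe,
# mimicking "tar cSvf - SRC | ssh remote-server 'tar xSvf -'" locally.
# SRC should be a relative path so it is recreated under DESTDIR.
sparse_roundtrip() {
    src=$1
    dest=$2
    mkdir -p -- "$dest" || return 1
    # Create side: -S makes tar record holes instead of runs of zeros.
    # Extract side: runs in a subshell inside DESTDIR, like the remote tar.
    tar -cSf - "$src" | (cd -- "$dest" && tar -xSf -)
}
```

For example, time sparse_roundtrip sparse.bin /tmp/rehearsal followed by du -k /tmp/rehearsal/sparse.bin shows both how long the scan takes and how much space the copy occupies.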
Two solutions were implemented in the Linux kernel to provide applications with an efficient way to scan sparse files: the FIEMAP ioctl and the SEEK_HOLE/SEEK_DATA options to the lseek(2) system call.
The FIEMAP ioctl is specific to Linux and appears to be used by some coreutils binaries such as cp, and by non-GNU tools such as bsdtar, running on Fedora 17 and later. Most file systems implement this ioctl, and it is available even in RHEL 6 kernels. However, not all user-space tools that handle sparse files make use of it.
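To see from the shell what FIEMAP reports for a given file, one option is filefrag from e2fsprogs, which queries FIEMAP and falls back to the older FIBMAP ioctl (root only) when FIEMAP is unavailable. The sketch below assumes filefrag's summary line has the form "FILE: N extents found"; extent_count and parse_extent_count are hypothetical helper names.

```shell
# parse_extent_count (hypothetical helper): read filefrag output on stdin
# and print N from its summary line, e.g. "sparse.bin: 1 extent found".
# Any trailing text after "found" is ignored.
parse_extent_count() {
    sed -n 's/.*: \([0-9][0-9]*\) extents\{0,1\} found.*/\1/p'
}

# extent_count FILE (hypothetical helper): number of extents filefrag
# reports for FILE. Holes are not extents, so a large sparse file with a
# single data block should map to very few of them.
extent_count() {
    filefrag -- "$1" | parse_extent_count
}
```

filefrag -v sparse.bin additionally lists each extent with its logical and physical offsets, so holes should show up as gaps between the logical ranges.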
For instance, cp 8.15, as shipped in Fedora 17, uses this ioctl and handles sparse files very efficiently:
$ cp --version
cp (GNU coreutils) 8.15
(..)
$ dd if=/dev/urandom of=sparse.bin bs=1 count=1 seek=99999999999
$ ls -lh sparse.bin
-rw-r--r--. 1 root root 94G Apr 1 16:06 sparse.bin
$ time strace cp sparse.bin sparse.bin.copy
(..)
ioctl(3, FS_IOC_FIEMAP, 0x7fffcbaffc00) = 0
(..)
real 0m0.024s
user 0m0.002s
sys 0m0.007s
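The strace check above can be reused for any candidate tool, such as a recent cp built on RHEL 6, to see which hole-detection interface it relies on. The sketch below scans an strace log for the names strace prints for each interface (FS_IOC_FIEMAP for the ioctl, as in the transcript above, and SEEK_DATA or SEEK_HOLE for lseek); sparse_interface is a hypothetical helper name.

```shell
# sparse_interface (hypothetical helper): read an strace log on stdin and
# print which hole-detection interfaces appear in it: "fiemap",
# "seek_data", "fiemap seek_data", or "none".
sparse_interface() {
    log=$(cat)
    found=""
    case $log in
        *FS_IOC_FIEMAP*) found="fiemap" ;;
    esac
    case $log in
        *SEEK_DATA* | *SEEK_HOLE*) found="${found:+$found }seek_data" ;;
    esac
    echo "${found:-none}"
}
```

For example, run strace -f -o /tmp/cp.trace cp sparse.bin sparse.bin.copy and then sparse_interface < /tmp/cp.trace. A tool that reports none is either reading the whole file to locate holes or not handling sparse files specially at all.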
Recent versions of cp also copy sparse files efficiently when built on RHEL 6. bsdtar uses this ioctl too, but it is not shipped in RHEL:
$ time strace bsdtar -Scvf sparse.bin.tar sparse.bin
(..)
ioctl(6, FS_IOC_FIEMAP, 0x7fffa81b98a0) = 0
(..)
a sparse.bin
real 0m0.072s
user 0m0.034s
sys 0m0.010s
$ ls -lh sparse.bin.tar
-rw-r--r--. 1 root root 9.0K Apr 1 16:09 sparse.bin.tar
The other option, SEEK_HOLE and SEEK_DATA, was introduced in Solaris a few years ago. The Linux kernel appears to have adopted the same interface because the FIEMAP ioctl is rather complex to use, and to preserve compatibility with the existing Solaris implementation.
As of Fedora 17, most file systems implement this capability. However, support in user-space tools does not yet seem to be ready; it is being implemented in rsync, for instance.
As Red Hat Enterprise Linux 7 is likely to be based on Fedora 19, we are fairly confident that most of the required pieces (in both kernel space and user space) will be included in RHEL 7.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.