Increasing vsftpd's disk IO size beyond 2MB to 4MB.
We have vsftpd running very well, at 10GbE wire speed (1250 MB/sec) between two strong 64-bit x86 RHEL servers. The underlying file system is 64-bit IBM GPFS on a disk farm optimized for large IO, which sustains over 10,000 MB/sec and drives multiple 8 Gbit Fibre Channel controllers (Emulex LP 12002) at wire speed.
GPFS is configured for a 4 MB block size, and the Linux IO stack's IO size parameters have been adjusted to allow 4 MB IO.
A simple "dd if=/dev/zero of=<file in GPFS file system> bs=4M" yields 2,900 MB/sec, with iostat showing IO being issued at an average write size of ~4 MB. The system has 4 x 8 Gbit Fibre Channel controllers.
All is well.
When we use vsftpd to transfer large files, we get excellent performance: wire speed at 1250 MB/sec in a single direction. However, we have observed that the disk IO is being done with a 2 MB IO size, which is less efficient than the 4 MB that GPFS is configured for. Sendfile is NOT enabled within vsftpd, as GPFS does not support sendfile.
From our preliminary troubleshooting, it appears that vsftpd is issuing a file-specific "fsync" after writing each 2 MB. This compels GPFS to flush the 2 MB of data to disk without further buffering. Our disk farm is strong enough to handle the less efficient 2 MB IOs, but we see an increase in storage busy-ness because of them.
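To check whether "2 MB write plus fsync" alone reproduces what iostat shows during an FTP transfer, the suspected pattern can be replayed outside vsftpd. The following is only a minimal sketch of that experiment, not anything taken from the vsftpd source; the function name write_pattern and its parameters are our own invention for illustration.

```python
import os

MIB = 1024 * 1024


def write_pattern(path, total_bytes, chunk_bytes, fsync_each_chunk):
    """Write total_bytes of zeros to path in chunk_bytes pieces.

    If fsync_each_chunk is true, call fsync after every chunk, which is
    the behavior we *suspect* (but have not confirmed) in the FTP case.
    Returns (bytes_written, fsync_calls).
    """
    buf = memoryview(bytes(chunk_bytes))
    written = 0
    syncs = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            off = 0
            # os.write may accept fewer bytes than requested, so loop.
            while off < n:
                off += os.write(fd, buf[off:n])
            written += n
            if fsync_each_chunk:
                os.fsync(fd)
                syncs += 1
    finally:
        os.close(fd)
    return written, syncs
```

Running write_pattern(<file in GPFS file system>, 8 * 1024 * MIB, 2 * MIB, True) and then the same call with 4 * MIB and False, while watching iostat, would show whether the average write size follows the fsync interval or the write size.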
A quick scan of the vsftpd source code did not uncover where these "short" writes with sync were occurring. We have not yet tried tracing the execution using systemtap.
We're trying to identify where the 2 MB limitation is being specified and where the fsync behavior is enabled. It might not be in the vsftpd code itself, but in an underlying library, like glibc. The fsync (if it truly is being issued) could be an artifact of the vsftpd restart/recovery algorithms.
We understand that a simple systemtap trace will identify exactly what sequence of file writes, what "flavor" of write system calls, and what other ioctls are being issued, but we have not yet run the test. (Our production system unfortunately does not have systemtap installed.)
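If strace happens to be available on the production host (an assumption; we have not checked), something like "strace -f -e trace=write,pwrite64,writev,fsync,fdatasync -o trace.txt -p <vsftpd pid>" would capture the same information without systemtap. The sketch below summarizes such a log; the function name summarize and the file name trace.txt are hypothetical, and the regular expression only handles complete single-line records, so calls that strace splits into "unfinished"/"resumed" pairs are skipped.

```python
import re
from collections import Counter

# Matches complete strace records such as:
#   write(5, "\0\0"..., 2097152) = 2097152
#   [pid 4242] fsync(5) = 0
CALL_RE = re.compile(
    r"\b(write|pwrite64|writev|fsync|fdatasync)\((\d+).*\)\s+=\s+(-?\d+)"
)


def summarize(lines):
    """Return (sizes, syncs) for an iterable of strace output lines.

    sizes counts successful write-type calls keyed by (fd, bytes returned);
    syncs counts successful fsync/fdatasync calls keyed by fd.
    """
    sizes = Counter()
    syncs = Counter()
    for line in lines:
        m = CALL_RE.search(line)
        if not m:
            continue
        name, fd, ret = m.group(1), int(m.group(2)), int(m.group(3))
        if ret < 0:
            continue  # failed call (e.g. EAGAIN); ignore it
        if name in ("fsync", "fdatasync"):
            syncs[fd] += 1
        else:
            sizes[(fd, ret)] += 1
    return sizes, syncs
```

Calling summarize(open("trace.txt")) and comparing the dominant write size on the data file's descriptor with the fsync count for that descriptor would show whether the 2 MB boundary comes from the write size, from the sync interval, or from neither.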
Since other programs doing simple disk IO (like "dd") to the GPFS file system show average IO sizes of 4 MB in iostat, we know that GPFS and the Linux IO stack are properly configured to enable 4 MB IO. With a vsftpd transfer of a multi-gigabyte file, the writes are constrained to only 2 MB, even though we still sustain 1250 MB/sec thanks to the strength of the storage system.
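As a rough illustration of why the smaller IO size shows up as extra storage busy-ness, the arithmetic below compares the number of write operations per second needed to sustain the same throughput at each size. It assumes every IO is exactly the stated size and ignores metadata and striping, so treat it as a back-of-the-envelope figure only; the function name is ours.

```python
def ios_per_second(throughput_mb_per_s, io_size_mb):
    """Number of IOs per second needed to sustain the given throughput."""
    return throughput_mb_per_s / io_size_mb


rate_2mb = ios_per_second(1250, 2)  # 625.0 IOs/sec at the observed 2 MB size
rate_4mb = ios_per_second(1250, 4)  # 312.5 IOs/sec at the configured 4 MB size
print(rate_2mb, rate_4mb, rate_2mb / rate_4mb)
```

Under those assumptions the storage sees twice as many write operations for the same 1250 MB/sec.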
Thoughts?
Dave B.