Increasing vsftpd's disk IO size from 2 MB to 4 MB.

Latest response

We have vsftpd running very well ... at 10GbE wire speed of 1250 MB/sec between two strong 64-bit x86 RHEL servers.  The underlying file system is 64-bit IBM GPFS, with a disk farm optimized for large IO, running at over 10,000 MB/sec and driving multiple 8 Gbit fibre-channel controllers (Emulex LP 12002) at wire speed.

GPFS is configured with a 4 MB block size, and the Linux IO stack's IO-size parameters have been adjusted to allow 4 MB IOs.

Using a simple "dd if=/dev/zero of=<file in GPFS file system> bs=4M" yields 2,900 MB/sec, with iostat showing IO being issued with an average write size of ~ 4MB.  The system has 4 x 8 gbit fibre channel controllers.
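A sketch of that baseline check as a script (paths and sizes here are placeholders; the real test wrote to the GPFS mount and used a much larger count). Note that iostat reports avgrq-sz in 512-byte sectors, so 4 MB requests appear as roughly 8192:

```shell
# Placeholder path; point this at the GPFS file system for the real test.
TARGET=/tmp/ddtest.dat

# Capture extended device stats in the background while dd runs.
iostat -x 1 3 > /tmp/iostat.out 2>/dev/null &

# 4 MB blocks; 25 blocks = 100 MB here, scale count up for a real throughput run.
dd if=/dev/zero of="$TARGET" bs=4M count=25 2>/dev/null

wait    # let iostat finish its samples
ls -l "$TARGET"
```

Afterwards, the avgrq-sz column in /tmp/iostat.out shows the average request size the block layer actually issued.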

All is well.

When we use vsftpd to transfer large files, we get excellent performance ... wire speed @ 1250 MB/sec in a single direction, but we have observed that the disk IO is being done with a 2 MB IO size, which is less efficient than the 4 MB that GPFS is configured for.  Sendfile is NOT enabled within vsftpd, as GPFS does not support sendfile.
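For reference, vsftpd's sendfile use is controlled by the use_sendfile option (it defaults to YES); disabling it as described corresponds to a line like this in vsftpd.conf (install path may vary):

```
use_sendfile=NO
```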

From our preliminary troubleshooting, it appears that vsftpd is issuing a file-specific "fsync" after writing 2 MB.  This compels GPFS to flush the 2 MB of data to disk without further buffering.  Our disk farm is strong enough to handle the less-efficient 2 MB IOs, but we see increased storage utilization because of them.
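The effect being hypothesized can be mimicked with plain dd: oflag=dsync forces each output block to storage before the next write, much like an fsync after every 2 MB would (file names here are placeholders):

```shell
# Buffered writes: the kernel and file system are free to coalesce into larger IOs.
dd if=/dev/zero of=/tmp/buffered.dat bs=2M count=8 2>/dev/null

# Synced per block: each 2 MB write is flushed before the next one starts,
# mimicking the suspected write-then-fsync pattern.
dd if=/dev/zero of=/tmp/synced.dat bs=2M count=8 oflag=dsync 2>/dev/null

ls -l /tmp/buffered.dat /tmp/synced.dat
```

Comparing iostat output for the two runs would show the difference in issued IO sizes.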

A quick scan of the vsftpd source code did not uncover where these "short" synced writes were occurring.  We have not yet tried tracing the execution using SystemTap.

We're trying to identify where the 2 MB limitation is being specified and the fsync behavior enabled.  It might not be in the vsftpd code itself, but in an underlying library, like glibc.  The fsync (if it truly is being issued) could be an artifact of the vsftpd restart/recovery algorithms.

We understand that a simple SystemTap trace would identify exactly what sequence of file writes, which "flavor" of write system calls, and what other ioctls are being issued, but we have not yet run the test.  (Our production system unfortunately does not have SystemTap installed.)
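Until SystemTap is available, strace can answer the same question. Here is a sketch against a throwaway dd rather than the live daemon (file names are placeholders); the same filter with -p <vsftpd PID> would show whether an fsync really follows every 2 MB of writes:

```shell
# Trace only the write-family and sync calls; demonstrated on dd, not vsftpd.
strace -f -e trace=write,fsync,fdatasync -o /tmp/trace.out \
    dd if=/dev/zero of=/tmp/t.dat bs=1M count=4 2>/dev/null

# Each data write appears in the log with its size; here, four 1 MB writes.
grep -c 'write(.* = 1048576' /tmp/trace.out
```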

Since other programs doing simple disk IO (like "dd") to the GPFS file system show average IO sizes of 4 MB in iostat, we know that GPFS and the Linux IO stack are properly configured to enable 4 MB IO.  With vsftpd transferring a multi-gigabyte file, the writes are constrained to only 2 MB ... even though we still sustain 1250 MB/sec due to the strength of the storage system.

Thoughts?

Dave B.

Responses

Dave B.

Is this RHEL 6?  If so, we can grab some stack traces with fairly low overhead while vsftpd is running to see the kernel behaviour and calls.  This may give us enough to troubleshoot further.

Can you run a script to grab stack dumps, say every 10s, for the vsftpd PIDs?

Something like:

#!/bin/bash
# Dump the kernel stack of the given PID every 10 seconds.
PID=$1

while true
do
    date
    cat /proc/${PID}/stack
    sleep 10
done

These steps have an impact, but it is low, so if you are running in prod you should do it after hours.

It would also be good to get an "strace -c -f -ttt -T -p PID" (where PID is the forked vsftpd process).

Running a plain strace against the PID of a forked vsftpd would also be a good data point.
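For anyone unfamiliar with the -c summary mode, here is what it looks like against a throwaway dd (substitute -p <PID> to attach to the live vsftpd session instead; file names are placeholders):

```shell
# -c tallies calls and time per syscall instead of logging each call.
strace -c -f -o /tmp/summary.txt dd if=/dev/zero of=/tmp/t2.dat bs=1M count=4 2>/dev/null

# The write and fsync rows are the interesting ones for this problem.
grep -E 'write|fsync' /tmp/summary.txt
```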

Let me have this data and I can try to help you root-cause this.

Thanks

Laurence Oberman

Hi Laurence,

Thank you for the suggestions.  The system where the behavior was first identified is a RHEL 5.4/5.5 system, but I should be able to reproduce it on a RHEL 6.x system that also shows the behavior.  strace should work on both and is easier than SystemTap, which is not installed.

For testing and tracing purposes, I should not need to run at 10GbE speed to observe how the disk IO is performed, which lowers the complexity.

Let me see what I can dig up and post.

Dave B