Large file copies require manual drop_caches to maintain throughput
Hello
I have four identical systems, currently on RHEL 6.8/6.9. The difference is that two were originally installed with RHEL 6.2 and the other two around RHEL 6.5.
The systems have 12-core / 2-thread CPUs, 128GB of RAM, and dual 8Gbps fibre channel cards.
For the longest time I have had this issue: when I copy a large file, say 100GB, the first two servers maintain high throughput for about 50% of the file, then trail off to about 10MB/sec for the rest of the copy. The other two maintain high throughput for the entire copy.
Over time I have figured out, by watching my SAN perf monitor, iostat, and memory usage, that once the first two fill up the page cache, I can issue a manual
sync
echo 1 > /proc/sys/vm/drop_caches
and see memory usage drop to almost nothing and throughput pick back up to several hundred MB/sec. The servers installed with 6.5 do not exhibit this behavior; they maintain a pretty steady copy the entire time, as if they are flushing the cache as needed. I have been unable to pinpoint how to fix the older servers.
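For reference, this is roughly the sequence I run when throughput tanks, with the meminfo checks being just how I eyeball the before/after (my manual workaround, not a fix):

grep -E 'Dirty|Writeback|^Cached' /proc/meminfo    # page cache and dirty pages before the flush
sync                                               # push dirty pages out to the SAN
echo 1 > /proc/sys/vm/drop_caches                  # drop the now-clean page cache
grep -E 'Dirty|Writeback|^Cached' /proc/meminfo    # confirm memory usage dropped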
Watching my stats, the typical order of events is:
- start copy
- as the copy progresses, memory usage increases
- SAN write IOPS run at about 30K
- at about the 40GB mark, SAN read IOPS drop to almost nothing
- SAN write IOPS continue for another couple of minutes, then drop
- throughput drops to about 10MB/sec
- I issue the commands above
- memory usage drops
- write IOPS climb back to 30K
- throughput returns
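The stats above come from watching something like the following alongside the SAN perf monitor during the copy (the device name is just an example; mine differs):

iostat -xm /dev/sdb 5                                                  # per-device MB/s and IOPS every 5 seconds (example device)
watch -n 5 "grep -E 'Dirty|Writeback|^Cached|MemFree' /proc/meminfo"   # page cache and dirty page growth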
Just curious whether there is a config change I can make on the first two servers so they copy more like the other two.
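If it helps, I can post the writeback-related vm settings from both sets of installs; I'm only guessing these are the relevant knobs:

sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs vm.dirty_writeback_centisecs    # compare the 6.2-era and 6.5-era installs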
Thanks!
Jim