Large file copies require manual drop_caches to maintain throughput


Hello

I have four identical systems running RHEL 6.8/6.9. The difference is that two were originally installed with RHEL 6.2 and the other two with RHEL 6.5.

Each system has 12-core CPUs with 2 threads per core, 128 GB of RAM, and dual 8 Gbps Fibre Channel cards.

I have had this issue for a long time. When I copy a large file, say 100 GB, the first two servers maintain high throughput for about 50% of the file, then trail off to about 10 MB/sec for the rest of the copy. The other two seem able to maintain high throughput for the entire copy.

Over time, by watching my SAN performance monitor, iostat, and memory usage, I have figured out that once the first two servers fill up their cache, I can issue a manual

sync
echo 1 >> /proc/sys/vm/drop_caches

and see memory usage drop to a minimum and throughput pick back up to several hundred MB/sec. The servers installed with 6.5 do not exhibit this behavior; they maintain a fairly steady copy rate the entire time, as if they are flushing the cache as needed. I have been unable to pinpoint how to fix the older servers.
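One possible place to look (an assumption on my part, not a confirmed cause) is the dirty-page writeback tunables under /proc/sys/vm, since servers installed from different point releases may have ended up with different sysctl settings. The sketch below only dumps those values so one server from each pair can be compared. The tunable names come from the kernel's vm sysctl documentation; the script itself, its file name and its output format are hypothetical. It is plain Python that should run under RHEL 6's stock Python 2.6 as well as Python 3.

```python
import os
import sys

# Writeback tunables described in the kernel's vm sysctl documentation.
TUNABLES = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_bytes",
    "dirty_background_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_tunables(base="/proc/sys/vm"):
    """Return {name: value} for each tunable that exists under base."""
    values = {}
    for name in TUNABLES:
        path = os.path.join(base, name)
        if os.path.exists(path):
            with open(path) as f:
                values[name] = f.read().strip()
    return values


if __name__ == "__main__":
    # Optional argument: an alternate directory to read (defaults to /proc/sys/vm).
    base = sys.argv[1] if len(sys.argv) > 1 else "/proc/sys/vm"
    values = read_tunables(base)
    # Print in a fixed order so outputs from two hosts can be diffed directly.
    for name in TUNABLES:
        if name in values:
            print("%s = %s" % (name, values[name]))
```

Running it on one old and one new server (for example python vm_tunables.py > host.txt, where vm_tunables.py is a hypothetical file name) and diffing the two outputs would show whether the settings differ. Per the kernel documentation, dirty_bytes and dirty_ratio are mutually exclusive (writing one makes the other read back as 0), and the same applies to dirty_background_bytes and dirty_background_ratio.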

From watching my stats, the typical order of events is:

  • start copy
  • as copy progresses mem usage increases
  • SAN write IOPS run at about 30K
  • SAN read IOPS drop to a minimum at about the 40 GB mark
  • SAN write IOPS continue for another couple of minutes, then drop
  • throughput drops to about 10 MB/sec
  • I issue the sync and drop_caches commands
  • mem usage drops
  • IOPS increase to 30K
  • throughput returns
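The sequence above could be lined up against the SAN graphs by logging the page-cache counters in /proc/meminfo at a fixed interval during a copy. A minimal sketch, written as plain Python that should run under Python 2.6 or 3: MemFree, Cached, Dirty and Writeback are standard /proc/meminfo fields reported in kB, while the function names, the script name meminfo_log.py and the default 5-second interval are hypothetical choices.

```python
import sys
import time

# /proc/meminfo fields to track; all are reported in kB.
FIELDS = ("MemFree", "Cached", "Dirty", "Writeback")


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: kB} for the FIELDS above."""
    result = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in FIELDS:
            # e.g. "Dirty:     204800 kB" -> 204800
            result[name] = int(rest.split()[0])
    return result


def read_meminfo(path="/proc/meminfo"):
    with open(path) as f:
        return parse_meminfo(f.read())


def format_sample(sample):
    """Render a parsed sample as 'Field=NMB' pairs, in FIELDS order."""
    return " ".join("%s=%dMB" % (name, sample.get(name, 0) // 1024)
                    for name in FIELDS)


if __name__ == "__main__":
    # Usage: meminfo_log.py [samples] [interval_seconds]
    samples = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    interval = float(sys.argv[2]) if len(sys.argv) > 2 else 5.0
    for i in range(samples):
        if i:
            time.sleep(interval)
        stamp = time.strftime("%H:%M:%S")
        print("%s %s" % (stamp, format_sample(read_meminfo())))
        sys.stdout.flush()
```

For example, python meminfo_log.py 720 5 would take 720 samples five seconds apart (about an hour). Checking whether Dirty and Writeback stay high at the point where throughput falls to 10 MB/sec may help separate a writeback problem from a read-side one.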

Just curious whether there is a config change I can make on the first two servers so they copy more like the other two.

Thanks!

Jim
