NVMe performance degradation on RHEL 6.6
Problem Statement
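We are seeing unexpected performance degradation on our NVMe device when using RHEL 6.6. A 4k random-read FIO job drops from about 750K IOPS at 55% CPU utilization on RHEL 6.5 to about 139K IOPS at 97% CPU utilization on RHEL 6.6.
The full FIO job parameters are listed below.
We do not see the problem with RHEL 6.5 or RHEL 7.0 on the same hardware. We also do not see it with the CentOS 6.5, 6.6, or 7.0 kernels we tested.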
System Details
OS Level: RHEL 6.6
Kernel: 2.6.32-504.el6.x86_64
H/W: Supermicro X10SAE motherboard, 16GB DDR3 memory @ 1600MHz, Intel Xeon CPU E3-1225 v3 @ 3.20GHz (1 socket, 4 cores)
Device: Samsung NVMe SSD Controller 171X (rev 03), Dell Express Flash NVMe XS1715 SSD 400GB, in a PCIe 3.0 slot. Target Link Speed: 8GT/s (from lspci)
Driver: nvme, shipped with the kernel
FIO Test Results
OS           Kernel                                  Random-read IOPS   CPU utilization
RHEL 6.5     2.6.32-431.23.3.el6.x86_64              750 Kiops          55%
RHEL 6.6     2.6.32-504.el6.x86_64                   139 Kiops          97%
RHEL 7.0     3.10.0-123.9.3.el7.x86_64               753 Kiops          59%
CentOS 6.5   2.6.32-431.29.2.el6.centos.plus.x86_64  753 Kiops          58%
CentOS 6.6   2.6.32-504.1.3.el6.x86_64               749 Kiops          55%
CentOS 7.0   3.10.0-123.9.2.el7.x86_64               749 Kiops          59%
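Relative numbers make the size of the regression clearer. This small Python sketch uses only the figures from the table. The `efficiency` helper is illustrative and not part of any tool. The sketch computes each kernel's throughput relative to RHEL 6.5 and the Kiops delivered per percentage point of CPU:

```python
# Figures copied from the FIO Test Results table: (Kiops, % CPU utilization).
results = {
    "RHEL 6.5": (750, 55),
    "RHEL 6.6": (139, 97),
    "RHEL 7.0": (753, 59),
    "CentOS 6.5": (753, 58),
    "CentOS 6.6": (749, 55),
    "CentOS 7.0": (749, 59),
}


def efficiency(kiops: float, cpu_pct: float) -> float:
    """Illustrative helper: Kiops delivered per percentage point of CPU (a crude cost metric)."""
    return kiops / cpu_pct


base_kiops, base_cpu = results["RHEL 6.5"]
base_eff = efficiency(base_kiops, base_cpu)

for name, (kiops, cpu) in results.items():
    eff = efficiency(kiops, cpu)
    # base_eff / eff is the CPU cost per I/O relative to RHEL 6.5.
    print(f"{name:<11} {kiops:>4} Kiops  {kiops / base_kiops:5.2f}x throughput  "
          f"{eff:5.2f} Kiops/%CPU  {base_eff / eff:4.1f}x CPU per I/O")
```

On these figures, RHEL 6.6 reaches about 19% of the RHEL 6.5 throughput, and each I/O costs roughly 9.5 times as much CPU. Every other kernel tested stays within about 1% of the RHEL 6.5 throughput, including CentOS 6.6 on 2.6.32-504.1.3.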
FIO Output Sample (RHEL 6.6)
Measure_RR_4KB_QD256: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
...
Measure_RR_4KB_QD256: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.0.13
Starting 4 processes
Jobs: 4 (f=4): [rrrr] [100.0% done] [543.5M/0K/0K /s] [139K/0 /0 iops] [eta 00m:00s]
Measure_RR_4KB_QD256: (groupid=0, jobs=4): err= 0: pid=5510: Mon Dec 8 12:24:54 2014
read : io=161844MB, bw=552427KB/s, iops=138106 , runt=300001msec
slat (usec): min=0 , max=94120 , avg=21.57, stdev=96.18
clat (usec): min=7 , max=96788 , avg=1828.19, stdev=801.42
lat (usec): min=94 , max=96875 , avg=1850.74, stdev=806.36
clat percentiles (usec):
| 1.00th=[ 812], 5.00th=[ 1064], 10.00th=[ 1208], 20.00th=[ 1384],
| 30.00th=[ 1512], 40.00th=[ 1640], 50.00th=[ 1768], 60.00th=[ 1880],
| 70.00th=[ 2024], 80.00th=[ 2192], 90.00th=[ 2448], 95.00th=[ 2672],
| 99.00th=[ 3280], 99.50th=[ 3856], 99.90th=[11840], 99.95th=[13120],
| 99.99th=[21632]
bw (KB/s) : min=76848, max=154416, per=25.01%, avg=138144.24, stdev=7932.83
lat (usec) : 10=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.03%
lat (usec) : 750=0.56%, 1000=2.98%
lat (msec) : 2=64.83%, 4=31.12%, 10=0.30%, 20=0.16%, 50=0.01%
lat (msec) : 100=0.01%
cpu : usr=20.09%, sys=77.38%, ctx=165487, majf=0, minf=354
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=41432135/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=161844MB, aggrb=552426KB/s, minb=552426KB/s, maxb=552426KB/s, mint=300001msec, maxt=300001msec
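The RHEL 6.6 numbers above can be cross-checked with a little arithmetic. This minimal Python sketch uses values copied by hand from the output; the variable names are illustrative. It recomputes IOPS, bandwidth and total data read from the issued count. It then applies Little's law, which says the average number of I/Os in flight equals throughput times average latency. With 4 jobs at iodepth=64, up to 256 I/Os are outstanding. The sketch assumes the queue stays full, which is only approximately true for fio with libaio.

```python
# Cross-check of the RHEL 6.6 fio output; all inputs are copied by hand from the output above.
issued_reads = 41_432_135      # "issued : total=r=41432135"
runtime_s = 300_001 / 1000     # "runt=300001msec"
block_kib = 4                  # bs=4K; the figures only match if fio's KB/MB are 1024-based
reported_iops = 138_106        # "iops=138106"
reported_bw_kib_s = 552_427    # "bw=552427KB/s"
reported_io_mib = 161_844      # "io=161844MB"
avg_lat_usec = 1850.74         # "lat (usec): ... avg=1850.74" (total latency per I/O)
in_flight = 4 * 64             # "Starting 4 processes" x iodepth=64

# Throughput derived from the raw issued count.
iops = issued_reads / runtime_s
bw_kib_s = iops * block_kib
total_mib = issued_reads * block_kib / 1024

# Little's law: L = lambda * W, so lambda = L / W (assumes the queue stays full).
littles_law_iops = in_flight / (avg_lat_usec / 1e6)

# Average latency that ~750K IOPS would require at the same number of I/Os in flight.
target_iops = 750_000
needed_lat_usec = in_flight / target_iops * 1e6

print(f"IOPS from issued count : {iops:,.0f} (fio: {reported_iops:,})")
print(f"Bandwidth (KiB/s)      : {bw_kib_s:,.0f} (fio: {reported_bw_kib_s:,})")
print(f"Total read (MiB)       : {total_mib:,.0f} (fio: {reported_io_mib:,})")
print(f"Little's law IOPS      : {littles_law_iops:,.0f}")
print(f"Latency for 750K IOPS  : {needed_lat_usec:.0f} usec")
```

At 256 outstanding I/Os, the ~1.85 ms average latency accounts for the ~138K IOPS ceiling to within about 0.2%. Reaching ~750K IOPS at the same depth would need an average latency of about 341 usec.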
FIO Job Parameters
;Async Test CPU Utilization
;======================
; -- start job file --
[Measure_RR_4KB_QD256]
ioengine=libaio
direct=1
rw=randread
norandommap
randrepeat=0
iodepth=64
size=25%
numjobs=4
bs=4k
overwrite=1
filename=/dev/nvme1n1
runtime=5m
time_based
group_reporting
stonewall
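The job is named QD256 even though iodepth=64. The effective queue depth is numjobs x iodepth, because each of the 4 jobs (processes) keeps up to 64 asynchronous I/Os outstanding against the same device. This minimal Python sketch makes that explicit. It assumes the job file is read with the standard library's configparser, which is not fio's own parser but copes with this simple file. `parse_size` is a hypothetical helper, not part of fio.

```python
import configparser

# The job file above, copied verbatim. configparser is NOT fio's parser; it happens
# to handle this simple file (';' comment lines, value-less flags such as norandommap).
JOB_FILE = """\
;Async Test CPU Utilization
;======================
; -- start job file --
[Measure_RR_4KB_QD256]
ioengine=libaio
direct=1
rw=randread
norandommap
randrepeat=0
iodepth=64
size=25%
numjobs=4
bs=4k
overwrite=1
filename=/dev/nvme1n1
runtime=5m
time_based
group_reporting
stonewall
"""


def parse_size(text: str) -> int:
    """Hypothetical helper: '4k' -> 4096, assuming 1024-based suffixes."""
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


# allow_no_value: flags such as "norandommap" have no "=value".
# interpolation=None: otherwise the '%' in "size=25%" raises an interpolation error.
cfg = configparser.ConfigParser(allow_no_value=True, interpolation=None)
cfg.read_string(JOB_FILE)

job = cfg["Measure_RR_4KB_QD256"]
iodepth = job.getint("iodepth")
numjobs = job.getint("numjobs")
block_bytes = parse_size(job["bs"])

# Each cloned job keeps its own iodepth I/Os outstanding against the same device.
total_in_flight = iodepth * numjobs

print(f"{numjobs} jobs x iodepth={iodepth} -> up to {total_in_flight} I/Os in flight")
print(f"block size: {block_bytes} bytes, target: {job['filename']}")
print("flags:", [k for k, v in job.items() if v is None])
```

Reading the values from the job file rather than hard-coding them means the same check still holds if iodepth or numjobs is changed between runs.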