NVMe performance degradation on RHEL 6.6
Problem Statement
We are seeing unexpected performance degradation on our NVMe device when using RHEL 6.6.
The scenario is an FIO random read job with a 4k block size; the full job parameters are listed below.
We do not see the problem when using RHEL 6.5 or RHEL 7.0 on the same hardware.
System Details
OS Level: RHEL 6.6
Kernel:   2.6.32-504.el6.x86_64
H/W:      Supermicro X10SAE motherboard
          16GB DDR3 memory @ 1600MHz
          Intel Xeon CPU E3-1225 v3 @ 3.20GHz, 1 socket, 4 cores
Device:   Samsung NVMe SSD Controller 171X (rev 03)
          Dell Express Flash NVMe XS1715 SSD 400GB
          PCIe 3.0 slot, Target Link Speed: 8GT/s (from lspci)
Driver:   nvme (in-box driver shipped with the kernel)
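(For reference, the link-speed and driver details above can be double-checked with standard tools. The PCI address shown below is illustrative; substitute the address lspci reports for the Samsung controller.)

  lspci -vv -s 0000:02:00.0 | grep -E 'LnkCap|LnkSta'   # negotiated PCIe link speed/width (address is illustrative)
  modinfo nvme                                          # confirm the in-box nvme module is the one in use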
FIO Test Results
OS          Kernel                                    Result
RHEL 6.5    2.6.32-431.23.3.el6.x86_64                750 Kiops @ 55% CPU utilization
RHEL 6.6    2.6.32-504.el6.x86_64                     139 Kiops @ 97% CPU utilization  <----
RHEL 7.0    3.10.0-123.9.3.el7.x86_64                 753 Kiops @ 59% CPU utilization
CentOS 6.5  2.6.32-431.29.2.el6.centos.plus.x86_64    753 Kiops @ 58% CPU utilization
CentOS 6.6  2.6.32-504.1.3.el6.x86_64                 749 Kiops @ 55% CPU utilization
CentOS 7.0  3.10.0-123.9.2.el7.x86_64                 749 Kiops @ 59% CPU utilization
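For reference, each of the 4 jobs runs at iodepth=64, so the aggregate queue depth is 4 x 64 = 256 (hence the QD256 job name). The RHEL 6.6 result of 139 Kiops is roughly 5.4x lower than the ~750 Kiops achieved on every other OS level, while consuming nearly twice the CPU.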
FIO Output Sample (RHEL 6.6)
Measure_RR_4KB_QD256: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
...
Measure_RR_4KB_QD256: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.0.13
Starting 4 processes
Jobs: 4 (f=4): [rrrr] [100.0% done] [543.5M/0K/0K /s] [139K/0 /0 iops] [eta 00m:00s]
Measure_RR_4KB_QD256: (groupid=0, jobs=4): err= 0: pid=5510: Mon Dec 8 12:24:54 2014
read : io=161844MB, bw=552427KB/s, iops=138106 , runt=300001msec
slat (usec): min=0 , max=94120 , avg=21.57, stdev=96.18
clat (usec): min=7 , max=96788 , avg=1828.19, stdev=801.42
lat (usec): min=94 , max=96875 , avg=1850.74, stdev=806.36
clat percentiles (usec):
| 1.00th=[ 812], 5.00th=[ 1064], 10.00th=[ 1208], 20.00th=[ 1384],
| 30.00th=[ 1512], 40.00th=[ 1640], 50.00th=[ 1768], 60.00th=[ 1880],
| 70.00th=[ 2024], 80.00th=[ 2192], 90.00th=[ 2448], 95.00th=[ 2672],
| 99.00th=[ 3280], 99.50th=[ 3856], 99.90th=[11840], 99.95th=[13120],
| 99.99th=[21632]
bw (KB/s) : min=76848, max=154416, per=25.01%, avg=138144.24, stdev=7932.83
lat (usec) : 10=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.03%
lat (usec) : 750=0.56%, 1000=2.98%
lat (msec) : 2=64.83%, 4=31.12%, 10=0.30%, 20=0.16%, 50=0.01%
lat (msec) : 100=0.01%
cpu : usr=20.09%, sys=77.38%, ctx=165487, majf=0, minf=354
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=41432135/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=161844MB, aggrb=552426KB/s, minb=552426KB/s, maxb=552426KB/s, mint=300001msec, maxt=300001msec
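The cpu line above (usr=20.09%, sys=77.38%) suggests the extra time on RHEL 6.6 is being spent in the kernel rather than in fio itself. A rough way to see where that system time goes, assuming the standard perf package is installed (the 30-second sample window is arbitrary):

  perf record -a -g -- sleep 30   # sample all CPUs for 30s while the fio job is running
  perf report                     # check which kernel functions dominate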
FIO Job Parameters
;Async Test CPU Utilization
;======================
; -- start job file --
[Measure_RR_4KB_QD256]
ioengine=libaio
direct=1
rw=randread
norandommap
randrepeat=0
iodepth=64
size=25%
numjobs=4
bs=4k
overwrite=1
filename=/dev/nvme1n1
runtime=5m
time_based
group_reporting
stonewall
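; -- end job file --

For completeness, a job file like the one above (saved here as nvme_rr.fio, a name chosen for illustration) is run with a plain fio invocation; it must be run as root because it opens /dev/nvme1n1 directly. The output above was produced with fio 2.0.13.

  fio nvme_rr.fio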