NVMe performance degradation on RHEL 6.6
Problem Statement
We are seeing unexpected performance degradation on our NVMe device when using RHEL 6.6.
The scenario is an FIO random read job with a 4k block size; the full job parameters are listed below.
We do not see the problem when using RHEL 6.5 or RHEL 7.0 on the same hardware.
System Details
OS Level RHEL 6.6
Kernel 2.6.32-504.el6.x86_64
H/W Supermicro X10SAE motherboard
16GB ddr3 memory @ 1600MHz
Intel Xeon CPU E3-1225 v3 @ 3.20GHz, 1 socket – 4 core
Device Samsung NVMe SSD Controller 171X (rev 03)
Dell Express Flash NVMe XS1715 SSD 400GB
Using PCIe 3.0 slot. Target Link Speed: 8GT/s (from lspci)
Driver nvme – shipped with the kernel code
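For reference, the link speed above was read with lspci; a minimal example (the PCI address 01:00.0 is illustrative, substitute the address of your NVMe controller):
# find the controller's address first, e.g.: lspci | grep -i 'Non-Volatile'
lspci -vvv -s 01:00.0 | grep -E 'LnkCap|LnkSta|Target Link Speed'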
FIO Test Results
RHEL 6.5    2.6.32-431.23.3.el6.x86_64              750 Kiops @ 55% CPU utilization
RHEL 6.6    2.6.32-504.el6.x86_64                   139 Kiops @ 97% CPU utilization  <----
RHEL 7.0    3.10.0-123.9.3.el7.x86_64               753 Kiops @ 59% CPU utilization
CentOS 6.5  2.6.32-431.29.2.el6.centos.plus.x86_64  753 Kiops @ 58% CPU utilization
CentOS 6.6  2.6.32-504.1.3.el6.x86_64               749 Kiops @ 55% CPU utilization
CentOS 7.0  3.10.0-123.9.2.el7.x86_64               749 Kiops @ 59% CPU utilization
FIO Output Sample (RHEL 6.6)
Measure_RR_4KB_QD256: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
...
Measure_RR_4KB_QD256: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.0.13
Starting 4 processes
Jobs: 4 (f=4): [rrrr] [100.0% done] [543.5M/0K/0K /s] [139K/0 /0 iops] [eta 00m:00s]
Measure_RR_4KB_QD256: (groupid=0, jobs=4): err= 0: pid=5510: Mon Dec 8 12:24:54 2014
read : io=161844MB, bw=552427KB/s, iops=138106 , runt=300001msec
slat (usec): min=0 , max=94120 , avg=21.57, stdev=96.18
clat (usec): min=7 , max=96788 , avg=1828.19, stdev=801.42
lat (usec): min=94 , max=96875 , avg=1850.74, stdev=806.36
clat percentiles (usec):
| 1.00th=[ 812], 5.00th=[ 1064], 10.00th=[ 1208], 20.00th=[ 1384],
| 30.00th=[ 1512], 40.00th=[ 1640], 50.00th=[ 1768], 60.00th=[ 1880],
| 70.00th=[ 2024], 80.00th=[ 2192], 90.00th=[ 2448], 95.00th=[ 2672],
| 99.00th=[ 3280], 99.50th=[ 3856], 99.90th=[11840], 99.95th=[13120],
| 99.99th=[21632]
bw (KB/s) : min=76848, max=154416, per=25.01%, avg=138144.24, stdev=7932.83
lat (usec) : 10=0.01%, 50=0.01%, 100=0.01%, 250=0.01%, 500=0.03%
lat (usec) : 750=0.56%, 1000=2.98%
lat (msec) : 2=64.83%, 4=31.12%, 10=0.30%, 20=0.16%, 50=0.01%
lat (msec) : 100=0.01%
cpu : usr=20.09%, sys=77.38%, ctx=165487, majf=0, minf=354
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=41432135/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=161844MB, aggrb=552426KB/s, minb=552426KB/s, maxb=552426KB/s, mint=300001msec, maxt=300001msec
FIO Job Parms
;Async Test CPU Utilization
;======================
; -- start job file --
[Measure_RR_4KB_QD256]
ioengine=libaio
direct=1
rw=randread
norandommap
randrepeat=0
iodepth=64
size=25%
numjobs=4
bs=4k
overwrite=1
filename=/dev/nvme1n1
runtime=5m
time_based
group_reporting
stonewall
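To reproduce, the job file above can be saved and run with fio directly (the file name below is just an example):
fio measure_rr_4kb_qd256.fio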
Responses
Thomas,
Have you tried RHEL 6.6 with the RHEL 6.5 kernel version? This may help narrow the issue down to either an OS component or the kernel update being the culprit.
From there, I think it will definitely warrant a formal support request through Red Hat to investigate the regression.
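A rough sketch of how that test could be set up on the 6.6 install (assuming the 6.5 kernel build from the results table above is still available in your repositories):
yum install kernel-2.6.32-431.23.3.el6.x86_64
# RHEL 6 keeps older kernels side by side; either pick the 431 kernel in the GRUB menu at boot,
# or make it the default by pointing default= at its entry in /boot/grub/grub.conf
grep ^title /boot/grub/grub.conf   # note the (0-based) position of the 431 entry
vi /boot/grub/grub.conf            # set default= to that position, then reboot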
Just curious whether a ticket was opened on this, and what the resolution might have been. Same situation here on 6.6. Thanks so much!
Interesting issue (I wish I could look at this first hand).
I am curious (rough commands to check each of these are sketched after the list):
* are you using tuned profiles -- https://access.redhat.com/videos/898563
* did you double-check the disk/partition alignment on the SSD between the two builds
* differences between the I/O elevator settings -- https://access.redhat.com/solutions/54164
* different SELinux settings (or nsswitch - perhaps there is a hang-up doing user lookups? a bit of a stretch here)
* buffer cache tuning the same between hosts (sysctl -a)
* boot params the same between 6.5 and 6.6 ( cat /proc/cmdline)
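A quick sketch of the commands I'd use to compare those between the 6.5 and 6.6 installs (the device name nvme1n1 is taken from the job file above; the dump file path is just an example):
tuned-adm active                            # active tuned profile, if any
cat /sys/block/nvme1n1/queue/scheduler      # I/O elevator (may simply show "none" for NVMe)
blockdev --getalignoff /dev/nvme1n1         # alignment offset of the device
getenforce                                  # SELinux mode
sysctl -a > /tmp/sysctl.$(hostname).txt     # dump tunables on each host, then diff the two files
cat /proc/cmdline                           # boot parameters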
Along the lines of what PixelDrift had asked, I would be curious what happens if you install 6.5 and update to 6.6, and whether the problem returns. If so, analyze all the files that get updated (using find or something). Hopefully it's a tunable (or setup) causing this behavior and not a binary.
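One way to capture that file list (a sketch; the marker file path is arbitrary):
touch /root/pre-update.marker               # timestamp marker just before updating
yum update                                  # 6.5 -> 6.6
find /etc /boot /lib /lib64 /usr -xdev -type f -newer /root/pre-update.marker | sort > /root/updated-files.txt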
Perhaps my issue is slightly different, but I'm seeing abysmal read performance with Intel PCIe NVMe drives and EL6.6 with XFS or ext4 filesystems. Tests against the raw unformatted devices with fio are good, but read performance is capped at around 600MB/s sequential when I add a filesystem.
The state of irqbalance didn't impact results. I'm not in a position to downgrade to EL6.5 to test, nor move to EL7.1 (because of other incompatibilities).
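For comparison, a minimal sketch of the kind of filesystem-backed sequential read job being described (the mount point, file name, and sizes are illustrative, not from the original poster's job file; fio will lay out the test file on the first run):
fio --name=seqread_fs --ioengine=libaio --direct=1 --rw=read --bs=128k --iodepth=32 \
    --size=10g --filename=/mnt/nvme_xfs/fio.testfile --runtime=120 --time_based --group_reporting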
