Performance issue with HP SmartArray P400 raid controller
Environment
- Red Hat Enterprise Linux
- HP ProLiant DL580 G5
- HP P400 SmartArray
Issue
We have 2 cluster nodes: inhdbpri03 and inhdbpri04.
inhdbpri04 needs nearly five times longer for a 'yum makecache':
inhdbpri03 : real 2m12.861s, user 1m54.431s, sys 0m12.021s
inhdbpri04 : real 10m27.887s user 2m1.214s sys 0m17.096s
Both servers are HP ProLiant DL580 G5 machines with similar load and the same OS installation and patches.
Resolution
The tests point to an issue with the HP Smart Array P400 controller. Check the following, which is also described in the HP article 'Write Operation to Local Datastore Is Slow':
a) Battery-Backed Write Cache (BBWC) is installed
b) the BBWC cache ratio is set to 50 percent read and 50 percent write
c) the battery is OK
From there:
When it is confirmed from the previous test that the disk write is slow, check whether Battery-Backed Write Cache (BBWC)
is installed and enabled to 50 percent to read and 50 percent to write. This can be checked by booting
HP ProLiant DL 380 G5 with HP SmartStart CD.
In some cases, a 128/256/512 MB cache module might be installed with the cache ratio set to 100 percent read. The ratio cannot be changed unless a battery pack is installed for the module.
Please contact HP support to verify the Smart Array Controller configuration.
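On a running system the controller and battery state can also be inspected with HP's hpacucli tool (e.g. 'hpacucli ctrl all show status'), assuming it is installed; the exact field names and wording vary by firmware version. A minimal sketch that checks the battery field in hpacucli-style status output - the sample output below is illustrative only, not captured from these hosts:

```shell
#!/bin/sh
# Sketch: flag a BBWC battery problem from hpacucli-style status output.
# On a live system, replace the here-document with real output, e.g.:
#   status=$(hpacucli ctrl all show status)
# The sample below is illustrative only.
status=$(cat <<'EOF'
Smart Array P400 in Slot 1
   Controller Status: OK
   Cache Status: Temporarily Disabled
   Battery/Capacitor Status: Failed (Replace Batteries)
EOF
)

# Show the cache- and battery-related lines.
echo "$status" | grep -E 'Cache Status|Battery'

# A failed or missing battery typically means the write cache is disabled.
if echo "$status" | grep -q 'Battery/Capacitor Status: OK'; then
    echo "BBWC battery OK"
else
    echo "BBWC battery problem - write cache likely disabled"
fi
```

With output like the sample above, the script reports a battery problem, which matches the root cause found in this case.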
Root Cause
- The Battery-Backed Write Cache (BBWC) battery is empty (discharged), which disables the write cache
Diagnostic Steps
- An 'strace -c' of the command shows the same system calls on both systems; the write calls consume the most time.
inhdbpri03
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
71.16 0.365388 0 1367705 write
13.23 0.067918 8 8167 brk
8.32 0.042696 0 468661 394 read
5.83 0.029908 0 1741907 lseek
...
------ ----------- ----------- --------- --------- ----------------
100.00 0.513440 3611230 5926 total
inhdbpri04
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
82.81 1.088414 1 1367705 write
10.21 0.134189 0 468369 174 read
3.27 0.042950 30 1440 fsync
2.92 0.038419 0 1741907 lseek
0.26 0.003395 4 781 poll
0.18 0.002329 7 334 brk
...
------ ----------- ----------- --------- --------- ----------------
100.00 1.314405 3602885 5706 total
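The summaries above come from strace's -c option ('strace -c -o /tmp/yum.strace yum makecache'; adding -f would also follow child processes). To compare the two hosts, the 'write' row can be pulled out of each summary table; a small awk sketch, run here against the inhdbpri04 row:

```shell
# Extract seconds and call count for the 'write' syscall from an
# 'strace -c' summary row (columns: %time seconds usecs/call calls [errors] syscall).
awk '$NF == "write" { printf "write: %s s over %s calls\n", $2, $4 }' <<'EOF'
 82.81    1.088414           1   1367705           write
EOF
# prints: write: 1.088414 s over 1367705 calls
```

Note that the 'write' row has no errors column, so the syscall name is reliably the last field ($NF) and the call count the fourth.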
- Looking again at the detailed strace and focusing on the writes, we see that about 10 minutes were spent writing data for the rbi-rhel5-5.0 channel. Compare the times writing rbi-rhel5-5.0 from package 1/15626 to 15625/15626, e.g.:
14:30:57.345493 write(1, "\rbi-rhel5-5.0: [############################################################################## ] 12330/15626", 128) = 128 <0.000015>
Processing the different parts of the repository requires running through the channel data 3 times:
inhdbpri03
1st run 2nd run 3rd run
start 14:29:16.045607 14:31:55.504825 14:35:20.339379
end 14:31:50.891361 14:35:10.255625 14:38:37.843687
~2:30min ~3min ~3min
inhdbpri04
1st run 2nd run 3rd run
start 14:30:01.293210 14:37:04.570126 14:44:49.772010
end 14:36:43.813519 14:44:22.621332 14:49:33.605101
~6:42min ~7min ~5min
=> about 4 minutes longer each for the 1st and 2nd runs -> roughly 10 additional minutes spent in this operation overall
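The per-run durations above can be recomputed from the strace timestamps. A small helper (hypothetical, for illustration) that subtracts two same-day HH:MM:SS.us timestamps:

```shell
# Sketch: elapsed seconds between two strace -tt timestamps (same day).
elapsed() {
    awk -v a="$1" -v b="$2" '
    function secs(t,  f) { split(t, f, ":"); return f[1]*3600 + f[2]*60 + f[3] }
    BEGIN { printf "%.1f\n", secs(b) - secs(a) }'
}

# 1st run on inhdbpri04: start 14:30:01.293210, end 14:36:43.813519
elapsed 14:30:01.293210 14:36:43.813519   # ~402.5 seconds, i.e. ~6m42s
```

Applied to the table above, this confirms ~6m42s for the first run on inhdbpri04 versus ~2m35s on inhdbpri03.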
- Since this is an I/O issue, collect statistics with iostat while running the command. This shows that the cciss device holding the logical volume is 100% utilized:
Time: 04:56:02 PM
avg-cpu: %user %nice %system %iowait %steal %idle
2.89 0.00 0.71 8.35 0.00 88.05
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0.00 457.00 0.00 629.00 0.00 4380.00 13.93 142.22 235.68 1.59 100.10
cciss/c0d0p1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
cciss/c0d0p2 0.00 457.00 0.00 629.00 0.00 4380.00 13.93 142.22 235.68 1.59 100.10
- What this means, per the iostat man page:
%util
Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.
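The other columns confirm the saturation: await (total time a request waits, including queueing) is 235.68 ms, while svctm (service time at the device) is only 1.59 ms, so requests spend almost all their time queued. A quick sanity check on the numbers from the iostat line above:

```shell
# Queue wait = average total wait (await) minus service time (svctm), in ms,
# using the cciss/c0d0 values from the iostat output above.
awk 'BEGIN {
    await = 235.68; svctm = 1.59
    printf "time queued: %.2f ms (%.0f%% of await)\n", \
        await - svctm, (await - svctm) / await * 100
}'
# prints: time queued: 234.09 ms (99% of await)
```

In other words, the device itself answers quickly once a request reaches it; the backlog is in the queue, which is consistent with a disabled write cache.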
- Running some tests using 'dd' with the 'oflag=direct' option, which bypasses the page cache so data is written directly to disk:
1.) Test with direct write on inhdbpri04 - during this run iostat shows 100% utilization on the cciss device, as seen with 'yum makecache':
# time sh -c "dd if=/dev/zero of=/var/tmp/ddfile bs=8k count=10000 oflag=direct"
10000+0 records in
10000+0 records out
81920000 bytes (82 MB) copied, 68.5465 seconds, 1.2 MB/s
real 1m8.551s
user 0m0.005s
sys 0m0.248s
2.) Test on (good) partner node inhdbpri03:
# time sh -c "dd if=/dev/zero of=/var/tmp/ddfile bs=8k count=10000 oflag=direct"
10000+0 records in
10000+0 records out
81920000 bytes (82 MB) copied, 0.828298 seconds, 98.9 MB/s
real 0m0.832s
user 0m0.006s
sys 0m0.253s
=> 1.2 MB/s compared to 98.9 MB/s - roughly 80 times slower
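As a sanity check on the dd figures, the throughput and slowdown factor can be recomputed from the raw byte counts and wall-clock times reported above:

```shell
# 81920000 bytes written in 68.5465 s (inhdbpri04) vs 0.828298 s (inhdbpri03).
awk 'BEGIN {
    bytes = 81920000
    printf "inhdbpri04: %.1f MB/s\n", bytes / 68.5465  / 1000000
    printf "inhdbpri03: %.1f MB/s\n", bytes / 0.828298 / 1000000
    printf "ratio: %.0fx slower\n",   68.5465 / 0.828298
}'
# prints:
# inhdbpri04: 1.2 MB/s
# inhdbpri03: 98.9 MB/s
# ratio: 83x slower
```

The numbers match dd's own reported rates and make the scale of the regression explicit.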
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.