Performance issue with HP Smart Array P400 RAID controller

Environment

  • Red Hat Enterprise Linux
  • HP ProLiant DL580 G5
  • HP Smart Array P400

Issue

We have two cluster nodes: inhdbpri03 and inhdbpri04.

inhdbpri04 takes nearly five times longer for a 'yum makecache':

inhdbpri03: real 2m12.861s, user 1m54.431s, sys 0m12.021s
inhdbpri04: real 10m27.887s, user 2m1.214s, sys 0m17.096s
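
The timings above correspond to running the cache rebuild under time, i.e.:

# time yum makecache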

Both servers are HP ProLiant DL580 G5 machines with similar load and the same OS installation and patches.

Resolution

From the tests there is an issue with the HP Smart Array P400 controller. The following points should be checked; they are also covered in the HP article 'Write Operation to Local Datastore Is Slow' (see also the hpacucli sketch at the end of this section):

a) Battery-Backed Write Cache (BBWC) is installed
b) the BBWC cache ratio is set to 50 percent read and 50 percent write
c) the battery is OK

From that HP article:

When it is confirmed from the previous test that the disk write is slow, check whether Battery-Backed Write Cache (BBWC)
is installed and the cache ratio is set to 50 percent read and 50 percent write. This can be checked by booting the
HP ProLiant DL380 G5 with the HP SmartStart CD.

In some cases, a 128/256/512 MB cache module might be installed with the cache ratio set to 100 percent read. The ratio
cannot be changed unless a battery pack is installed for the module.

Please contact HP support to verify the Smart Array Controller configuration.
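
If the hpacucli utility from the HP ProLiant Support Pack is installed, the controller configuration can also be inspected from the running system, without booting the SmartStart CD. A minimal sketch, assuming the controller sits in slot 0 (adjust the slot number to your system):

# hpacucli ctrl all show status
# hpacucli ctrl slot=0 show detail | grep -iE 'cache|battery'

The detail output typically includes 'Cache Ratio', 'Cache Status', and 'Battery/Capacitor Status' lines, which correspond to points a) to c) above.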

Root Cause

  • The Battery-Backed Write Cache (BBWC) battery is empty; without a working battery the controller disables its write-back cache, so every write goes straight to disk.

Diagnostic Steps

  • Running the command under 'strace -c' shows the same system calls on both systems; the most time-consuming are the write calls.
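
The summaries below were produced by running the command under strace with syscall counting, along the lines of (the output path is illustrative):

# strace -c -o /tmp/makecache.summary yum makecache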

inhdbpri03

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 71.16    0.365388           0   1367705           write
 13.23    0.067918           8      8167           brk
  8.32    0.042696           0    468661       394 read
  5.83    0.029908           0   1741907           lseek
...
------ ----------- ----------- --------- --------- ----------------
100.00    0.513440               3611230      5926 total

inhdbpri04

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 82.81    1.088414           1   1367705           write
 10.21    0.134189           0    468369       174 read
  3.27    0.042950          30      1440           fsync
  2.92    0.038419           0   1741907           lseek
  0.26    0.003395           4       781           poll
  0.18    0.002329           7       334           brk
...
------ ----------- ----------- --------- --------- ----------------
100.00    1.314405               3602885      5706 total
  • Looking again at the detailed strace and focusing on the writes, we see that about 10 minutes were spent writing data for the rbi-rhel5-5.0 channel. Compare the times writing rbi-rhel5-5.0 from package 1/15626 to 15625/15626, e.g.:
14:30:57.345493 write(1, "\rbi-rhel5-5.0: [##############################################################################                     ] 12330/15626", 128) = 128 <0.000015>
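
Per-call timestamps like the line above can be captured with strace's -tt (microsecond timestamps) and -T (time spent in each call) options; a sketch, with illustrative file names:

# strace -tt -T -e trace=write -o /tmp/makecache-write.trace yum makecache
# grep 'rbi-rhel5-5.0' /tmp/makecache-write.trace | head -n 1   # first package
# grep 'rbi-rhel5-5.0' /tmp/makecache-write.trace | tail -n 1   # last package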

The different parts of the repository require running through the channel data three times:

inhdbpri03

      1st run           2nd run           3rd run
start 14:29:16.045607   14:31:55.504825   14:35:20.339379
end   14:31:50.891361   14:35:10.255625   14:38:37.843687
      ~2:30min          ~3min             ~3min     

inhdbpri04

      1st run           2nd run           3rd run
start 14:30:01.293210   14:37:04.570126   14:44:49.772010
end   14:36:43.813519   14:44:22.621332   14:49:33.605101
      ~6:42min          ~7min             ~5min

=> the 1st and 2nd runs each take about 4 minutes longer on inhdbpri04 (plus ~2 minutes for the 3rd) -> roughly 10 extra minutes are spent in this operation

  • Since this is an I/O issue, collect statistics with iostat while running the command (see the command sketch after the output). This shows that the cciss device hosting the logical volume is 100% utilized:
Time: 04:56:02 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.89    0.00    0.71    8.35    0.00   88.05

Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
cciss/c0d0        0.00   457.00  0.00 629.00     0.00  4380.00    13.93   142.22  235.68   1.59 100.10
cciss/c0d0p1      0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
cciss/c0d0p2      0.00   457.00  0.00 629.00     0.00  4380.00    13.93   142.22  235.68   1.59 100.10
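
Statistics in this form can be gathered with something like:

# iostat -xkt 1

where -x selects the extended statistics (including %util), -k reports throughput in kB/s, and -t prints the timestamp line.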
  • What this means, per the %util definition in the iostat man page:
%util
      Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.
  • Run some tests using dd with the oflag=direct option, which bypasses the page cache so that writes go directly to the device:

1.) Test with direct writes on inhdbpri04 - during this run, iostat shows the cciss device at 100% utilization, just as seen with 'yum makecache':

# time sh -c "dd if=/dev/zero of=/var/tmp/ddfile bs=8k count=10000 oflag=direct"
10000+0 records in
10000+0 records out
81920000 bytes (82 MB) copied, 68.5465 seconds, 1.2 MB/s

real    1m8.551s
user    0m0.005s
sys     0m0.248s

2.) Test on the (good) partner node inhdbpri03:

# time sh -c "dd if=/dev/zero of=/var/tmp/ddfile bs=8k count=10000 oflag=direct"
10000+0 records in
10000+0 records out
81920000 bytes (82 MB) copied, 0.828298 seconds, 98.9 MB/s

real    0m0.832s
user    0m0.006s
sys     0m0.253s

=> 1.2 MB/s on inhdbpri04 compared to 98.9 MB/s on inhdbpri03
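
As an additional cross-check (not performed above), a direct read of the same file can confirm that the slowdown is specific to writes, which is what a disabled write-back cache would cause:

# time sh -c "dd if=/var/tmp/ddfile of=/dev/null bs=8k iflag=direct"

With a dead BBWC battery only the write path loses the controller cache, so direct reads should remain close to normal speed.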

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
