RHEL 5.6 application performance slowdown due to P600 regression within cciss driver

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux 5.5
  • Red Hat Enterprise Linux 5.6
  • P600 Smart Array backplane raid controller

Issue

  • after upgrading from RHEL 5.5 (cciss 3.6.20-RH4) to RHEL 5.6 (cciss 3.6.22-RH1), applications run significantly slower than usual
  • installing HP's driver (3.6.20-20s) from the http://cciss.sourceforge.net/ sources on RHEL 5.6 fixes the performance issue by effectively reverting the driver to 5.5 or pre-5.5 levels
  • RHEL 5.7 was tried and the same performance issue is present
  • the performance issue is not present on a P400 Smart Array adapter in the same system; it appears localized to the P600 model
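The driver revision in use can be confirmed before and after an upgrade. A minimal sketch (the procfs path is the cciss driver's standard interface, but its exact contents vary by driver revision):

```shell
# Report the loaded cciss driver version (e.g. 3.6.20-RH4 on RHEL 5.5,
# 3.6.22-RH1 on RHEL 5.6); falls back to a note on systems without the
# cciss module loaded.
CCISS_VER=$(modinfo -F version cciss 2>/dev/null || echo "cciss module not present")
echo "cciss driver version: $CCISS_VER"

# Controller details (firmware version, logical drives) for the first card:
cat /proc/driver/cciss/cciss0 2>/dev/null || true
```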

Resolution

If experiencing this problem, please create a case with both HP and Red Hat.

At present, the only way forward to avoid this issue appears to be upgrading the Smart Array to a newer model card. The problem has only been encountered with the P600 controller and is not present in newer models such as the P410.

Root Cause

This appears to be a Smart Array firmware issue in the P600 that does not interact well with newer driver revisions, which include performance enhancements associated with scatter-gather maps. Per HP, the P600 reached end of life several years ago and therefore will not have any firmware-related issues addressed.

From the iostat data provided for the three configurations (5.5, 5.6, and 5.6 with the HP upstream driver), read I/O performance was the same across all three configurations, but write performance was substantially slower with the 5.6 configuration.

The 5.6 and later cciss drivers, both the Red Hat-shipped driver and the one available directly from HP, include new code that interrogates the controller firmware for the number of scatter-gather elements it supports. Code-bisection testing identified the patch that included this change as the source of the performance drop. Although the patch included other changes, the strongest suspect is the change that obtains MaxSGElements from the controller firmware.

RHEL 5.5 reads:
Timestamp Device rrqm wrqm r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq avgqu await svctm %util
14:55:08 cciss/c0d0 101.00 1.00 628.50 0.50 80532.00 12.00 40266.00 6.00 128.05 0.60 0.95 0.91 57.15
14:55:10 cciss/c0d0 180.00 30.50 1128.50 10.50 142452.00 328.00 71226.00 164.00 125.36 1.06 0.93 0.70 80.05

RHEL 5.6 reads:
Timestamp Device rrqm wrqm r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq avgqu await svctm %util
14:29:09 cciss/c0d0 129.00 29.50 914.00 79.00 114480.00 868.00 57240.00 434.00 116.16 1.99 2.00 0.89 88.15
14:29:11 cciss/c0d0 120.50 645.00 952.50 21.50 120296.00 5332.00 60148.00 2666.00 128.98 1.85 1.89 0.78 75.55

RHEL 5.6 (HP) reads:
Timestamp Device rrqm wrqm r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq avgqu await svctm %util
14:04:12 cciss/c0d0 129.50 17.00 967.00 20.50 120316.00 300.00 60158.00 150.00 122.14 1.27 1.28 0.86 85.20
14:04:14 cciss/c0d0 142.50 13.50 907.50 71.00 113464.00 676.00 56732.00 338.00 116.65 1.15 1.18 0.86 84.05

RHEL 5.5 and RHEL 5.6 w/HP upstream driver write performance numbers:

Timestamp Device     rrqm  wrqm      r/s    w/s     rsec/s wsec/s     rkB/s wkB/s   avgrq   avgqu   await   svctm  %util
14:55:44 cciss/c0d0 0.00 17655.50 0.00 789.50 0.00 135184.00 0.00 67592.00 171.23 37.98 45.52 0.39 30.50
14:55:46 cciss/c0d0 0.00 8772.50 0.00 609.50 0.00 87976.00 0.00 43988.00 144.34 10.89 21.23 0.34 20.80
14:55:48 cciss/c0d0 0.00 21003.00 0.00 1069.00 0.00 176272.00 0.00 88136.00 164.89 2.60 2.43 0.41 43.55
14:55:50 cciss/c0d0 0.00 3889.50 0.00 241.50 0.00 33352.00 0.00 16676.00 138.10 0.49 2.07 0.35 8.45
14:55:52 cciss/c0d0 0.00 16766.50 0.00 852.00 0.00 140948.00 0.00 70474.00 165.43 40.38 47.40 0.37 31.85

RHEL 5.6 driver write performance numbers:

Timestamp Device     rrqm  wrqm      r/s     w/s     rsec/s wsec/s    rkB/s wkB/s  avgrq   avgqu   await   svctm  %util
14:29:41 cciss/c0d0 0.00 1031.00 0.00 43.50 0.00 8048.00 0.00 4024.00 185.01 7.52 172.09 22.08 96.05
14:29:43 cciss/c0d0 0.00 1015.42 0.00 94.53 0.00 8879.60 0.00 4439.80 93.94 4.09 43.27 10.32 97.56
14:29:45 cciss/c0d0 0.00 1015.50 0.00 27.00 0.00 8340.00 0.00 4170.00 308.89 5.11 195.65 36.31 98.05
14:29:47 cciss/c0d0 0.00 873.00 0.00 22.00 0.00 7708.00 0.00 3854.00 350.36 4.30 188.89 44.59 98.10

Note that the write service times are upwards of 40x slower than in the other configurations, and that the read performance numbers are not affected while the write performance numbers are.
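The gap can be quantified directly from the svctm column of the captured samples. A small awk sketch that averages the write-test svctm values from the two tables above:

```shell
# Average the svctm column (whitespace-separated field 14) of the iostat
# write samples quoted above.
avg_svctm() { awk '{ s += $14; n++ } END { printf "%.3f\n", s / n }'; }

RHEL55=$(avg_svctm <<'EOF'
14:55:44 cciss/c0d0 0.00 17655.50 0.00 789.50 0.00 135184.00 0.00 67592.00 171.23 37.98 45.52 0.39 30.50
14:55:46 cciss/c0d0 0.00 8772.50 0.00 609.50 0.00 87976.00 0.00 43988.00 144.34 10.89 21.23 0.34 20.80
14:55:48 cciss/c0d0 0.00 21003.00 0.00 1069.00 0.00 176272.00 0.00 88136.00 164.89 2.60 2.43 0.41 43.55
14:55:50 cciss/c0d0 0.00 3889.50 0.00 241.50 0.00 33352.00 0.00 16676.00 138.10 0.49 2.07 0.35 8.45
14:55:52 cciss/c0d0 0.00 16766.50 0.00 852.00 0.00 140948.00 0.00 70474.00 165.43 40.38 47.40 0.37 31.85
EOF
)
RHEL56=$(avg_svctm <<'EOF'
14:29:41 cciss/c0d0 0.00 1031.00 0.00 43.50 0.00 8048.00 0.00 4024.00 185.01 7.52 172.09 22.08 96.05
14:29:43 cciss/c0d0 0.00 1015.42 0.00 94.53 0.00 8879.60 0.00 4439.80 93.94 4.09 43.27 10.32 97.56
14:29:45 cciss/c0d0 0.00 1015.50 0.00 27.00 0.00 8340.00 0.00 4170.00 308.89 5.11 195.65 36.31 98.05
14:29:47 cciss/c0d0 0.00 873.00 0.00 22.00 0.00 7708.00 0.00 3854.00 350.36 4.30 188.89 44.59 98.10
EOF
)
echo "RHEL 5.5 average write svctm: $RHEL55 ms"
echo "RHEL 5.6 average write svctm: $RHEL56 ms"
```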

Diagnostic Steps

Verify that a controller cache configuration issue is not present and that the cache is properly configured on each of the Smart Array cards. HP provides a utility, hpacucli, to manage the configuration on the card, with which you can check and modify the on-board configuration parameters.

For example, the following is the output for a P400 with a 512 MB cache and the battery-backed write cache (BBWC) option. The on-board write cache is typically disabled for write use if the battery backup or flash memory option is not installed on the Smart Array; please consult HP for further details. Without the on-board write cache properly configured, there are substantial write performance penalties. Please see the HP configuration and user guides for how to use hpacucli, but by way of example:

$ hpacucli

=> controller slot=0 show config detail

Smart Array P400 in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   Hardware Revision: Rev D
   Firmware Version: 7.22
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Surface Scan Mode: Idle
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Accelerator Ratio: 25% Read / 75% Write
   Drive Write Cache: Disabled
   Total Cache Size: 512 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Batteries
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True

Verify that the cache status is OK, that there is a backup power source, and that the proper amount of cache for your configuration is dedicated to writes.


NOTE: in the above configuration, the physical drive write cache has been disabled. This is a common practice with the Smart Array in order to guarantee data integrity; otherwise there is no guarantee that data written to a device has actually made it onto the platter rather than still being held in the on-drive cache. Disabling the drive cache protects the data against sudden power loss: for example, the controller writes data to the drive, but power is lost before the data moves from the drive cache to the platter, and the data held in the physical drive cache is lost.
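If the settings need adjusting, hpacucli can modify them in place. The following session is a sketch only: the slot number and ratio are examples, and option names can vary between hpacucli releases, so consult the HP user guide for your version:

```
=> controller slot=0 modify cacheratio=25/75
=> controller slot=0 modify drivewritecache=disable
=> controller slot=0 show config detail
```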

Running something like watcher-cron.bsh to gather baseline statistics is highly useful for providing information on resource utilization during customer workloads or testing.
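The actual watcher-cron.bsh script is distributed as an attachment; as a rough illustration only, a collector in its spirit might look like the following (the output directory, tools, and intervals here are assumptions, not the script's real contents):

```shell
# Hypothetical baseline-stats gatherer, loosely in the spirit of
# watcher-cron.bsh; captures a few samples of disk and memory statistics.
OUTDIR="${OUTDIR:-/tmp/watcher-$(date +%Y%m%d-%H%M%S)}"
mkdir -p "$OUTDIR"

# Two 1-second samples each; skipped quietly if sysstat/procps is missing.
command -v iostat >/dev/null 2>&1 && iostat -xk 1 2 > "$OUTDIR/iostat.out"
command -v vmstat >/dev/null 2>&1 && vmstat 1 2 > "$OUTDIR/vmstat.out"

echo "baseline stats captured under $OUTDIR"
```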

Alternatively, run something like kbs-test.bsh, for example ./kbs-test.bsh /tmp/iotest/testfile.1 4, where /tmp is located on a cciss!c#d# device. This script gathers the same set of files as watcher-cron.bsh does but creates a separate set of captured stats for each test pass.

The resulting performance numbers from running a number of dd commands for buffered and direct I/O writes (see the attached kbs-test.tar for the exact commands) showed no substantial performance difference between RHEL 5.5 and RHEL 5.6 (see the attached images cciss-55-writes.png and cciss-56-writes.png). The test was re-run on RHEL 5.6 after modifying the accelerator ratio to 100% read / 0% write for the cache. The result was a substantial performance drop of roughly 3:1, as seen in cciss-56-writes-nocache.png.
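The dd passes were roughly of the following shape (a sketch only; the exact commands, block sizes, and pass counts are in the attached kbs-test.tar):

```shell
# Illustrative buffered and direct I/O write passes; the path is an example.
TESTFILE="${TESTFILE:-/tmp/iotest/testfile.1}"
mkdir -p "$(dirname "$TESTFILE")"

# Buffered write; conv=fdatasync makes dd flush before reporting its timing,
# so the measurement includes the actual write-out through the controller.
dd if=/dev/zero of="$TESTFILE" bs=1M count=16 conv=fdatasync 2>&1 | tail -n1

# Direct I/O write, bypassing the page cache (oflag=direct is not supported
# on all filesystems, e.g. tmpfs).
dd if=/dev/zero of="$TESTFILE" bs=1M count=16 oflag=direct 2>&1 | tail -n1
```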

Attachments

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
