Loss of disk access after smartpqi driver resets SCSI bus; results in "Medium access timeout failure. Offlining disk!"

Solution Verified

Issue

  • The smartpqi driver logs the following messages, and the offlined disk is no longer accessible:

    Jan  6 01:50:48 hostname kernel: smartpqi 0000:5c:00.0: resetting scsi 14:1:0:1
    Jan  6 01:50:48 hostname kernel: smartpqi 0000:5c:00.0: reset of scsi 14:1:0:1: SUCCESS
    Jan  6 01:50:48 hostname kernel: sd 14:1:0:1: [sdb] Medium access timeout failure. Offlining disk!
    Jan  6 01:50:48 hostname kernel: sd 14:1:0:1: Device offlined - not ready after error recovery
    :
    
  • Why did a device in a hardware RAID1 stop being accessible when one leg of the RAID failed?

  • After one of the legs in a hardware RAID1 failed, the corresponding device on the system was no longer accessible.
  • An error in one leg of a hardware RAID1 should be transparent to the operating system. Why did the device stop being accessible?

    Jun  4 06:02:08 localhost kernel: smartpqi 0000:12:00.0: resetting scsi 1:1:0:2
    Jun  4 06:02:08 localhost kernel: smartpqi 0000:12:00.0: reset of scsi 1:1:0:2: SUCCESS
    Jun  4 06:02:08 localhost kernel: sd 1:1:0:2: [sdc] Medium access timeout failure. Offlining disk!
    Jun  4 06:02:08 localhost kernel: sd 1:1:0:2: Device offlined - not ready after error recovery
    :
    Jun  4 06:02:10 localhost kernel: XFS (sdc1): metadata I/O error: block 0x6fd0f0d0 ("xlog_iodone") error 5 numblks 64
    Jun  4 06:02:10 localhost kernel: XFS (sdc1): xfs_do_force_shutdown(0x2) called from line 1200 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffc02f7ea0
    Jun  4 06:02:10 localhost kernel: XFS (sdc1): Log I/O Error Detected.  Shutting down filesystem
    Jun  4 06:02:10 localhost kernel: XFS (sdc1): Please umount the filesystem and rectify the problem(s)
    
  • Devices behind RAID controllers go offline for no apparent reason.

  • After a command timeout, a smartpqi device is offlined after the driver performs a reset.
  • After a command timeout, a device provided by smartpqi is offlined after the driver's error handler (eh) performs a reset.
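
To check whether a system has hit this symptom, the kernel messages quoted above can be scanned for the reset and offlining lines. The following is a minimal Python sketch and not part of the original solution. The function names (find_offlined_disks, scsi_device_state) and the SAMPLE list are hypothetical, chosen for illustration. The regular expressions assume the exact message wording shown in this article. The sysfs file /sys/block/<disk>/device/state is the standard SCSI device state attribute.

```python
import re
from pathlib import Path

# Patterns mirror the kernel messages quoted in this article (an assumption:
# other kernel versions may word these messages differently).
RESET_RE = re.compile(r"smartpqi (\S+): resetting scsi (\d+:\d+:\d+:\d+)")
OFFLINE_RE = re.compile(
    r"sd (\d+:\d+:\d+:\d+): \[(\w+)\] Medium access timeout failure\. Offlining disk!"
)


def find_offlined_disks(log_lines):
    """Return one record per disk offlined after a medium access timeout."""
    reset_by_addr = {}  # SCSI address -> PCI address of the controller that reset it
    offlined = []
    for line in log_lines:
        m = RESET_RE.search(line)
        if m:
            reset_by_addr[m.group(2)] = m.group(1)
            continue
        m = OFFLINE_RE.search(line)
        if m:
            addr, disk = m.groups()
            offlined.append(
                {"disk": disk, "scsi": addr, "controller": reset_by_addr.get(addr)}
            )
    return offlined


def scsi_device_state(disk, sysfs_root="/sys/block"):
    """Read the SCSI device state (for example 'running' or 'offline') from sysfs."""
    state_file = Path(sysfs_root) / disk / "device" / "state"
    try:
        return state_file.read_text().strip()
    except OSError:
        return None  # not a SCSI disk, or the device no longer exists


# Hypothetical sample input: the first log excerpt from this article.
SAMPLE = [
    "Jan  6 01:50:48 hostname kernel: smartpqi 0000:5c:00.0: resetting scsi 14:1:0:1",
    "Jan  6 01:50:48 hostname kernel: smartpqi 0000:5c:00.0: reset of scsi 14:1:0:1: SUCCESS",
    "Jan  6 01:50:48 hostname kernel: sd 14:1:0:1: [sdb] Medium access timeout failure. Offlining disk!",
    "Jan  6 01:50:48 hostname kernel: sd 14:1:0:1: Device offlined - not ready after error recovery",
]

if __name__ == "__main__":
    for rec in find_offlined_disks(SAMPLE):
        state = scsi_device_state(rec["disk"])
        print(rec["disk"], rec["scsi"], rec["controller"], state)
```

On a live system the lines would typically come from /var/log/messages or the output of dmesg instead of SAMPLE. A "controller" value of None means no preceding smartpqi reset for that SCSI address was found in the scanned lines.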

Environment

  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
  • smartpqi driver
