Loss of disk access after the smartpqi driver resets the SCSI bus; results in "Medium access timeout failure. Offlining disk!"

Solution Verified - Updated -

Issue

  • The smartpqi driver logs the following messages, and the offlined disk is no longer accessible:

    Jan  6 01:50:48 hostname kernel: smartpqi 0000:5c:00.0: resetting scsi 14:1:0:1
    Jan  6 01:50:48 hostname kernel: smartpqi 0000:5c:00.0: reset of scsi 14:1:0:1: SUCCESS
    Jan  6 01:50:48 hostname kernel: sd 14:1:0:1: [sdb] Medium access timeout failure. Offlining disk!
    Jan  6 01:50:48 hostname kernel: sd 14:1:0:1: Device offlined - not ready after error recovery
    :
    
  • Why did a device that is part of a hardware RAID1 stop being accessible when one leg of the RAID failed?

  • After one of the legs in a hardware RAID1 failed, the corresponding device on the system was no longer accessible.
  • An error in one leg of a hardware RAID1 should be transparent to the operating system. Why did the device stop being accessible?

    Jun  4 06:02:08 localhost kernel: smartpqi 0000:12:00.0: resetting scsi 1:1:0:2
    Jun  4 06:02:08 localhost kernel: smartpqi 0000:12:00.0: reset of scsi 1:1:0:2: SUCCESS
    Jun  4 06:02:08 localhost kernel: sd 1:1:0:2: [sdc] Medium access timeout failure. Offlining disk!
    Jun  4 06:02:08 localhost kernel: sd 1:1:0:2: Device offlined - not ready after error recovery
    :
    Jun  4 06:02:10 localhost kernel: XFS (sdc1): metadata I/O error: block 0x6fd0f0d0 ("xlog_iodone") error 5 numblks 64
    Jun  4 06:02:10 localhost kernel: XFS (sdc1): xfs_do_force_shutdown(0x2) called from line 1200 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffc02f7ea0
    Jun  4 06:02:10 localhost kernel: XFS (sdc1): Log I/O Error Detected.  Shutting down filesystem
    Jun  4 06:02:10 localhost kernel: XFS (sdc1): Please umount the filesystem and rectify the problem(s)
    
  • Devices behind RAID controllers go offline for no apparent reason.

  • After a command timeout, a device provided by smartpqi is offlined once the driver's error handler (eh) has performed a reset.
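
The log signature described above (a successful smartpqi reset of a SCSI address, followed by a "Medium access timeout failure" for the same address) can be searched for in a saved kernel log. The following is only a minimal sketch: the function name `find_offlined_after_reset` and the `SAMPLE` list are illustrative and not part of any Red Hat tool, and the regular expressions assume the exact message wording quoted in this article.

```python
import re
from typing import Iterable, List, Tuple

# Patterns assumed from the kernel messages quoted in this article.
RESET_OK = re.compile(r"smartpqi \S+: reset of scsi (\d+:\d+:\d+:\d+): SUCCESS")
OFFLINED = re.compile(
    r"sd (\d+:\d+:\d+:\d+): \[(\w+)\] Medium access timeout failure\. Offlining disk!"
)


def find_offlined_after_reset(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (scsi_address, disk_name) for disks offlined after a successful reset."""
    reset_seen = set()  # SCSI addresses that logged a successful reset so far
    hits = []
    for line in lines:
        m = RESET_OK.search(line)
        if m:
            reset_seen.add(m.group(1))
            continue
        m = OFFLINED.search(line)
        if m and m.group(1) in reset_seen:
            hits.append((m.group(1), m.group(2)))
    return hits


# Illustrative input: the first log excerpt from this article.
SAMPLE = [
    "Jan  6 01:50:48 hostname kernel: smartpqi 0000:5c:00.0: resetting scsi 14:1:0:1",
    "Jan  6 01:50:48 hostname kernel: smartpqi 0000:5c:00.0: reset of scsi 14:1:0:1: SUCCESS",
    "Jan  6 01:50:48 hostname kernel: sd 14:1:0:1: [sdb] Medium access timeout failure. Offlining disk!",
]

for addr, disk in find_offlined_after_reset(SAMPLE):
    print(f"{disk} (scsi {addr}) was offlined after a smartpqi reset")
```

To scan a real log, the lines of an open file object (for example a saved copy of the system log) could be passed to the function in place of `SAMPLE`.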

Environment

  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
  • smartpqi driver
