'multipathd' and the SCSI path checker recognize failed disks later than Oracle ASM

Solution Verified - Updated -

Issue

  • We have observed that the multipathd daemon and the SCSI path checker, which monitor LUNs on an EMC Symmetrix storage array, recognize failed or unresponsive LUNs later than the Oracle RAC ASM volume manager does.

    Oracle RAC ASM appears to take disks offline once they have been unresponsive for 15 seconds, whereas multipathd or the SCSI TUR (Test Unit Ready) checker appears to recognize an unresponsive disk or SCSI path only after 60 seconds. As a result, Oracle ASM takes disks offline while they still appear healthy from the operating system's point of view. This produces the following messages in the database logs:

    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    WARNING: Waited 15 secs for write IO to PST disk 1 in group 8.
    WARNING: Waited 15 secs for write IO to PST disk 1 in group 8.
    Fri Oct 17 21:40:56 2014
    NOTE: process _b000_+asm1 (30427) initiating offline of disk 0.3915928412 (DATA1_0000) with mask 0x7e in group 3
    NOTE: checking PST: grp = 3
    Fri Oct 17 21:40:56 2014
    NOTE: process _b001_+asm1 (30429) initiating offline of disk 1.3915928451 (DATA2_0001) with mask 0x7e in group 8
    NOTE: checking PST: grp = 8
    
  • Is it possible to change the checking parameters for unresponsive LUNs and paths so that the operating system recognizes unresponsive disks earlier than the Oracle RAC ASM volume manager does?
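
To gauge how often ASM reaches its wait threshold before the operating system reports a path failure, the warnings quoted above can be tallied per disk group and disk. The following is a minimal sketch, not an Oracle or Red Hat tool: the regular expression is derived only from the log excerpt in this article, and the function name count_pst_waits is hypothetical.

```python
import re
from collections import Counter

# Pattern derived from the warning lines quoted in this article; other
# ASM releases may word the message differently (assumption).
WAIT_RE = re.compile(
    r"WARNING: Waited (?P<secs>\d+) secs for (?P<op>\w+) IO to PST disk "
    r"(?P<disk>\d+) in group (?P<group>\d+)\."
)


def count_pst_waits(lines):
    """Count PST I/O wait warnings, keyed by (group, disk)."""
    counts = Counter()
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            key = (int(match["group"]), int(match["disk"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample input copied from the log excerpt above.
    sample = [
        "WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.",
        "WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.",
        "WARNING: Waited 15 secs for write IO to PST disk 1 in group 8.",
        "WARNING: Waited 15 secs for write IO to PST disk 1 in group 8.",
        "NOTE: checking PST: grp = 3",
    ]
    for (group, disk), n in sorted(count_pst_waits(sample).items()):
        print(f"group {group} disk {disk}: {n} warning(s)")
```

Comparing the timestamps of these warnings with the time at which multipathd logs the corresponding path failure in the system log shows how large the detection gap is on a given system.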

Environment

  • Red Hat Enterprise Linux (RHEL) 6, 7, 8
  • Oracle ASM
