'multipathd' and disk checker recognize failed disks later than Oracle ASM
Issue
- We have noticed that the multipathd daemon and the SCSI path checker for LUNs on an EMC Symmetrix storage array recognize failed or unresponsive LUNs later than the Oracle RAC ASM volume manager does. Oracle RAC ASM seems to deactivate disks that are unresponsive for 15 seconds, but multipathd or the SCSI tur checker seems to recognize an unresponsive disk or SCSI path only after 60 seconds. As a result, Oracle ASM deactivates disks even though they still appear healthy from the OS. This produces the following error messages in the database logs:

      WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
      WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
      WARNING: Waited 15 secs for write IO to PST disk 1 in group 8.
      WARNING: Waited 15 secs for write IO to PST disk 1 in group 8.
      Fri Oct 17 21:40:56 2014
      NOTE: process _b000_+asm1 (30427) initiating offline of disk 0.3915928412 (DATA1_0000) with mask 0x7e in group 3
      NOTE: checking PST: grp = 3
      Fri Oct 17 21:40:56 2014
      NOTE: process _b001_+asm1 (30429) initiating offline of disk 1.3915928451 (DATA2_0001) with mask 0x7e in group 8
      NOTE: checking PST: grp = 8
- Is there any way to change the checking parameters for unresponsive LUNs and paths, so that we recognize unresponsive disks earlier than the Oracle RAC ASM volume manager does?
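One timer that may contribute to the delay described above is the per-device SCSI command timeout, which Linux exposes in `/sys/block/<dev>/device/timeout`. The following is a minimal diagnostic sketch, not a fix: it lists SCSI disks whose command timeout is longer than the 15-second wait seen in the ASM warnings. The function names and the `sd*` device filter are illustrative assumptions, and other settings (for example the multipathd polling interval) also affect how quickly a path is marked failed.

```python
from pathlib import Path


def scsi_command_timeouts(sysfs_block="/sys/block"):
    """Return {device_name: timeout_seconds} for sd* devices under sysfs_block."""
    timeouts = {}
    for dev in sorted(Path(sysfs_block).glob("sd*")):
        timeout_file = dev / "device" / "timeout"
        try:
            timeouts[dev.name] = int(timeout_file.read_text().strip())
        except (OSError, ValueError):
            # Attribute missing or unreadable (e.g. device removed); skip it.
            continue
    return timeouts


def devices_slower_than(timeouts, threshold_seconds=15):
    """Return the devices whose SCSI command timeout exceeds the threshold."""
    return {name: t for name, t in timeouts.items() if t > threshold_seconds}


if __name__ == "__main__":
    # 15 seconds is the wait reported in the ASM warnings quoted above.
    for name, seconds in devices_slower_than(scsi_command_timeouts()).items():
        print(f"{name}: SCSI command timeout is {seconds}s (ASM waits 15s)")
```

Run on the affected host, this prints one line per SCSI disk whose command timeout is longer than the ASM wait, which helps confirm whether the OS can even notice a stalled path before ASM takes the disk offline.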
Environment
- Red Hat Enterprise Linux (RHEL) 6, 7, 8
- Oracle ASM