"megaraid_sas: FW detected to be in fault state, restarting it..." in logs following a filesystem read-only event or a server crash.

Solution Verified - Updated -

Issue

  • The server crashed, and the kernel log extracted from the vmcore shows the following messages:
crash> log
[...]
NMI received for unknown reason 3c
CPU 0 
Modules linked in: vxodm(PFU) nfsd auth_rpcgss autofs4 smbus(U) ipmi_devintf ipmi_si ipmi_msghandler nfs nfs_acl dmpjbod(PU) dmpap(PU) dmpaa(PU) dmpalua(PU) vxspec(PFU) vxio(PFU) vxdmp(PU) lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi vxportal(PFU) fdd(PFU) vxfs(PU) exportfs dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport tpm_infineon sr_mod cdrom igb(U) sg pcspkr i2c_i801 i2c_core 8021q dca tpm_tis tpm tpm_bios dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc scsi_transport_fc ata_piix libata shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Tainted: PF    ---- 2.6.18-274.12.1.el5 #1
RIP: 0010:[<ffffffff8006b9bf>]  [<ffffffff8006b9bf>] mwait_idle_with_hints+0x66/0x67
RSP: 0018:ffffffff8045df88  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff80056c7e RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000090000 R08: ffffffff8045c000 R09: 0000000000000028
R10: ffff81407ff90368 R11: 0000000000000206 R12: 000000007901394c
R13: 000000000000001f R14: 0000000000075000 R15: fffffffff00000c6
FS:  0000000000000000(0000) GS:ffffffff8042c000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000009ef168 CR3: 0000003e61c9e000 CR4: 00000000000006a0
Process swapper (pid: 0, threadinfo ffffffff8045c000, task ffffffff80315b60)
Stack:  ffffffff80056c8a ffffffff80048fe2 0000000000200800 ffffffff80467809
 0000000000090000 000000007901394c ffffffff804b6740 ffffffff8046722f
 80008e000010019c 00000000ffffffff 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff80056c8a>] mwait_idle+0xc/0x20
 [<ffffffff80048fe2>] cpu_idle+0x95/0xb8
 [<ffffffff80467809>] start_kernel+0x220/0x225
 [<ffffffff8046722f>] _sinittext+0x22f/0x236


Code: c3 41 57 41 56 49 89 f6 41 55 49 89 fd 41 54 4c 8d a7 e0 02
  • A filesystem went read-only, and the following messages appear in the logs:
 megasas: moving cmd[95]:ffff81407fd29240:0:ffff813d19c82b40 on the defer queue as internal reset in progress.
 megaraid_sas: FW detected to be in fault state, restarting it...
 megaraid_sas: FW was restarted successfully, initiating next stage...
 megaraid_sas: HBA recovery state machine, state 2 starting...
 megasas: Waiting for FW to come to ready state
 megasas: FW in FAULT state!!
 FW state [-268435456] hasn't changed in 180 secs
 megaraid_sas: out: controller is not in ready state
 megasas: waiting_for_outstanding: after issue OCR. 
 megasas: waiting_for_outstanding: before issue OCR. FW state = f0000000
 megasas: moving cmd[0]:ffff8130800f3340:0:ffff810ca2edd500 on the defer queue as internal reset in progress.
 megaraid_sas: ERROR while moving this cmd:ffff8130800f3340, 0 ffff810ca2edd500, it was discovered on some list?
 sd 0:2:0:0: timing out command, waited 360s
 sd 0:2:0:0: Unhandled error code
 sd 0:2:0:0: SCSI error: return code = 0x06000000
 Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
 Buffer I/O error on device sda3, logical block 585840
 lost page write due to I/O error on sda3
 sd 0:2:0:0: timing out command, waited 360s
 sd 0:2:0:0: rejecting I/O to offline device
 Buffer I/O error on device sda3, logical block 585840
 lost page write due to I/O error on sda3
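When checking whether a system has hit this issue, it can help to scan the logs for the messages quoted above. The Python sketch below is hypothetical (it is not a Red Hat or Broadcom tool). It counts the marker strings copied from this article's log excerpts, converts the signed `FW state [-268435456]` value to the unsigned form printed as `f0000000` in the OCR message, and checks the NMI reason byte `3c` from the crash log. It assumes that the top four bits of the firmware state word carry the state and that 0xF0000000 means FAULT, as the adjacent "FW in FAULT state!!" message suggests. It also assumes the usual x86 convention in which only bits 0x80 (SERR/memory parity) and 0x40 (I/O channel check) identify a known NMI source, which would explain why `3c` is reported as an unknown reason.

```python
import re

# Marker strings copied from the megaraid_sas and SCSI messages quoted above.
# Treating them as a signature of this issue is an assumption, not an official check.
MARKERS = {
    "fw_fault_detected": "FW detected to be in fault state",
    "fw_still_faulted": "FW in FAULT state",
    "controller_not_ready": "controller is not in ready state",
    "scsi_timeout": "timing out command",
    "offline_device": "rejecting I/O to offline device",
    "buffer_io_error": "Buffer I/O error on device",
}

# Assumption: the top 4 bits of the 32-bit firmware state word encode the state,
# and 0xF0000000 means FAULT (the log pairs this value with "FW in FAULT state!!").
FW_STATE_MASK = 0xF0000000
FW_STATE_FAULT = 0xF0000000

# Assumption: usual x86 NMI reason bits; 0x80 = SERR/memory parity, 0x40 = I/O check.
NMI_REASON_SERR = 0x80
NMI_REASON_IOCHK = 0x40

SIGNED_STATE_RE = re.compile(r"FW state \[(-?\d+)\]")
HEX_STATE_RE = re.compile(r"FW state = ([0-9a-fA-F]+)")


def fw_state_to_u32(value: int) -> int:
    """Reinterpret a state printed as a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


def fw_state_is_fault(value: int) -> bool:
    """True if the state word (signed or unsigned form) carries the FAULT code."""
    return (fw_state_to_u32(value) & FW_STATE_MASK) == FW_STATE_FAULT


def nmi_reason_known_source(reason: int) -> bool:
    """True if SERR or IOCHK is set; otherwise the NMI source is 'unknown'."""
    return bool(reason & (NMI_REASON_SERR | NMI_REASON_IOCHK))


def scan(log_text: str) -> dict:
    """Count marker hits and collect every FW state value as an unsigned int."""
    hits = {name: 0 for name in MARKERS}
    fw_states = []
    for line in log_text.splitlines():
        for name, marker in MARKERS.items():
            if marker in line:
                hits[name] += 1
        signed = SIGNED_STATE_RE.search(line)
        if signed:
            fw_states.append(fw_state_to_u32(int(signed.group(1))))
        hexed = HEX_STATE_RE.search(line)
        if hexed:
            fw_states.append(int(hexed.group(1), 16))
    return {"hits": hits, "fw_states": fw_states}


if __name__ == "__main__":
    sample = """\
 megaraid_sas: FW detected to be in fault state, restarting it...
 megasas: FW in FAULT state!!
 FW state [-268435456] hasn't changed in 180 secs
 megasas: waiting_for_outstanding: before issue OCR. FW state = f0000000
 sd 0:2:0:0: rejecting I/O to offline device
"""
    result = scan(sample)
    print(result["hits"])
    print([hex(s) for s in result["fw_states"]])  # ['0xf0000000', '0xf0000000']
    print(all(fw_state_is_fault(s) for s in result["fw_states"]))  # True
    print(nmi_reason_known_source(0x3C))  # False
```

Run against the excerpt above, both FW state values decode to the same 32-bit word, 0xf0000000, which is consistent with the "hasn't changed in 180 secs" message: the controller did not leave the fault state before the driver gave up and the SCSI layer started timing out commands.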

Environment

  • Red Hat Enterprise Linux 5
  • MegaRAID storage controller (megaraid_sas driver)
