One node of an Oracle RAC cluster reboots frequently with the error "o2hb_write_timeout:172 ERROR: Heartbeat write timeout to device dm-5 after 60000 milliseconds"

Solution Verified - Updated -

Issue

  • One node of a two-node Oracle RAC cluster reboots frequently. The following messages appear in the logs just before each reboot:

    Feb 19 06:16:19 hostname kernel: qla2xxx 0000:05:00.0: scsi(4:0:6): Abort command issued -- 1 1991d49 2002.
    Feb 19 06:22:29 hostname kernel: qla2xxx 0000:05:00.0: scsi(4:1:7): Abort command issued -- 1 19b25d0 2002.
    Feb 19 06:25:01 hostname kernel: (events/5,55,5):o2hb_write_timeout:172 ERROR: Heartbeat write timeout to device dm-5 after 60000 milliseconds
    
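You can check whether the HBA abort messages cluster in time just before each heartbeat timeout by scanning the log. The script below is a minimal Python sketch, not a Red Hat tool. The function names, the 600-second look-back window, and the default year are illustrative assumptions (syslog timestamps omit the year). The script also reports the dead threshold implied by the timeout. That calculation assumes the OCFS2 kernel heartbeats every 2000 ms and declares a write timeout after (dead_threshold - 1) intervals, so a threshold of 31 corresponds to the 60000 ms in the message.

```python
import re
from datetime import datetime

# Message formats taken from the log excerpt above; other driver or kernel
# versions may word these lines differently.
ABORT_RE = re.compile(
    r"(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: qla2xxx \S+: "
    r"scsi\((?P<target>[\d:]+)\): Abort command issued"
)
HB_TIMEOUT_RE = re.compile(
    r"(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: .*o2hb_write_timeout:\d+ "
    r"ERROR: Heartbeat write timeout to device (?P<dev>\S+) "
    r"after (?P<ms>\d+) milliseconds"
)

# Assumption: one heartbeat interval is 2000 ms and the write timeout fires
# after (dead_threshold - 1) intervals.
HB_INTERVAL_MS = 2000


def parse_ts(ts, year):
    """Syslog timestamps carry no year, so a caller-supplied year is prepended."""
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def correlate(lines, window_s=600, year=2000):
    """Return one record per heartbeat timeout, listing the scsi(...) addresses
    that logged an HBA abort within window_s seconds before it."""
    aborts = []   # (timestamp, scsi(...) address) for every abort seen so far
    results = []
    for raw in lines:
        line = raw.strip()
        m = ABORT_RE.match(line)
        if m:
            aborts.append((parse_ts(m["ts"], year), m["target"]))
            continue
        m = HB_TIMEOUT_RE.match(line)
        if m:
            when = parse_ts(m["ts"], year)
            timeout_ms = int(m["ms"])
            results.append({
                "time": when,
                "device": m["dev"],
                "timeout_ms": timeout_ms,
                "implied_dead_threshold": timeout_ms // HB_INTERVAL_MS + 1,
                "preceding_aborts": [
                    target for (t, target) in aborts
                    if 0 <= (when - t).total_seconds() <= window_s
                ],
            })
    return results


# The three log lines quoted above.
SAMPLE = [
    "Feb 19 06:16:19 hostname kernel: qla2xxx 0000:05:00.0: scsi(4:0:6): Abort command issued -- 1 1991d49 2002.",
    "Feb 19 06:22:29 hostname kernel: qla2xxx 0000:05:00.0: scsi(4:1:7): Abort command issued -- 1 19b25d0 2002.",
    "Feb 19 06:25:01 hostname kernel: (events/5,55,5):o2hb_write_timeout:172 ERROR: Heartbeat write timeout to device dm-5 after 60000 milliseconds",
]

if __name__ == "__main__":
    for rec in correlate(SAMPLE):
        print(rec["time"].strftime("%b %d %H:%M:%S"), rec["device"],
              rec["timeout_ms"], "ms, threshold", rec["implied_dead_threshold"],
              "aborts:", rec["preceding_aborts"])
```

To scan a full /var/log/messages, pass the open file object instead of SAMPLE. The window length is a tuning choice. If aborts repeatedly precede each timeout, the storage path is worth examining alongside the OCFS2 heartbeat settings. The fixed year orders events incorrectly if the log crosses a year boundary.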

Environment

  • Red Hat Enterprise Linux 5.5
  • Oracle RAC cluster
  • QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA
  • OCFS2 file system
