Oracle cluster nodes are rebooted frequently after 'lpfc' driver errors

Solution Verified - Updated -

Issue

  • Oracle cluster nodes are rebooted frequently after the lpfc driver logs the following errors:

    kernel: lpfc 0000:41:00.2: 0:(0):0748 abort handler timed out waiting for aborting I/O (xri:x415) to complete: ret 0x2003, ID 0, LUN 34
    kernel: [1255741.429100] lpfc 0000:41:00.3: 1:(0):0748 abort handler timed out waiting for aborting I/O (xri:xb98) to complete: ret 0x2003, ID 0, LUN 34
    kernel: lpfc 0000:41:00.3: 1:(0):0748 abort handler timed out waiting for aborting I/O (xri:xb98) to complete: ret 0x2003, ID 0, LUN 34
    [...]
    kernel: [131073.606937] OKSK-00008: Cluster Membership Change starting - Incarnation 9.
    kernel: [131073.641251] OKSK-00023: Cluster membership node list:
    kernel: OKSK-00008: Cluster Membership Change starting - Incarnation 9.
    kernel: OKSK-00023: Cluster membership node list:
    kernel: OKSK-00024:   Node 1 (Interconnect address: 192.168.2.2) node1
    kernel: [131073.666454] OKSK-00024:   Node 1 (Interconnect address: 192.168.2.2) node1
    kernel: [131073.702545] OKSK-00025: Cluster membership node count: 1, Local Node Number: 1, Rebuild Master: 1
    kernel: OKSK-00025: Cluster membership node count: 1, Local Node Number: 1, Rebuild Master: 1
    kernel: [131073.744333] ADVMK-0013: Cluster reconfiguration started.
    kernel: ADVMK-0013: Cluster reconfiguration started.
    [...]
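
The sketch below is one way to check whether the lpfc 0748 abort timeouts come before each cluster membership change. It pulls both message types out of a syslog file such as /var/log/messages. It makes three assumptions. First, the message wording matches the excerpt above, and other lpfc or Oracle module versions may word it differently. Second, a line that repeats the previous match once its bracketed kernel timestamp is removed is the same message logged twice. Third, the function name scan and the script name scan_lpfc.py are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional

# lpfc "0748 abort handler timed out" message, modelled on the excerpt above
# (assumption: other driver versions may word this differently).
ABORT_RE = re.compile(
    r"lpfc (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]):"
    r".*?0748 abort handler timed out"
    r".*?\(xri:(?P<xri>x[0-9a-f]+)\)"
    r".*?ret (?P<ret>0x[0-9a-f]+), ID (?P<id>\d+), LUN (?P<lun>\d+)"
)
# Oracle cluster membership change message, as shown in the excerpt above.
MEMBERSHIP_RE = re.compile(
    r"OKSK-00008: Cluster Membership Change starting - Incarnation (?P<inc>\d+)"
)
# Optional "[seconds.microseconds] " kernel timestamp prefix.
KTIME_RE = re.compile(r"\[\s*\d+\.\d+\]\s*")


def scan(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return lpfc abort timeouts and membership changes in log order (hypothetical helper)."""
    events: List[Dict[str, str]] = []
    last_matched: Optional[str] = None  # normalized text of the previous matching line
    for raw in lines:
        line = KTIME_RE.sub("", raw.strip(), count=1)
        if line == last_matched:
            # Same message logged twice (with and without the kernel timestamp).
            continue
        m = ABORT_RE.search(line)
        if m:
            events.append({"type": "lpfc_abort_timeout", **m.groupdict()})
            last_matched = line
            continue
        m = MEMBERSHIP_RE.search(line)
        if m:
            events.append({"type": "membership_change", "incarnation": m.group("inc")})
            last_matched = line
    return events


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python scan_lpfc.py /var/log/messages
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for event in scan(fh):
                print(event)
```

If the output shows abort timeouts on the same PCI functions just before each membership change, that points to the storage path rather than the cluster interconnect. This is only a heuristic.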
    

Environment

  • Red Hat Enterprise Linux 7.6
  • kernel-3.10.0-957.el7
  • Emulex FC/FCoE HBAs using the SLI-4 interface
  • Oracle RAC
