Oracle cluster nodes getting rebooted frequently after 'lpfc' driver errors
Issue
- Oracle cluster nodes getting rebooted frequently after the following errors from the lpfc driver:

    kernel: lpfc 0000:41:00.2: 0:(0):0748 abort handler timed out waiting for aborting I/O (xri:x415) to complete: ret 0x2003, ID 0, LUN 34
    kernel: [1255741.429100] lpfc 0000:41:00.3: 1:(0):0748 abort handler timed out waiting for aborting I/O (xri:xb98) to complete: ret 0x2003, ID 0, LUN 34
    kernel: lpfc 0000:41:00.3: 1:(0):0748 abort handler timed out waiting for aborting I/O (xri:xb98) to complete: ret 0x2003, ID 0, LUN 34
    [...]
    kernel: [131073.606937] OKSK-00008: Cluster Membership Change starting - Incarnation 9.
    kernel: [131073.641251] OKSK-00023: Cluster membership node list:
    kernel: OKSK-00008: Cluster Membership Change starting - Incarnation 9.
    kernel: OKSK-00023: Cluster membership node list:
    kernel: OKSK-00024: Node 1 (Interconnect address: 192.168.2.2) node1
    kernel: [131073.666454] OKSK-00024: Node 1 (Interconnect address: 192.168.2.2) node1
    kernel: [131073.702545] OKSK-00025: Cluster membership node count: 1, Local Node Number: 1, Rebuild Master: 1
    kernel: OKSK-00025: Cluster membership node count: 1, Local Node Number: 1, Rebuild Master: 1
    kernel: [131073.744333] ADVMK-0013: Cluster reconfiguration started.
    kernel: ADVMK-0013: Cluster reconfiguration started.
    [...]
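To check how often the abort timeouts occur and which HBA functions and LUNs they involve, the log can be scanned for the 0748 message. The following is a minimal sketch only: the regular expression is inferred from the message layout in the excerpt above, not from lpfc driver documentation, and the function name `count_abort_timeouts` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the "0748 abort handler timed out" lines in the
# excerpt above (assumption: other kernel versions may format it differently).
LPFC_0748 = re.compile(
    r"lpfc (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"\d+:\(\d+\):0748 abort handler timed out .*?"
    r"\(xri:(?P<xri>x[0-9a-f]+)\).*?"
    r"ID (?P<target>\d+), LUN (?P<lun>\d+)"
)


def count_abort_timeouts(lines):
    """Count 0748 abort timeouts per (PCI address, LUN) pair.

    `lines` is any iterable of log lines; lines that do not match
    the pattern are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LPFC_0748.search(line)
        if match:
            counts[(match.group("pci"), int(match.group("lun")))] += 1
    return counts
```

The function accepts any iterable of strings, so an open log file object can be passed to it directly. Lines logged both with and without a timestamp, as in the excerpt, are counted separately.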
Environment
- Red Hat Enterprise Linux 7.6
- kernel-3.10.0-957.el7
- Emulex FC/FCoE HBAs using SLI-4 interface
- Oracle RAC