Cluster node was rebooted by itself after 'ethX: firmware hang detected' messages
Issue
-
One of the node in cluster was rebooted by itself after the following
ethX: firmware hang detectederror messages in logs:Oct 11 19:49:17 node2 kernel: eth1: firmware hang detected Oct 11 19:49:17 node2 kernel: eth2: firmware hang detected Oct 11 19:49:17 node2 kernel: eth3: firmware hang detected Oct 11 19:49:17 node2 kernel: eth0: firmware hang detected Oct 11 19:49:17 node2 kernel: bonding: bond1: link status definitely down for interface eth1, disabling it Oct 11 19:49:17 node2 kernel: bonding: bond1: now running without any active interface ! Oct 11 19:49:17 node2 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it Oct 11 19:49:17 node2 kernel: bonding: bond0: now running without any active interface ! Oct 11 19:49:17 node2 kernel: bonding: bond0: link status definitely down for interface eth3, disabling it Oct 11 19:49:17 node2 firmware_helper[30774]: Loading of /lib/firmware/phanfw.bin for netxen_nic driver failed: No such file or directory Oct 11 19:49:17 node2 firmware_helper[30777]: Loading of /lib/firmware/nx3fwct.bin for netxen_nic driver failed: No such file or directory Oct 11 19:49:20 node2 kernel: netxen_nic 0000:04:00.0: loading firmware from flash Oct 11 19:49:24 node2 openais[29572]: [TOTEM] The token was lost in the OPERATIONAL state. [...] Oct 11 19:51:21 node2 syslogd 1.4.1: restart. Oct 11 19:51:21 node2 kernel: klogd 1.4.1, log source = /proc/kmsg started. Oct 11 19:51:21 node2 kernel: : Current: sense key: Not Ready Oct 11 19:51:21 node2 kernel: Add. Sense: Logical unit not ready, manual intervention required [...]
Environment
- Red Hat Enterprise Linux Server 5 (with the High Availability and Resilient Storage Add Ons)
- Red Hat Enterprise Linux Server 6 (with the High Availability and Resilient Storage Add Ons)
- HP Hardware
Subscriber exclusive content
A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.