After changing the VLAN ID, all the interfaces including bond0 became unreachable and we were not able to ping and ssh bond0 IP from outside.

Solution Verified - Updated -

Environment

RHEL 6.9
kernel 2.6.32-696.13.2.el6.x86_64

Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01), fw 4.2.401.6

Issue

After changing the VLAN ID from 704 to 905 (bond0.704 --> bond0.905), all interfaces including bond0 became unreachable, and the bond0 IP could no longer be reached by ping or ssh from outside.

The following error also appeared on the console at the time of the issue:

kernel: be2net 0000:04:00.1: Error detected in the adapter

Resolution

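Have the hardware vendor check the health of the NIC.

Upgrade the kernel to the latest available version (kernel-2.6.32-754.el6 or later contains the be2net fix referenced under Root Cause).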

Root Cause

The root cause was not confirmed by the customer, but two possible causes were identified: a hardware or firmware fault on the NIC, and a kernel bug in the be2net UE detection logic. The corresponding action points were:

1) Check the hardware health status and whether a firmware upgrade is available; if needed, replace the card and/or update the firmware.
Check whether the issue is still present after this action point.

2) If point 1 does not resolve the issue, upgrade the system to a kernel version containing the fix described below.
Check whether the issue is still present after this action point.

The customer reported back that the issue was gone and the case was closed, but did not confirm which of these steps resolved it.

The following errors were repeating in the log:

kernel: be2net 0000:04:00.0: Error detected in the adapter
kernel: be2net 0000:04:00.0: UE: ERX bit set
kernel: be2net 0000:04:00.1: Error detected in the adapter
kernel: be2net 0000:04:00.1: UE: ERX bit set

Because both interfaces are on the same physical NIC, a hardware issue would explain why every connection to the machine failed in this situation.
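
The same check can be made mechanically against a saved log. The following is a minimal Python sketch, not a Red Hat tool: it groups be2net messages by PCI device (domain:bus:slot) so that errors on functions .0 and .1 of one card show up together. The function name `group_ue_errors` and the `sample_log` list are illustrative assumptions; the log lines themselves are the ones quoted above.

```python
import re
from collections import defaultdict

# Matches be2net kernel messages such as:
#   kernel: be2net 0000:04:00.1: UE: ERX bit set
# A PCI address has the form domain:bus:slot.function.
LINE_RE = re.compile(
    r"be2net (?P<device>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2})"
    r"\.(?P<function>[0-7]): (?P<message>.*)"
)


def group_ue_errors(lines):
    """Return {pci_device: {function: [messages]}} for be2net log lines."""
    grouped = defaultdict(lambda: defaultdict(list))
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            message = match.group("message").strip()
            grouped[match.group("device")][match.group("function")].append(message)
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {device: dict(functions) for device, functions in grouped.items()}


# Hypothetical input: the four lines quoted in this article.
sample_log = [
    "kernel: be2net 0000:04:00.0: Error detected in the adapter",
    "kernel: be2net 0000:04:00.0: UE: ERX bit set",
    "kernel: be2net 0000:04:00.1: Error detected in the adapter",
    "kernel: be2net 0000:04:00.1: UE: ERX bit set",
]

result = group_ue_errors(sample_log)
for device, functions in result.items():
    print(device, "functions reporting errors:", sorted(functions))
```

If a single device entry lists more than one function, all ports of that card are affected at once, which is consistent with a fault at the adapter level rather than on one port.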

Aside from that, there was a bug in the UE detection logic of this kernel version, which is fixed in the RHEL 6.10 kernel (https://bugzilla.redhat.com/show_bug.cgi?id=1437991):

[netdrv] be2net: Fix UE detection logic for BE3

O-Subject: [RHEL6.10 PATCH] be2net: Fix UE detection logic for BE3
Bugzilla: 1437991

BZ#1437991 - Emulex be2net inbox driver delays the failover in bond after getting unrecoverable error in adapter.

Description:
The patch solves a problem with long failover time when UE
(unrecoverable error) occurs in firmware. For certain chipsets (BE3) the
UEs are not handled properly, the carrier remains on and failover is not
done.

That bug describes a different symptom (delayed bond failover), but the same defect could also have contributed to the situation described here. The fix for Bugzilla 1437991 (https://bugzilla.redhat.com/show_bug.cgi?id=1437991) was released in kernel-2.6.32-754.el6.
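
To judge whether a given system already carries this fix, the kernel release can be compared against 754. The following Python sketch is an illustration only: the helper names `el6_release_tuple` and `has_ue_fix` are hypothetical, and the comparison is a simplified numeric check for 2.6.32 el6 kernel strings, not a full RPM version comparison.

```python
import re

# Release part of kernel-2.6.32-754.el6, the kernel named above as containing the fix.
FIXED_RELEASE = (754,)


def el6_release_tuple(kernel):
    """Extract the numeric release of a 2.6.32 el6 kernel string as a tuple of ints."""
    match = re.search(r"2\.6\.32-(\d+(?:\.\d+)*)\.el6", kernel)
    if match is None:
        raise ValueError(f"not a 2.6.32 el6 kernel string: {kernel!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_ue_fix(kernel):
    """Return True if the kernel release is at or above the fixed release."""
    return el6_release_tuple(kernel) >= FIXED_RELEASE


# The kernel listed under Environment, and the fixed kernel.
print(has_ue_fix("2.6.32-696.13.2.el6.x86_64"))
print(has_ue_fix("kernel-2.6.32-754.el6"))
```

On a running system the input string could be taken from `uname -r` or from Python's `platform.release()`. The kernel listed under Environment (696.13.2) predates the fixed release.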

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
