Elastic Network Adapter

With the release of RHEL 7.4, Red Hat updated the kernel-drivers to include support for AWS's Elastic Network Adapter. Red Hat also updated their published AMIs to support this capability.

Previously, when SRIOV-enabling a Red Hat AMI so that the ixgbevf drivers could support up to 10Gbps of throughput, one could still query the NIC (e.g., ethtool eth0) for supported speeds. When ENA is enabled, I can see that the ena-driver is in use:

# ethtool -i eth0
driver: ena
version: 1.0.2
firmware-version:
expansion-rom-version:
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

However, when trying to query for speed, I just get an error:

# ethtool eth0
Settings for eth0:
Cannot get device settings: Operation not permitted
Cannot get wake-on-lan settings: Operation not permitted
        Current message level: 0x000004e3 (1251)
                               drv probe ifup rx_err tx_err tx_done
        Link detected: yes

Other than, "eth0 is bound to ena so you know the speeds are available", is there any way to probe for the actual NIC-speed?

Responses

I found the source code of the ena driver here and did some light reading.

The code that provides the speed information is in ena_ethtool.c. It basically calls the OS-independent part of the ena driver, which just sends an ENA_ADMIN_GET_FEATURE opcode with parameter ENA_ADMIN_LINK_CONFIG to whatever part of the networking architecture runs on the hypervisor side, and then gets back a response full of juicy details about the NIC link configuration.

Unfortunately, the speed reporting was added in ena driver version 1.1.2, so your version 1.0.2 will not have it yet. So, unless you're willing to roll your own backport of the ena driver, you'll need to wait for driver version 1.1.2 or later to hit RHEL.

According to the commit notes, 1.1.2 was the next release after 1.0.2, so it seems likely that as soon as the RHEL driver gets updated for any reason, the speed reporting will be included in that update.
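In the meantime, here is a rough sketch of how one might script the "is my driver new enough?" check. The helper names (version_ge, ena_reports_speed) are made up for illustration, the 1.1.2 threshold is simply the version mentioned above, and it assumes GNU sort with -V support (as shipped in RHEL 7 coreutils) and that ethtool -i prints a "version:" line as in the question.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 >= version $2.
# Assumes GNU sort's -V (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: succeeds when the driver bound to interface $1
# reports a version of at least 1.1.2 (the threshold is an assumption
# taken from the discussion above, not from any official statement).
ena_reports_speed() {
    iface="$1"
    # Only the line starting with "version:" is matched, so
    # "firmware-version:" and "expansion-rom-version:" are ignored.
    ver="$(ethtool -i "$iface" 2>/dev/null | awk '/^version:/ {print $2}')"
    [ -n "$ver" ] && version_ge "$ver" "1.1.2"
}

# Example use (commented out so that sourcing this file has no side effects):
# if ena_reports_speed eth0; then
#     ethtool eth0 | grep -i speed
# else
#     echo "driver too old to report link speed"
# fi
```

With the 1.0.2 driver shown in the question, ena_reports_speed eth0 would return non-zero.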

Looking to avoid having to go the KMS route, so I'll have to wait for Red Hat to update the native driver. :-\

You may want to submit an RFE through ELRepo's bug tracker. There is no guarantee that a newer version builds without an issue, but it's worth a try.

ena will update to v1.2.0 in RHEL 7.5. The work is already done on Private Bug 1478896.

Version 1.4.0 of the kmod-ena package has been released to the elrepo-testing repository and is now syncing to mirrors. It will eventually appear here:

http://elrepo.org/linux/testing/el7/x86_64/RPMS/kmod-ena-1.4.0-1.el7.elrepo.x86_64.rpm

Please test and see if it works.

Just for grins (our production systems don't use RPMs from anything other than vendor or internally-originated repositories), I spun up a C5 instance-type and installed the RPM. I confirmed that the RPM-packaged components were present and matched what the README indicates should be in place, then rebooted.

After the instance tried to come back from the reboot, it got "stuck" because it could no longer query the AWS metadata URL. Usually when I've seen this, it's because the interface has gotten broken - either due to straight-up driver breakage or the primary interface's index-number changing.

Note: used the Red Hat AMI ami-c998b6b2 in the us-east-1 region to test against.

On the system where the rpm package has been installed, could you show us the output of the following command?

ls -l $(find /lib/modules -name ena.ko)

Also, does the lsmod command indicate that the ena module has been loaded? If not, can you try running a "modprobe ena" command?
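If it helps, the checks above could be rolled into something like the following. This is only a sketch: module_loaded is a hypothetical helper name, and it assumes the usual /proc/modules layout where the module name is the first field on each line (the same data lsmod reads).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if module $1 appears as the first field
# of a /proc/modules-style listing. The listing path defaults to
# /proc/modules; the optional $2 exists only to make testing easier.
module_loaded() {
    awk -v m="$1" '$1 == m {found = 1} END {exit !found}' "${2:-/proc/modules}"
}

# Example use (commented out so that sourcing this file has no side effects):
# ls -l $(find /lib/modules -name ena.ko)
# module_loaded ena || modprobe ena
```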

The inherent problem is that installing the RPM and rebooting resulted in a broken EC2 instance (which I nuked). A broken instance isn't reachable, so getting at its files requires a recovery-mount to a parallel instance.

I'll try to set aside time to re-create the issue and do the recovery-steps to get access to the files, but that's likely to be a bit.

Update: Haven't had a chance to do a recovery - just did a basic UserData launch with the RPM installed and the repo enabled followed by a reboot. The EC2 boot logs (available via the EC2 web-console and CLI) showed the following in the logged boot output:

[    4.288140] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    4.345064] Request for unknown module key 'The ELRepo Project (http://elrepo.org): ELRepo.org Secure Boot Key: f365ad3481a7b20e3427b61b2a26635b83fe427b' err -11
[    4.345069] ena: loading out-of-tree module taints kernel.
[    4.345110] ena: module verification failed: signature and/or required key missing - tainting kernel
[    4.345187] ena: Unknown symbol ena_sysfs_terminate (err 0)
[    4.345235] ena: Unknown symbol ena_sysfs_init (err 0)

Won't have time, for another several hours, to do any recovery ops so I can actually look at the dead EC2's filesystem.