Why do I lose connectivity when I use an ixgbe network device on a system with 16 or more logical processors?

Release Found: Red Hat Enterprise Linux 5.3

Due to a bug in Red Hat Enterprise Linux 5.3, systems with 16 or more logical processors that use network devices requiring the ixgbe driver will have intermittent network connectivity or may experience a kernel panic. A solution for this issue has already been delivered in Red Hat Enterprise Linux 5.3 errata kernel 2.6.18-128.1.6 and will be included in all subsequent releases. We strongly recommend immediately upgrading to kernel version 2.6.18-128.1.6 or newer to resolve the problem. Note that the Xen kernel is not affected by this bug because it does not use MSI interrupt mode; only the bare-metal kernel is affected.
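
To decide whether a particular machine matches the conditions described above (16 or more logical processors, the ixgbe driver loaded, and a kernel older than 2.6.18-128.1.6), the checks can be scripted. The following is a minimal sketch rather than an official Red Hat tool. The function names are hypothetical, and it assumes the kernel release string has the usual form, such as 2.6.18-128.1.6.el5. It does not distinguish Xen from bare-metal kernels.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the given kernel release is
# at or above the errata level 2.6.18-128.1.6.
kernel_at_least_errata() {
    # Strip the distribution suffix (.el5, .el5xen, ...) before comparing.
    ver=${1%%.el*}
    echo "$ver" | awk -F'[.-]' '{
        split("2 6 18 128 1 6", want, " ")
        for (i = 1; i <= 6; i++) {
            have = ($i == "") ? 0 : $i + 0   # missing fields count as 0
            if (have > want[i]) exit 0
            if (have < want[i]) exit 1
        }
        exit 0
    }'
}

# Hypothetical helper: reports whether the running system matches all
# three conditions described in this article.
check_this_system() {
    cpus=$(grep -c '^processor' /proc/cpuinfo)
    if [ "$cpus" -ge 16 ] && lsmod | grep -q '^ixgbe ' \
        && ! kernel_at_least_errata "$(uname -r)"; then
        echo "affected: $cpus logical processors, ixgbe loaded, kernel $(uname -r)"
    else
        echo "not affected by the conditions checked here"
    fi
}
```

After sourcing the file, run check_this_system as root on the machine in question. kernel_at_least_errata can also be called on its own with any release string, for example one taken from rpm -qa output.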

Even though a solution has been delivered, the issue can still cause problems at install time because the installer uses the non-updated kernel. This article explains how to work around the problem by temporarily disabling MSI-style interrupts, which allows you to use the network connection to install the operating system, access the Red Hat Network (http://rhn.redhat.com) and install the errata kernel. Turning off MSI-style interrupts may degrade system performance, so we recommend it only for as long as is necessary to install the errata kernel.

There are two ways to solve this problem, depending on whether you are installing a new system or having problems with an existing one. Follow the first set of directions if you are installing a fresh system. Follow the second set of directions if you have encountered this problem on a system where Red Hat Enterprise Linux 5.3 is already installed and you do not wish to reinstall.

Fresh Install

If you have not yet installed the operating system, follow these steps.

1. Boot from the install media and add the option pci=nomsi to the boot: line (for example, linux pci=nomsi).

2. Install Red Hat Enterprise Linux as you normally would.

3. When the system reboots, the pci=nomsi boot option will be automatically included on the kernel boot line. No action is needed on the GRUB screen.

4. Register the system with the Red Hat Network, either on the RHN registration screen in firstboot if you are performing a graphical install, or by using the command rhn_register in a terminal window after logging in.

5. Update the kernel using yum update kernel for non-Xen kernels and/or yum update kernel-xen for Xen kernels. If you are unsure which kernels are installed, use the command rpm -qa | grep kernel to find out. In this example, both the Xen and non-Xen kernels are installed, so the user would need to run both yum commands to update both kernels.

[root@system1 ~]# rpm -qa | grep kernel
kernel-xen-2.6.18-128.el5
kernel-2.6.18-128.el5

6. Once the updated kernel(s) are installed, edit the /etc/grub.conf file and remove all occurrences of pci=nomsi. In the examples below, the first listing shows grub.conf before the edit and the second shows the file after making the necessary changes.

/etc/grub.conf before editing

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.18-128.1.6.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-128.1.6.el5
        module /vmlinuz-2.6.18-128.1.6.el5xen ro root=/dev/VolGroup00/LogVol00 pci=nomsi rhgb quiet
        module /initrd-2.6.18-128.1.6.el5xen.img
title Red Hat Enterprise Linux Server (2.6.18-128.1.6.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.1.6.el5 ro root=/dev/VolGroup00/LogVol00 pci=nomsi rhgb quiet
        initrd /initrd-2.6.18-128.1.6.el5.img
title Red Hat Enterprise Linux Server (2.6.18-128.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 pci=nomsi rhgb quiet
        initrd /initrd-2.6.18-128.el5.img
title Red Hat Enterprise Linux Server (2.6.18-128.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-128.el5
        module /vmlinuz-2.6.18-128.el5xen ro root=/dev/VolGroup00/LogVol00 pci=nomsi rhgb quiet
        module /initrd-2.6.18-128.el5xen.img

/etc/grub.conf after editing

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.18-128.1.6.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-128.1.6.el5
        module /vmlinuz-2.6.18-128.1.6.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet
        module /initrd-2.6.18-128.1.6.el5xen.img
title Red Hat Enterprise Linux Server (2.6.18-128.1.6.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.1.6.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet
        initrd /initrd-2.6.18-128.1.6.el5.img
title Red Hat Enterprise Linux Server (2.6.18-128.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet
        initrd /initrd-2.6.18-128.el5.img
title Red Hat Enterprise Linux Server (2.6.18-128.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-128.el5
        module /vmlinuz-2.6.18-128.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet
        module /initrd-2.6.18-128.el5xen.img

7. Save the edited file and reboot.

8. When yum installed the updated kernel(s), the newest one that matched the type of the previously booted kernel (either Xen or non-Xen) was set to boot by default. As long as the original -128 kernels are not manually selected on the GRUB screen at boot time, the newer kernel will boot and the network problems will no longer occur.
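
Before rebooting in step 7, it can help to confirm that no boot entry still carries the option. The following is a minimal sketch, assuming a GRUB configuration laid out like the listings above. The function name find_nomsi is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the kernel/module lines of a GRUB configuration
# file that still contain pci=nomsi. Comment lines are ignored.
# Defaults to /etc/grub.conf; exits non-zero when nothing is found.
find_nomsi() {
    grep -nE '^[[:space:]]*(kernel|module)[[:space:]].*pci=nomsi' "${1:-/etc/grub.conf}"
}
```

For example, find_nomsi || echo "no pci=nomsi entries remain" prints the numbered lines that still need editing, or the message when the file is clean.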

Fresh Install - PXE Environment

For environments with many servers where PXE and kickstart installations are employed, a much quicker and easier method of installing the errata kernel is available:

1. Add pci=nomsi to the append line of the PXE boot menu option that you use for your installs.

2. Create a yum repository that contains the necessary kernel file(s) by following these steps:

...1. Copy the errata kernel RPM file(s) to a unique directory on a server that is accessible via httpd (/var/www/html/kernels for example).

...2. Install the createrepo package on the server if it is not already installed.

...3. cd into the directory where you placed your kernels and run the command createrepo -p . to generate the yum repository files. Do not forget to include the period at the end of the command.

3. Edit the kickstart file that is invoked by the previously edited PXE menu item and add a repo line that points to the yum repository you just created. The entry should come before the %packages section and look similar to this example:

repo --name=erratakernels --baseurl=http://myserver.mydomain.com/location_of_errata_kernel_repo

The --name you give the repo does not matter, but it should be unique throughout your kickstart file.
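
Steps 2 and 3 above can be sketched as two small shell functions. This is only an illustration under stated assumptions: the function names are hypothetical, the RPM files are assumed to be named kernel-*.rpm, and the createrepo package is assumed to be installed on the web server.

```shell
#!/bin/sh
# Hypothetical helper for step 2: copy kernel RPMs into a web-served
# directory and generate the yum metadata there.
# $1 = directory holding the downloaded errata kernel RPMs
# $2 = repository directory (the article uses /var/www/html/kernels as an example)
build_kernel_repo() {
    mkdir -p "$2" &&
    cp "$1"/kernel-*.rpm "$2"/ &&
    ( cd "$2" && createrepo -p . )
}

# Hypothetical helper for step 3: print the kickstart repo line.
# $1 = repository name, $2 = base URL of the repository
kickstart_repo_line() {
    printf 'repo --name=%s --baseurl=%s\n' "$1" "$2"
}
```

Calling kickstart_repo_line erratakernels http://myserver.mydomain.com/location_of_errata_kernel_repo reproduces the example entry shown above, ready to paste into the kickstart file.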

Now when you install systems, the custom repo will be used automatically and the updated kernels will be installed instead of the 2.6.18-128 kernels. Note that this method of installation assumes the use of an --append option on the bootloader line in your kickstart file. Using the --append option overwrites any boot parameters passed to the system at install time, including pci=nomsi. If you do not wish to use an --append option, you will need to remove the pci=nomsi options from grub.conf as described above, starting with step six. If you wish to automate the removal procedure, you can do so by adding the following lines to the %post section of your kickstart file:

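# Strip every pci=nomsi (and the space before it) from the GRUB configuration,
# keeping the original file as grub.old.
sed 's/ *pci=nomsi//g' /boot/grub/grub.conf > /tmp/grub.new
mv /boot/grub/grub.conf /boot/grub/grub.old
mv /tmp/grub.new /boot/grub/grub.conf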

You can use a different tool to remove the option if you like; sed is simply a convenient tool that does the job easily.

Updating Kernel Post-Install

If you previously installed the operating system without the pci=nomsi option, cannot access the network, and do not wish to reinstall the operating system, follow these steps to update the kernel.

1. Reboot the system and halt the boot process when the GRUB screen appears by pressing any arrow key.

2. Use the up and down arrow keys to highlight the kernel you wish to boot, then press the "e" key to enter edit mode.

3. Scroll down to the kernel line using the arrow keys and press the "e" key again.

4. Add the option pci=nomsi to the end of the line and press "enter". The modified boot entry will look similar to this for non-Xen kernels:

title Red Hat Enterprise Linux Server (2.6.18-128.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 quiet rhgb pci=nomsi
        initrd /initrd-2.6.18-128.el5.img

or this for Xen kernels:

title Red Hat Enterprise Linux Server (2.6.18-128.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-128.el5
        module /vmlinuz-2.6.18-128.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet pci=nomsi
        module /initrd-2.6.18-128.el5xen.img

5. Press "b" to boot using the modified GRUB entry line. Note that this change is only temporary. The newly added parameter will not reappear the next time the system is booted.

6. When the system has booted, register it with the Red Hat Network (if you have not done so already) and update your kernel(s) using the directions found in steps four and five of the Fresh Install section at the top of the article. You do not need to perform step six from the Fresh Install section, as the pci=nomsi option will not be placed in /etc/grub.conf here.

7. Once the updated kernel(s) are installed, reboot the system. The newer kernel that matches the type of your previous default kernel (either Xen or non-Xen) will automatically be selected as the boot kernel.
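
Because the option added in step 4 is temporary, it can be useful to confirm which command line the running kernel actually received: after step 5 the option should be present, and after the reboot in step 7 it should be gone. The following is a minimal sketch that reads /proc/cmdline. The function name booted_with_nomsi is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed when pci=nomsi is present as a separate word
# on a kernel command line. Reads /proc/cmdline unless a file is given.
booted_with_nomsi() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'pci=nomsi'
}
```

For example, booted_with_nomsi && echo "MSI-style interrupts are disabled for this boot" prints the message only while the workaround is active.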

Note: A kernel update can also be delivered to systems post-install by downloading the necessary kernel on a machine unaffected by this bug, copying the kernel to the affected machine via CD, memory key or other non-networked method, and then installing the kernel with the command rpm -ivh kernelname, where kernelname is the name of the kernel RPM file. Once the machine is rebooted with the new kernel, the network devices will work as expected.
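
For the offline method in the note above, a dry-run helper can list the install commands before anything is changed. This is a minimal sketch, assuming the copied files are named kernel-*.rpm. The function name is hypothetical, and the directory passed to it is wherever the CD or memory key is mounted.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an rpm install command for each
# kernel RPM found in a directory, e.g. a mounted CD or memory key.
# $1 = directory containing the copied kernel RPM file(s)
print_kernel_install_cmds() {
    for f in "$1"/kernel-*.rpm; do
        if [ -e "$f" ]; then
            echo "rpm -ivh $f"
        fi
    done
}
```

Review the printed commands, then run them as root. Printing first avoids installing an unintended package if the media holds more than one kernel RPM.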
