Why do I lose connectivity when I use an ixgbe network device on a system with 16 or more logical processors?
Release Found: Red Hat Enterprise Linux 5.3
Due to a bug in Red Hat Enterprise Linux 5.3, systems with 16 or more logical processors that use network devices requiring the ixgbe driver will have intermittent network connectivity or may experience a kernel panic. A fix has already been delivered in Red Hat Enterprise Linux 5.3 errata kernel 2.6.18-128.1.6 and will be included in all subsequent releases. We strongly recommend upgrading immediately to kernel version 2.6.18-128.1.6 or newer to resolve the problem. Note that the Xen kernel is not affected by this bug because it does not use MSI interrupt mode; only the bare-metal kernel is affected.
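To check whether a given kernel release is at or above the fixed version, the release string can be compared field by field against 2.6.18-128.1.6. The following is a minimal sketch, not an official tool: the function name kernel_has_ixgbe_fix is hypothetical, and it assumes the release string has the form printed by uname -r on Red Hat Enterprise Linux 5 (for example 2.6.18-128.el5). It only compares version numbers and does not inspect the driver itself.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given kernel release
# is 2.6.18-128.1.6 or newer, fails (exit status 1) otherwise.
kernel_has_ixgbe_fix() (
    # Drop the distribution suffix (.el5, .el5xen, ...) and turn the dash
    # into a dot so the release becomes one dotted list of numbers.
    have=$(printf '%s\n' "$1" | sed -e 's/\.el.*$//' -e 's/-/./')
    want=2.6.18.128.1.6
    while [ -n "$have" ] || [ -n "$want" ]; do
        h=${have%%.*}
        w=${want%%.*}
        # A missing field counts as 0, so 128 sorts before 128.1.6.
        [ "${h:-0}" -gt "${w:-0}" ] && exit 0
        [ "${h:-0}" -lt "${w:-0}" ] && exit 1
        # Drop the field that was just compared.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    exit 0
)
```

On a running system it could be called as kernel_has_ixgbe_fix "$(uname -r)" || echo "errata kernel not yet booted".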
Even though a fix has been delivered, the issue can still cause problems at install time because the installer uses the non-updated kernel. This article explains how to work around the problem by temporarily disabling MSI-style interrupts, which allows you to use the network connection to install the operating system, access the Red Hat Network (http://rhn.redhat.com) and install the errata kernel. Note that turning off MSI-style interrupts may degrade system performance, so we recommend it only for as long as is necessary to install the errata kernel.
There are two ways to solve this problem, depending on whether you are installing a new system or having problems with an existing one. Follow the first set of directions if you are installing a fresh system. Follow the second set of directions if you have encountered this problem on a system where Red Hat Enterprise Linux 5.3 is already installed and you do not wish to reinstall.
Fresh Install
If you have not yet installed the operating system, follow these steps.
1. Boot from the install media and add the option pci=nomsi to the boot: line.
2. Install Red Hat Enterprise Linux as you normally would.
3. When the system reboots, the pci=nomsi boot option will be automatically included on the kernel boot line. No action is needed on the GRUB screen.
4. Register the system with the Red Hat Network, either on the RHN registration screen in firstboot if you are performing a graphical install, or by using the command rhn_register in a terminal window after logging in.
5. Update the kernel using the command yum update kernel for non-Xen kernels and/or yum update kernel-xen for Xen kernels. If you are unsure which kernels are installed, use the command rpm -qa | grep kernel to find out. In this example, both the Xen and non-Xen kernels are installed on the system, so the user would need to run both yum commands to update both kernels.
[root@system1 ~]# rpm -qa | grep kernel
kernel-xen-2.6.18-128.el5
kernel-2.6.18-128.el5
6. Once the updated kernel(s) are installed, edit the /etc/grub.conf file and remove all occurrences of pci=nomsi. In the examples below, the first listing shows grub.conf before the edit and the second shows the file after making the necessary changes.
/etc/grub.conf before editing
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
# initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.18-128.1.6.el5xen)
root (hd0,0)
kernel /xen.gz-2.6.18-128.1.6.el5
module /vmlinuz-2.6.18-128.1.6.el5xen ro root=/dev/VolGroup00/LogVol00 pci=nomsi rhgb quiet
module /initrd-2.6.18-128.1.6.el5xen.img
title Red Hat Enterprise Linux Server (2.6.18-128.1.6.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-128.1.6.el5 ro root=/dev/VolGroup00/LogVol00 pci=nomsi rhgb quiet
initrd /initrd-2.6.18-128.1.6.el5.img
title Red Hat Enterprise Linux Server (2.6.18-128.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 pci=nomsi rhgb quiet
initrd /initrd-2.6.18-128.el5.img
title Red Hat Enterprise Linux Server (2.6.18-128.el5xen)
root (hd0,0)
kernel /xen.gz-2.6.18-128.el5
module /vmlinuz-2.6.18-128.el5xen ro root=/dev/VolGroup00/LogVol00 pci=nomsi rhgb quiet
module /initrd-2.6.18-128.el5xen.img
/etc/grub.conf after editing
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You have a /boot partition. This means that
# all kernel and initrd paths are relative to /boot/, eg.
# root (hd0,0)
# kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
# initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.18-128.1.6.el5xen)
root (hd0,0)
kernel /xen.gz-2.6.18-128.1.6.el5
module /vmlinuz-2.6.18-128.1.6.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet
module /initrd-2.6.18-128.1.6.el5xen.img
title Red Hat Enterprise Linux Server (2.6.18-128.1.6.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-128.1.6.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet
initrd /initrd-2.6.18-128.1.6.el5.img
title Red Hat Enterprise Linux Server (2.6.18-128.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet
initrd /initrd-2.6.18-128.el5.img
title Red Hat Enterprise Linux Server (2.6.18-128.el5xen)
root (hd0,0)
kernel /xen.gz-2.6.18-128.el5
module /vmlinuz-2.6.18-128.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet
module /initrd-2.6.18-128.el5xen.img
7. Save the edited file and reboot.
8. When yum installed the updated kernel(s), it set the newest one that matched the type of the previously booted kernel (either Xen or non-Xen) to boot by default. As long as the original -128 kernels are not manually selected on the GRUB screen at boot time, the newer kernel will boot and the network problems will no longer occur.
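Before rebooting, the result of steps six through eight can be checked from the shell. The sketch below is illustrative only: the function names grub_default_title and grub_entries_with_nomsi are hypothetical, and it assumes a GRUB legacy file with a numeric, zero-based default= line placed before the title lines, as in the listings above.

```shell
# Hypothetical helper: print the title of the entry GRUB boots by default.
# default=N is zero-based and counts title lines from the top of the file.
grub_default_title() {
    awk '
        BEGIN { want = 0 }
        /^default=/ { sub(/^default=/, ""); want = $0 + 0; next }
        /^title[ \t]/ {
            if (seen++ == want) { sub(/^title[ \t]+/, ""); print; exit }
        }
    ' "$1"
}

# Hypothetical helper: print the title of every entry whose kernel or module
# line still carries pci=nomsi. No output means the option is gone.
grub_entries_with_nomsi() {
    awk '
        /^title[ \t]/ { title = $0; sub(/^title[ \t]+/, "", title); next }
        /^[ \t]*(kernel|module)[ \t]/ && /(^|[ \t])pci=nomsi([ \t]|$)/ {
            # Report each entry once, even if several lines match.
            if (title != last) { print title; last = title }
        }
    ' "$1"
}
```

For example, grub_default_title /etc/grub.conf should name a 2.6.18-128.1.6 (or newer) entry, and grub_entries_with_nomsi /etc/grub.conf should print nothing once the edit is complete.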
Fresh Install - PXE Environment
For environments with many servers where PXE and kickstart installations are employed, a much quicker and easier method of installing the errata kernel is available:
1. Add pci=nomsi to the append line of the PXE boot menu option that you use for your installs.
2. Create a yum repository that contains the necessary kernel file(s) by following these steps:
...1. Copy the errata kernel RPM file(s) to a unique directory on a server that is accessible via httpd (/var/www/html/kernels for example).
...2. Install the createrepo package on the server if it is not already installed.
...3. cd into the directory where you placed your kernels and run the command createrepo -p . to generate the yum repository files. Do not forget to include the period at the end of the command.
3. Edit the kickstart file that is invoked by the previously edited PXE menu item and add a repo line that points to the yum repository you just created. The entry should come before the %packages section and look similar to this example:
repo --name=erratakernels --baseurl=http://myserver.mydomain.com/location_of_errata_kernel_repo
The --name you give the repo does not matter, but it should be unique throughout your kickstart file.
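Steps two and three above can be collected into a small script on the web server. This is a sketch under stated assumptions, not a supported tool: the function name build_errata_repo and its three arguments are hypothetical, the RPM files are assumed to be named kernel-*.rpm, and createrepo must already be installed. It prints the kickstart repo line rather than editing the kickstart file.

```shell
# Hypothetical helper: build_errata_repo RPM_SOURCE_DIR REPO_DIR BASE_URL
build_errata_repo() (
    src=$1
    repo=$2
    url=$3
    mkdir -p "$repo" || exit 1
    # Step 2.1: copy the errata kernel RPMs into the web-served directory.
    cp "$src"/kernel-*.rpm "$repo"/ || exit 1
    # Step 2.3: generate the repository metadata in place. Progress output
    # goes to stderr so that stdout carries only the kickstart line.
    ( cd "$repo" && createrepo -p . >&2 ) || exit 1
    # Step 3: print the line to add before %packages in the kickstart file.
    printf 'repo --name=erratakernels --baseurl=%s\n' "$url"
)
```

A call might look like build_errata_repo /root/errata /var/www/html/kernels http://myserver.mydomain.com/kernels, where the source directory and URL are placeholders for your own values.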
Now when you install systems, the custom repo will be used automatically and the updated kernels will be installed instead of the 2.6.18-128 kernels. Note that this method of installation assumes the use of an --append option on the bootloader line in your kickstart file. Using the --append option overwrites any boot parameters passed to the system at install time, including pci=nomsi. If you do not wish to use an --append option, you will need to remove the pci=nomsi options from grub.conf as described above, starting with step six. To automate the removal, add the following lines to the %post section of your kickstart file:
# Strip every pci=nomsi option (and the space before it), keeping a backup.
sed 's/ pci=nomsi//g' /boot/grub/grub.conf > /tmp/grub.new
mv /boot/grub/grub.conf /boot/grub/grub.old
mv /tmp/grub.new /boot/grub/grub.conf
You can use a different tool to remove the option if you like; sed is just a simple tool that can do the job easily.
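As one possible variation on the sed lines above, the sketch below matches pci=nomsi only as a whole option and rewrites the file in place, so the original file keeps its permissions and any symbolic link pointing at it stays valid. The function name remove_nomsi is hypothetical, and the .old backup suffix is simply a choice made for this example.

```shell
# Hypothetical helper: remove_nomsi GRUB_CONF
remove_nomsi() (
    conf=$1
    # Keep a copy of the original, preserving mode and timestamps.
    cp -p "$conf" "$conf.old" || exit 1
    # First expression: option followed by another option.
    # Second expression: option at the end of the line.
    sed -e 's/[[:space:]]\{1,\}pci=nomsi\([[:space:]]\)/\1/g' \
        -e 's/[[:space:]]\{1,\}pci=nomsi$//' \
        "$conf.old" > "$conf"
)
```

In a kickstart %post section it could be called as remove_nomsi /boot/grub/grub.conf.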
Updating Kernel Post-Install
If you have previously installed the operating system without the pci=nomsi option, cannot access the network, and do not wish to reinstall the operating system, follow these steps to update the kernel.
1. Reboot the system and halt the boot process when the GRUB screen appears by pressing any arrow key.
2. Use the up and down arrow keys to highlight the kernel you wish to boot and press the "e" key to enter edit mode.
3. Scroll down to the kernel line using the arrow keys and press the "e" key again. For Xen kernels, select the module line that loads vmlinuz instead, as shown in the second example below.
4. Add the option pci=nomsi to the end of the line and press "enter". The modified boot entry will look similar to this for non-Xen kernels:
title Red Hat Enterprise Linux Server (2.6.18-128.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 quiet rhgb pci=nomsi
initrd /initrd-2.6.18-128.el5.img
or this for Xen kernels:
title Red Hat Enterprise Linux Server (2.6.18-128.el5xen)
root (hd0,0)
kernel /xen.gz-2.6.18-128.el5
module /vmlinuz-2.6.18-128.el5xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet pci=nomsi
module /initrd-2.6.18-128.el5xen.img
5. Press "b" to boot using the modified GRUB entry line. Note that this change is only temporary. The newly added parameter will not reappear the next time the system is booted.
6. When the system has booted, register it with the Red Hat Network (if you have not done so already) and update your kernel(s) using the directions found in steps four and five of the Fresh Install section at the top of the article. You do not need to perform step six from the Fresh Install section, because the pci=nomsi option is not written to /etc/grub.conf in this procedure.
7. Once the updated kernel(s) are installed, reboot the system. The newer kernel of the same type as your previous default kernel (either Xen or non-Xen) will automatically be selected as the boot kernel.
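Step six relies on choosing the right yum commands from the installed kernel packages. A minimal sketch of that decision follows; the function name kernel_update_commands is hypothetical, it reads the output of rpm -qa | grep kernel on standard input, and it only recognizes the kernel and kernel-xen packages discussed in this article. It prints the commands rather than running them.

```shell
# Hypothetical helper: read package names (one per line) on stdin and print
# the yum command for each kernel flavor that is installed.
kernel_update_commands() {
    awk '
        /^kernel-xen-[0-9]/ { xen = 1 }
        /^kernel-[0-9]/     { base = 1 }
        END {
            if (base) print "yum update kernel"
            if (xen)  print "yum update kernel-xen"
        }
    '
}
```

On an affected system it could be used as rpm -qa | grep kernel | kernel_update_commands. Packages such as kernel-headers are ignored because their names do not continue with a version number directly after the flavor.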
Note: A kernel update can also be delivered to systems post-install by downloading the necessary kernel package on a machine unaffected by this bug, copying it to the affected machine via CD, memory key or another non-networked method, and then installing it with the command rpm -ivh kernelname. Once the machine is rebooted with the new kernel, the network devices will work as expected.
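After the final reboot, it may be worth confirming that the system is no longer running with the temporary workaround, since pci=nomsi can degrade performance. The sketch below reads the boot parameters of the running kernel from /proc/cmdline; the function name booted_with_nomsi is hypothetical, and the optional file argument exists only to make the check easy to try against a sample file.

```shell
# Hypothetical helper: succeeds if the kernel command line in the given file
# (default: /proc/cmdline) contains pci=nomsi as a separate option.
booted_with_nomsi() {
    grep -Eq '(^|[[:space:]])pci=nomsi([[:space:]]|$)' "${1:-/proc/cmdline}"
}
```

For example: booted_with_nomsi && echo "pci=nomsi is still active for this boot".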