5.13. kernel

The Kernel
  • NUMA class systems should not be booted with a single memory node configuration. Configuration of single node NUMA systems will result in contention for the memory resources on all of the non-local memory nodes. As only one node will have local memory the CPUs on that single node will starve the remaining CPUs for memory allocations, locks, and any kernel data structure access. This contention will lead to the "CPU#n stuck for 10s!" error messages. This configuration can also result in NMI watchdog timeout panics if a spinlock is acquired via spinlock_irq() and held for more than 60 seconds. The system can also hang for indeterminate lengths of time
    To minimize this problem NUMA class systems need to have their memory evenly distributed between nodes. NUMA information can be obtained from dmesg output as well as from the numastat command.
  • When upgrading from Red Hat Enterprise Linux 5.0, 5.1 or 5.2 to more recent releases, the gfs2-kmod may still be installed on the system. This package must be manually removed or it will override the (newer) version of GFS2 which is built into the kernel. Do not install the gfs2-kmod package on later versions of Red Hat Enterprise Linux. gfs2-kmod is not required since GFS2 is built into the kernel from 5.3 onwards. The content of the gfs2-kmod package is considered a Technology Preview of GFS2, and has not received any updates since Red Hat Enterprise Linux 5.3 was released.
    Note that this note only applies to GFS2 and not to GFS, for which the gfs-kmod package continues to be the only method of obtaining the required kernel module.
  • Issues might be encountered on a system with 8Gb/s LPe1200x HBAs and firmware version 2.00a3 when the Red Hat Enterprise Linux 5.6 kernel is used with the in-box LPFC driver. Such issues include loss of LUNs and/or fiber channel host hangs during fabric faults with multipathing.
    To work around these issues, it is recommended to either:
    • Downgrade the firmware revision of the 8Gb/s LPe1200x HBA to revision 1.11a5, or
    • Modify the LPFC driver’s lpfc_enable_npiv module parameter to zero.
      When loading the LPFC driver from the initrd image (i.e. at system boot time), add the line
      options lpfc_enable_npiv=0
      
      to /etc/modprobe.conf and re-build the initrd image.
      When loading the LPFC driver dynamically, include the lpfc_enable_npiv=0 option in the insmod or modprobe command line.
    For additional information on how to set the LPFC driver module parameters, refer to the Emulex Drivers for Linux User Manual.
  • If AMD IOMMU is enabled in BIOS on ProLiant DL165 G7 systems, the system will reboot automatically when IOMMU attempts to initalize. To work around this issue, either disable IOMMU, or update the BIOS to version 2010.09.06 or later.
  • As of Red Hat Enterprise Linux 5.6 the ext4 file system is fully supported. However, provisioning ext4 file systems with the anaconda installer is not supported, and ext4 file systems need to be provisioned manually after the installation.
  • In some cases the NFS server fails to notify NFSv4 clients about renames and unlinks done by other clients, or by non-NFS users of the server. An application on a client may then be able to open the file at its old pathname (and read old cached data from it, and perform read locks on it), long after the file no longer exists at that pathname on the server.
    To work around this issue, use NFSv3 instead of NFSv4. Alternatively, turn off support for leases by writing 0 to /proc/sys/fs/leases-enable (ideally on boot, before the nfs server is started). This change prevents NFSv4 delegations from being given out, restore correctness at the expense of some performance.
  • Some laptops may generate continuous events in response to the lid being shut. Consequently, the gnome-power-manager utility will consume CPU resources as it responds to each event.
  • A kernel panic may be triggered by the lpfc driver when multiple Emulex OneConnect Universal Converged Network Adapter initiators are included in the same Storage Area Network (SAN) zone. Typically, this kernel panic will present after a cable is pulled or one of the systems is rebooted. To work around this issue, configure the SAN to use single initiator zoning. (BZ#574858)
  • If a Huawei USB modem is unplugged from a system, the device may not be detected when it is attached again. To work around this issue, the usbserial and usb-storage driver modules need to be reloaded, allowing the system to detect the device. Alternatively, the if the system is rebooted, the modem will be detected also. (BZ#517454)
  • Memory on-line is not currently supported with the Boxboro-EX platform. (BZ#515299)
  • Unloading a PF (SR-IOV Physical function) driver from a host when a guest is using a VF (virtual function) from that device can cause a host crash. A PF driver for an SR-IOV device should not be unloaded until after all guest virtual machines with assigned VFs from that SR-IOV device have terminated. (BZ#514360)
  • Data corruption on NFS filesystems might be encountered on network adapters without support for error-correcting code (ECC) memory that also have TCP segmentation offloading (TSO) enabled in the driver. Note: data that might be corrupted by the sender still passes the checksum performed by the IP stack of the receiving machine A possible work around to this issue is to disable TSO on network adapters that do not support ECC memory. BZ#504811
  • After installation, a System z machine with a large number of memory and CPUs (e.g. 16 CPU's and 200GB of memory) might may fail to IPL. To work around this issue, change the line
    ramdisk=/boot/initrd-2.6.18-<kernel-version-number>.el5.img
    
    to
    ramdisk=/boot/initrd-2.6.18-<kernel-version-number>.el5.img,0x02000000
    
    The command zipl -V should now show 0x02000000 as the starting address for the inital RAM disk (initrd). Stop the logigal partiton (LPAR), and then manually increase the the storage size of the LPAR.
  • On certain hardware configurations the kernel may panic when the Broadcom iSCSI offload driver (bnx2i.ko and cnic.ko) is loaded. To work around this do not manually load the bnx2i or cnic modules, and temporarily disable the iscsi service from starting. To disable the iscsi service, run
    chkconfig --del iscsi
    chkconfig --del iscsid
    
    On the first boot of your system, the iscsi service may start automatically. To bypass this, during bootup, enter interactive start up and stop the iscsi service from starting.
  • In Red Hat Enterprise Linux 5, invoking the kernel system call "setpriority()" with a "which" parameter of type "PRIO_PROCESS" does not set the priority of child threads. (BZ#472251)
  • Physical CPUs cannot be safely placed offline or online when the 'kvm_intel' or 'kvm_amd' module is loaded. This precludes physical CPU offline and online operations when KVM guests that utilize processor virtualization support are running. It also precludes physical CPU offline and online operations without KVM guests running when the 'kvm_intel' or 'kvm_amd' module is simply loaded and not being used.
    If the kmod-kvm package is installed, the 'kvm_intel' or 'kvm_amd' module automatically loads during boot on some systems. If a physical CPU is placed offline while the 'kvm_intel' or 'kvm_amd' module is loaded a subsequent attempt to online that CPU may fail with an I/O error.
    To work around this issue, unload the 'kvm_intel' or 'kvm_amd' before performing physical CPU hot-plug operations. It may be necessary to shut down KVM guests before the 'kvm_intel' or 'kvm_amd' will successfully unload.
    For example, to offline a physical CPU 6 on an Intel based system:
    # rmmod kvm_intel
    # echo 0 > /sys/devices/system/cpu/cpu6/online
    # modprobe kvm_intel
    
  • A change to the cciss driver in Red Hat Enterprise Linux 5.4 made it incompatible with the "echo disk > /sys/power/state" suspend-to-disk operation. Consequently, the system will not suspend properly, returning messages such as:
    Stopping tasks:
    ======================================================================
     stopping tasks timed out after 20 seconds (1 tasks remaining):
      cciss_scan00
    Restarting tasks...<6> Strange, cciss_scan00 not stopped
     done
    
    (BZ#513472)
  • The kernel is unable to properly detect whether there is media present in a CD-ROM drive during kickstart installs. The function to check the presence of media incorrectly interprets the "logical unit is becoming ready" sense, returning that the drive is ready when it is not. To work around this issue, wait several seconds between inserting a CD and asking the installer (anaconda) to refresh the CD. (BZ#510632)
  • Applications attempting to malloc memory approximately larger than the size of the physical memory on the node on a NUMA system may hang or appear to stall. This issue may occur on a NUMA system where the remote memory distance, as defined in SLIT, is greater than 20 and RAM based filesystem like tmpfs or ramfs is mounted.
    To work around this issue, unmount all RAM based filesystems (i.e. tmpfs or ramfs). If unmounting the RAM based filesystems is not possible, modify the application to allocate lesser memory. Finally, if modifying the application is not possible, disable NUMA memory reclaim by running:
    sysctl vm.zone_reclaim_mode=0
    

    Important

    Turning NUMA reclaim negatively effects the overall throughput of the system.
  • Configuring IRQ SMP affinity has no effect on some devices that use message signalled interrupts (MSI) with no MSI per-vector masking capability. Examples of such devices include Broadcom NetXtreme Ethernet devices that use the bnx2 driver.
    If you need to configure IRQ affinity for such a device, disable MSI by creating a file in /etc/modprobe.d/ containing the following line:
    options bnx2 disable_msi=1
    
    Alternatively, you can disable MSI completely using the kernel boot parameter pci=nomsi. (BZ#432451)
  • The smartctl tool cannot properly read SMART parameters from SATA devices. (BZ#429606)
  • IBM T60 laptops will power off completely when suspended and plugged into a docking station. To avoid this, boot the system with the argument acpi_sleep=s3_bios. (BZ#439006)
  • The QLogic iSCSI Expansion Card for the IBM Bladecenter provides both ethernet and iSCSI functions. Some parts on the card are shared by both functions. However, the current qla3xxx and qla4xxx drivers support ethernet and iSCSI functions individually. Both drivers do not support the use of ethernet and iSCSI functions simultaneously.
    Because of this limitation, successive resets (via consecutive ifdown/ifup commands) may hang the device. To avoid this, allow a 10-second interval after an ifup before issuing an ifdown. Also, allow the same 10-second interval after an ifdown before issuing an ifup. This interval allows ample time to stabilize and re-initialize all functions when an ifup is issued. (BZ#276891)
  • Laptops equipped with the Cisco Aironet MPI-350 wireless may hang trying to get a DHCP address during any network-based installation using the wired ethernet port.
    To work around this, use local media for your installation. Alternatively, you can disable the wireless card in the laptop BIOS prior to installation (you can re-enable the wireless card after completing the installation). (BZ#213262)
  • Hardware testing for the Mellanox MT25204 has revealed that an internal error occurs under certain high-load conditions. When the ib_mthca driver reports a catastrophic error on this hardware, it is usually related to an insufficient completion queue depth relative to the number of outstanding work requests generated by the user application.
    Although the driver will reset the hardware and recover from such an event, all existing connections at the time of the error will be lost. This generally results in a segmentation fault in the user application. Further, if opensm is running at the time the error occurs, then you need to manually restart it in order to resume proper operation. (BZ#251934)
  • The IBM T41 laptop model does not enter Suspend Mode properly; as such, Suspend Mode will still consume battery life as normal. This is because Red Hat Enterprise Linux 5 does not yet include the radeonfb module.
    To work around this, add a script named hal-system-power-suspend to /usr/share/hal/scripts/ containing the following lines:
    	
    chvt 1
    radeontool light off
    radeontool dac off
    
    This script will ensure that the IBM T41 laptop enters Suspend Mode properly. To ensure that the system resumes normal operations properly, add the script restore-after-standby to the same directory as well, containing the following lines:
    	
    radeontool dac on
    radeontool light on
    chvt 7
    
  • If the edac module is loaded, BIOS memory reporting will not work. This is because the edac module clears the register that the BIOS uses for reporting memory errors.
    The current Red Hat Enterprise Linux Driver Update Model instructs the kernel to load all available modules (including the edac module) by default. If you wish to ensure BIOS memory reporting on your system, you need to manually blacklist the edac modules. To do so, add the following lines to /etc/modprobe.conf:
    	
    blacklist edac_mc
    blacklist i5000_edac
    blacklist i3000_edac
    blacklist e752x_edac
    
  • Due to outstanding driver issues with hardware encryption acceleration, users of Intel WiFi Link 4965, 5100, 5150, 5300, and 5350 wireless cards are advised to disable hardware accelerated encryption using module parameters. Failure to do so may result in the inability to connect to Wired Equivalent Privacy (WEP) protected wireless networks after connecting to WiFi Protected Access (WPA) protected wireless networks.
    To do so, add the following options to /etc/modprobe.conf:
    alias wlan0 iwlagn
    options iwlagn swcrypto50=1 swcrypto=1
    
    (where wlan0 is the default interface name of the first Intel WiFi Link device)
The following note applies to PowerPC Architectures:
  • The size of the PPC kernel image is too large for OpenFirmware to support. Consequently, network booting will fail, resulting in the following error message:
    Please wait, loading kernel...
    /pci@8000000f8000000/ide@4,1/disk@0:2,vmlinux-anaconda: No such file or directory
    boot:
    
    To work around this:
    1. Boot to the OpenFirmware prompt, by pressing the '8' key when the IBM splash screen is displayed.
    2. Run the following command:
      setenv real-base 2000000
      
    3. Boot into System Managment Services (SMS) with the command:
      0> dev /packages/gui obe