Chapter 10. Kernel

/dev/disk/by-path/ now accounts for NPIV paths

Previously, if two or more virtual host bus adapters (HBAs) were created on a single physical HBA, only a single link to the device was created in the /dev/disk/by-path/ directory instead of one link for each path. As a consequence, creating a virsh pool with virtual HBAs by using Fibre Channel N_Port ID Virtualization (NPIV) did not work correctly. With this update, symbolic links in /dev/disk/by-path/ are created correctly and are unique. Symbolic links in /dev/disk/by-path/ created by udev for logical unit numbers (LUNs) connected through a physical Fibre Channel N_Port stay the same. (BZ#1032218)
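
One way to confirm that every path has its own link is to list the symbolic links in the directory together with the device nodes they resolve to. The following is a minimal sketch; the function name list_by_path_links is hypothetical, and no particular link naming scheme is assumed.

```python
import os


def list_by_path_links(directory="/dev/disk/by-path"):
    """Return a dict mapping each symlink name to the device node it resolves to."""
    links = {}
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        # Only symbolic links are of interest; skip anything else.
        if os.path.islink(full):
            links[name] = os.path.realpath(full)
    return links


if __name__ == "__main__":
    if os.path.isdir("/dev/disk/by-path"):
        for name, target in list_by_path_links().items():
            print(f"{name} -> {target}")
```

With NPIV in use, each virtual HBA path to a LUN is expected to appear as a separate entry in this listing.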

Removed unintended kernel warning message

A recent change in Red Hat Enterprise Linux 6.8 caused an unintended warning message to be displayed in certain situations where a file size is increased, such as by using fallocate operations:
WARNING: at mm/truncate.c:614 pagecache_isize_extended+0x10d/0x120()
This bug has been fixed, and operations which increase file size no longer cause this warning message to be displayed or logged. (BZ#1205014)

librdmacm no longer outputs warnings and errors if no RDMA hardware is present

Previously, if librdmacm was installed on a system with no RDMA hardware present, it could, in some circumstances, output superfluous warning and error messages to the standard error stream (stderr). With this update, librdmacm no longer outputs warning and error messages to stderr in such cases. (BZ#1231766)

Fixed kernel booting issues with the mlx5 driver

When the mlx5 driver was enabled on a system with non-fatal PCIe errors, the kernel previously failed to boot, crashing in the mlx5 probe routine shortly after it enabled PCIe error handling. The patch causing this bug has been removed, and the kernel now boots successfully when this driver is enabled. (BZ#1324599)

Changing snapshot read-only status no longer causes a kernel crash

Previously, the dm-snapshot target did not hand over the exception store properly when the target was reloaded. As a consequence, when the read-only status of the snapshot volume was changed with the lvchange -p r or lvchange -p rw command while I/O to the origin volume was in progress, the kernel crashed with the BUG() macro. With this update, the origin logical volume is suspended during exception store handover, so that there is no I/O in progress during the handover. As a result, changing snapshot read-only status no longer causes the aforementioned kernel crash. (BZ#1177389)

qla2xxx updated to version 8.07.00.26.06.8-k

The qla2xxx driver has been updated to version 8.07.00.26.06.8-k. This update backports initiator side upstream fixes and minor enhancements through 8.07.00.26. (BZ#1252111)

Memory leak in devpts_kill_sb() fixed

The devpts pseudo-file system allocates IDR resources during use. However, prior to this update, devpts did not free them when it was unmounted. Consequently, the resources used by the IDR system were leaked, which could cause problems with frequent starting and stopping of containers, particularly with a high number of containers used. This update applies an upstream patch which releases these resources at unmount, and the IDR resources used by the devpts file system are no longer leaked at unmount. (BZ#1283557)

Setting a sysctl parameter now executes successfully

While executing the sysctl -w vm.compact_memory=1 command to set a sysctl parameter, the system previously returned the following error message:
error: "Success" setting key "vm.compact_memory"
The provided patch fixes this bug, and the aforementioned command now executes successfully. (BZ#1278842)

netconsole no longer causes kernel crash

Resetting an ixgbe or vmxnet3 adapter while sending a message over netconsole or netpoll at the same time could previously cause a kernel crash. This update adds mutual exclusion between the core adapter reset path and netpoll transmit path, preventing kernel crashes in this situation. (BZ#1252212)

Loop checks added to VFS to prevent kernel crashes

The NFS client was previously failing to detect a directory loop for some NFS server directory structures. This failure could cause NFS inodes to remain referenced after attempting to unmount the file system, leading to a kernel crash. This update adds loop checks to VFS, which effectively prevents this problem from occurring. (BZ#1254020)

Playing audio from a USB sound card works as expected

Due to incorrect URB_ISO_ASAP semantics, playing an audio file using a USB sound card could previously fail for some hardware configurations. This update fixes the bug, and playing audio from a USB sound card now works as expected. (BZ#1255071)

Page fault and subsequent kernel oops in the HID driver fixed

Previously, when the Human Interface Device (HID) driver ran a report on an unaligned buffer, it could cause a page fault interrupt and a kernel oops when the end of the report was read. This update fixes this bug by padding the end of the report with extra bytes, so the reading of the report never crosses a page boundary. As a result, the page fault and subsequent kernel oops no longer occur. (BZ#1256568)

Fixed a deadlock when syncing a frozen file system

Due to broken s_umount lock ordering, a race condition occurred when an unlinked file was closed and the sync (or syncfs) utility was run at the same time. As a consequence, a deadlock occurred on a frozen file system between sync and a process trying to unfreeze the file system. With this update, sync (or syncfs) is skipped on frozen file systems, and deadlock no longer occurs in the aforementioned situation. (BZ#1241791)

dracut dependencies updated to prevent boot failures

The Deterministic Random Bit Generator (DRBG) module must be loaded during boot before cryptographic ciphers can be used. However, older versions of dracut did not include DRBG in the initramfs image which could use cryptographic ciphers for disk encryption. As a consequence, if disk encryption was in use on the root file system, the boot process failed. This update adds the DRBG module into the dependency list of dracut, ensuring that the module is present in the initramfs, and systems with encrypted root file systems can now boot successfully. (BZ#1241338)

Packets are now counted correctly

Due to a regression, the packet counter counted only normally processed completions (packets) and failed to count erroneous ones. As these packets were thus never acknowledged, the firmware kept returning interrupt requests (IRQs). A patch has been provided to fix this bug, and all packets are now counted as expected. (BZ#1241287)

Fixed a deadlock when removing directories

When removing a directory while a reference was held to that directory by a reference to a negative child dentry, the directory dentry was previously not killed. In addition, once the negative child dentry was killed, an unlinked and unused dentry was still present in the cache. This could cause a deadlock by forcing dentry eviction while the file system in question was frozen. With this update, all unused dentries are unhashed and evicted immediately after a successful directory removal, which avoids the deadlock, and the system no longer hangs in the aforementioned scenario. (BZ#1241030)

Mapping hugetlb areas no longer causes data corruption

Inside hugetlb, region data structures were protected by a combination of a memory map semaphore and a single hugetlb instance mutex. However, a page-fault scalability improvement backported to the kernel in a previous release removed the single mutex and introduced a new mutex table, making the locking combination insufficient and leading to possible race windows that could cause corruption and undefined behavior. The problem could be observed, for example, when software mapped or remapped hugetlb areas while concurrent threads read from or wrote to the same areas, which caused page faults. This update fixes the problem by introducing a required spinlock to the region tracking functions for proper serialization. (BZ#1260755)

multipath request queue no longer causes stalls

Previously, changes to how the multipath request queue was run caused a regression in cases where paths failed regularly under I/O load. This regression manifested as I/O stalls that exceeded 300 seconds. This update reverts the changes that aimed to reduce running the multipath request queue, and I/O now completes in a timely manner. (BZ#1240767)

inodes are now freed as intended

Previously, when opening a file by its file handle (fhandle) with its dentry not present in the dcache ('cold dcache'), and then making use of the unlink() and close() functions, the inode was not freed upon the close() system call. As a consequence, the iput() final was delayed indefinitely. A patch has been provided to fix this bug, and the inode is now freed as expected. (BZ#1236736)

The vmxnet3 driver is now compatible with the vmxnet3 adapter version 2

Due to a bug, the vmxnet3 driver demonstrated incorrect behavior such as memory leaks or 'screaming interrupts' when in use with vmxnet3 adapter version 2. Several upstream patches have been applied to fix the behavior of the vmxnet3 driver - namely, this update fixes memory leaks in the rx path, implements a handler for PCI shutdown, and makes vmxnet3 compatible with adapter version 2. (BZ#1236564)

IP fragments are discarded in time

The memory used by the defragmentation engine is accounted for per CPU. However, on systems with numerous CPUs, the per-CPU caches could deviate from reality, thus causing the defragmentation engine to discard old fragments too early. This update adds a fix to minimize this discrepancy, and old IP fragments are now discarded at the correct time. (BZ#1235465)

GFS2 now references correct value

The GFS2 file system previously had a rare timing window that sometimes caused it to reference an uninitialized variable. Consequently, a kernel panic occurred. The code has been changed to reference the correct value during this timing window, and the kernel no longer panics. (BZ#1267995)

Software using IPC SysV semaphores works with kernel correctly

When a process or thread exits, the Linux kernel undoes any SysV semaphore operations that were previously performed using semop with the SEM_UNDO flag. Previously, this undo step could race with another process or thread removing the same semaphore set, which could lead to the use of already freed kernel memory and to unpredictable behavior. This bug could be observed with software that uses IPC SysV semaphores, such as IBM DB2, where in certain cases some processes or utilities incorrectly stalled in an IPC semaphore operation or system call after the race condition occurred. A patch has been provided to fix this bug, and the kernel now behaves as expected in the aforementioned scenario. (BZ#1233300)

Fixed a race condition in perf buildid-cache

Prior to this update, multiple instances trying to copy the same file triggered a race condition in perf buildid-cache that could truncate system libraries and other files. With this update, unique temporary files are used when copying to the buildid directory to prevent the aforementioned race condition from occurring. (BZ#1229673)
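
The idea of copying through a unique temporary file can be sketched as follows. This is an illustration of the general technique, not the perf implementation: the function name copy_via_unique_temp is hypothetical, and the final atomic rename is an assumption about how such a copy is usually completed.

```python
import os
import shutil
import tempfile


def copy_via_unique_temp(src, dest_dir):
    """Copy src into dest_dir without exposing a partially written file."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src))
    # mkstemp creates a uniquely named file, so concurrent copiers never
    # write into the same temporary file.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".copy-")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
        # Assumption: publish the finished copy with an atomic rename.
        os.replace(tmp_path, dest)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return dest
```

Because each instance writes to its own temporary file, a second instance copying the same source cannot truncate a file that the first one is still writing.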

Cache serialization has been added to prevent kernel crashes

Due to a race condition whereby a cache operation could be submitted after a cache object was killed, the kernel occasionally crashed on systems running the cachefilesd service. The provided patch prevents the race condition by adding serialization in the code that makes the object unavailable. As a result, all subsequent operations on the object are rejected and the kernel no longer crashes in this scenario. (BZ#1096893)

Reloading or removing edac modules now works as expected

Previously, reloading or removing edac modules on a system using the i7core_edac module could cause a number of warning messages to be returned, followed by a kernel crash. The underlying source code has been patched, and the kernel no longer crashes when operating with edac modules. (BZ#1227845)

Custom MAC addresses can be specified again for bond interfaces

On a system with a bonded interface, the user could not specify their own custom MAC address for the bond. A patch has been provided to fix this bug, and custom MAC addresses can be specified again in the aforementioned situation. (BZ#1225359)

The st and sg drivers now work correctly

Due to the incorrect length for the FCP_RSP_INFO field, parts of the field could be copied, and the st and sg drivers thus did not work correctly. With this update, the code related to the FCP protocol has been updated, and st and sg now work as expected. (BZ#1223105)

Slave interfaces turn into promiscuous mode automatically

If a bonding VLAN interface turned into promiscuous mode while it was inactive, the slave interfaces previously did not turn into promiscuous mode automatically even after the bonding VLAN interface became active again. With this update, flag changes are always propagated to interfaces, and slave interfaces thus enter promiscuous mode as expected. (BZ#1222823)

force_hrtimer_reprogram parameter added to kernel

Due to a timer expiry issue, the scheduler tick previously stopped for too long when the ksoftirqd daemon for hrtimer was blocked by a running process. This update adds the force_hrtimer_reprogram kernel parameter. If force_hrtimer_reprogram=1 is used on the kernel command line, the reprogramming of all expired timers is forced, which prevents this bug from occurring. (BZ#1285142)
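
To check whether the parameter is present on a running system, the kernel command line can be inspected. The following is a minimal sketch; the helper names kernel_param and read_cmdline are hypothetical, and the parsing assumes simple space-separated key=value tokens without quoting.

```python
def kernel_param(cmdline, name):
    """Return the value of a key=value token, '' for a bare flag, None if absent."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    value = kernel_param(read_cmdline(), "force_hrtimer_reprogram")
    print("force_hrtimer_reprogram:", value)
```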

ipr memory buffer indexing updated

A bug in the ipr driver on 64-bit IBM Power Systems (ppc64) could result in backwards memory buffer indexing and cause a kernel crash when running the Hardware Test Exerciser (HTX) test suite. With this update, ipr memory buffer indexing uses a bit mask operation instead of modulo, causing low bits to be masked off so that no backwards indexing is possible, and preventing the crash. (BZ#1209543)
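
The difference between the two indexing operations can be illustrated with a short sketch. It is not the ipr driver code: the function names are hypothetical, the buffer size of 8 is an arbitrary example, and the sketch assumes a power-of-two size, which is the only case where a mask of size - 1 selects the same slots as modulo. Because the Python % operator differs from C for negative operands, C-style truncating modulo is emulated explicitly.

```python
def c_style_modulo(value, size):
    """Emulate the C % operator, whose result takes the sign of the dividend."""
    remainder = abs(value) % size
    return -remainder if value < 0 else remainder


def masked_index(value, size):
    """Index by masking off the high bits; size must be a power of two."""
    if size <= 0 or size & (size - 1) != 0:
        raise ValueError("size must be a power of two")
    return value & (size - 1)


if __name__ == "__main__":
    for value in (5, 13, -3):
        print(value, c_style_modulo(value, 8), masked_index(value, 8))
```

For non-negative values both functions agree, but a negative value yields a negative (backwards) index with C-style modulo, whereas the mask always produces an index within the buffer.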

cgroup_threadgroup_rwsem variable added to kernel

Previously, the attach_task_by_pid() function in some cases raced with an exiting thread and tried to lock or unlock the already freed group_rwsem member of the signal_struct list. As a consequence, a kernel crash could occur. This update adds the cgroup_threadgroup_rwsem variable, which fixes this bug and prevents the kernel crash from occurring in this scenario. (BZ#1198732)

Adding keys into a revoked keyring no longer causes a memory leak

Attempting to use the request_key() function to add a key into a revoked keyring was previously causing a resource leak in the kernel error path. Keys which were allocated and then failed became stuck in kernel memory and were impossible for the garbage collector to remove. With this update, the reference count on failed keys will now correctly reach 0 in this situation, allowing the garbage collector to remove them so that failed keys will no longer stay in memory indefinitely. (BZ#1188442)

Kernel panic caused by repeated fork() no longer occurs

Previously, an unusual forking pattern could cause the anon_vma_chain and anon_vma slab memory to grow infinitely even though the number of processes involved stayed low. As a consequence, a kernel panic occurred. The provided patch adds a heuristic which reuses existing anon_vma instead of forking a new one and adds the anon_vma->degree counter which makes sure the count of anon_vma members is not bigger than twice the count of virtual memory areas. As a result, the kernel panic no longer occurs in this situation. (BZ#1151823)

Fixed job scheduling now ensures balanced CPU load

Due to prematurely decremented calc_load_task, the calculated load average was off by up to the number of CPUs in the machine. As a consequence, job scheduling worked improperly causing a drop in the system performance. This update keeps the delta of the CPU going into NO_HZ idle separately, and folds the pending idle delta into the global active count while correctly aging the averages for the idle-duration when leaving NO_HZ mode. Now, job scheduling works correctly, ensuring balanced CPU load. (BZ#1167755)

Only a single process can free a specific memory page

A race condition was found in hash table invalidation code between inode invalidation and inode clearing code in the GFS2 file system. In some circumstances, two processes could attempt to free the same memory, resulting in a kernel panic. This update adds a spin_lock to the hash table invalidation code allowing only a single process to attempt to free a specific memory page, which prevents the race condition from occurring. (BZ#1250663)

macvtap transfers VLAN packets over be2net successfully

Previously, VLAN stacked on the macvlan or macvtap device did not work for devices that implement and use VLAN filters. As a consequence, macvtap passthrough mode failed to transfer VLAN packets over the be2net driver. This update implements VLAN ndo calls to the macvlan driver to pass appropriate VLAN tag IDs to lower devices. As a result, macvtap transfers VLAN packets over be2net successfully. (BZ#1213846)

primary_reselect=failure now works properly

A bug caused the primary_reselect=failure bond parameter to work incorrectly. The primary interface was always taking over even if others did not fail. With this update, the parameter works as expected, and the primary bond interface only takes over if the current non-primary active interface fails. (BZ#1290672)
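
The intended selection policy can be expressed as a small model. This is a toy illustration of the behavior described above, not the bonding driver logic; the function name next_active_slave and the interface names are hypothetical.

```python
def next_active_slave(primary, current, link_up):
    """Pick the active interface under a primary_reselect=failure style policy.

    link_up maps interface names to True (link up) or False (link down).
    """
    # The current active interface keeps its role for as long as it is up,
    # even if the primary interface has come back.
    if link_up.get(current, False):
        return current
    # Only when the current active interface fails does the primary take over.
    if link_up.get(primary, False):
        return primary
    # Otherwise fall back to any other interface that is up.
    for name, is_up in link_up.items():
        if is_up:
            return name
    return None
```

For example, with eth0 as the primary and eth1 currently active, the model keeps eth1 while both links are up and switches to eth0 only after eth1 fails.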

Log messages from logshifter are now processed correctly

Under significant load, some applications such as logshifter could generate bursts of log messages too large for the system logger to spool. Due to a race condition, log messages from that application could then be lost even after the log volume dropped to manageable levels. This update fixes the kernel mechanism used to notify the transmitter end of the socket used by the system logger that more space is available on the receiver side, removing a race condition which previously caused the sender to stop transmitting new messages and allowing all log messages to be processed correctly. (BZ#1284900)

KVM virtual guests now connect via a bridged interface successfully

Previously, a bridge interface could exist on top of a bonded interface which was above a physical interface with the large receive offload (LRO) flag still on. Bridge interfaces are incompatible with LRO enabled on any underlying devices, which caused network communications on the bridge, such as traffic from a virtual machine (VM), to fail. This update ensures that LRO is disabled on all devices underneath a bridge, and a VM now connects via a bridged interface successfully. (BZ#1258446)

SwapFree size is now correct

A previous change in the get_swap_page() locking removed the use of the swap_lock spinlock. This could cause nr_swap_pages corruption and invalid SwapFree information in the /proc/meminfo file, where the size of SwapFree could exceed the size of SwapTotal. This update uses an atomic variable for nr_swap_pages, and the size of SwapFree in /proc/meminfo is now correct. (BZ#1252362)
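
The symptom can be checked by comparing the two fields in /proc/meminfo. The following is a minimal sketch; the function names are hypothetical, and the parser assumes the usual "Name: value kB" line layout.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_free_is_consistent(text):
    """Return True if SwapFree does not exceed SwapTotal."""
    info = parse_meminfo(text)
    return info["SwapFree"] <= info["SwapTotal"]


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        print("SwapFree consistent:", swap_free_is_consistent(handle.read()))
```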

SCSI error handling no longer causes deadlocks

Previously, when a SCSI command timed out on a removable media device, the error handling code always attempted to re-lock the door of the device. This could cause a deadlock because the request to issue a command to re-lock the door could not be allocated if all requests were in use. With this update, SCSI error handling only attempts to re-lock if the device was reset as part of the error handling procedure, and the deadlock no longer occurs. (BZ#995234)

LRO flags now propagate correctly

Large Receive Offload (LRO) flag disabling was not being propagated downwards from above devices in the VLAN and bond hierarchy, breaking the flow of traffic. This bug has been fixed and LRO flags now propagate correctly. (BZ#1259008)

multicast group assignments fixed

The kernel was incorrectly assigning multicast groups for the nl80211 protocol, causing problems with nl80211 wireless drivers, for example, preventing hostapd from starting and initializing wireless devices in Access Point mode. This update fixes multicast group assignments for nl80211 and allows wireless devices to be managed correctly. (BZ#1259870)

Sending a UDP datagram over IPv6 works as expected

Due to a race condition, an ipv6_txoptions corruption previously appeared when sending a UDP datagram over the IPv6 protocol. An upstream patch has been applied to prevent data corruption that led to the kernel panic. (BZ#1312740)

nvme hard-lockup panic no longer occurs

When the nvme driver held the queue lock for too long, for example during DMA mapping, a lockup occurred leading to the nvme hard-lockup panic. This update fixes the underlying source code, and nvme now works as expected. (BZ#1227342)

BUG_ON() in fs_clear_inode() no longer occurs

Previously, the BUG_ON() signal appeared in the fs_clear_inode() function where the nfs_have_writebacks() function reported a positive value for nfs_inode->npages. As a consequence, a kernel panic occurred. The provided patch performs a serialization by holding the inode i_lock over the check of PagePrivate and locking the request, which fixes this bug. (BZ#1135601)

UID and GID are assigned correct values

Due to a regression, the UID and GID environment variables were not assigned correct values during autofs mount requests. This update provides a patch that fixes the UID and GID assignment so that UID and GID now take on the value of the user that has triggered the mount. (BZ#1248820)

Using LUKS and IPSEC simultaneously no longer leads to data corruption

When using IPSEC and a LUKS-encrypted volume simultaneously, data corruption on a LUKS volume could occur. The provided patch fixes this bug, and data corruption no longer occurs when using LUKS and IPSEC simultaneously. (BZ#1259023)

VLAN_GROUP_ARRAY_LEN has been revived

In a previous update, the VLAN_GROUP_ARRAY_LEN kernel macro was renamed to VLAN_N_VID. Due to this rename, compiling a kernel module that requires VLAN_GROUP_ARRAY_LEN, for example the vmxnet3 external driver, failed. With this update, the old macro has been revived so that third-party modules compile successfully. (BZ#1242145)

Corrupted ELF header has been fixed

Previously, a corrupted ELF header in the /proc/vmcore ELF file prevented the file from being read correctly. As a consequence, the kdump service terminated unexpectedly, resulting in a kernel panic. The provided patch fixes the ELF header, and kdump now succeeds as expected. (BZ#1236437)

Quota warning deadlocks on tty mutex have been fixed

Previously, the quota code could call into the tty layer to print a warning, which could cause a lock inversion between tty->atomic_write_lock and dqptr_sem. The provided patch prevents the quota utility code from calling the tty layer with dqptr_sem semaphore held, and processes no longer end up in a deadlock. (BZ#1232387)

anon_vma degree is always decremented when the VMA list is empty

In the anon_vma data structure, the degree counts the number of child anon_vma members and of virtual memory areas that point to this anon_vma. In the unlink_anon_vma() function, when its list is empty, anon_vma is going to be freed whether the external reference count is zero or not, so the parent's degree should be decremented. However, failure to decrement the degree triggered a BUG_ON() signal in unlink_anon_vma(). The provided patch fixes this bug, and the degree is now decremented as expected. (BZ#1309898)

Repeated sysrq events proceed as expected

Previously, repeated sysrq events in an NMI context could cause a deadlock, leading to a system crash. The provided patchset adds minimal support for the seq_buf buffer and a per_cpu printk() function, which prevents the aforementioned deadlock from occurring. (BZ#1104266)

Unix domain datagram socket no longer experiences deadlock

Due to a regression, a Unix domain datagram socket could come to a deadlock when sending a datagram to itself. The provided patch adds another sk check to the unix_dgram_sendmsg() function, and the aforementioned deadlock no longer occurs. (BZ#1309241)
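
The scenario can be reproduced from user space with a datagram socket that sends to its own bound address. The following is a minimal sketch; the function name send_datagram_to_self and the socket file name are hypothetical. The timeout only guards against a blocking receive; it cannot recover from a kernel-level deadlock, so on an affected kernel this should be run only on a test system.

```python
import os
import socket
import tempfile


def send_datagram_to_self(payload=b"ping", timeout=2.0):
    """Bind a Unix datagram socket, send a datagram to its own address, read it back."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "self.sock")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.settimeout(timeout)
        sock.bind(path)
        # The destination is the socket's own address.
        sock.sendto(payload, path)
        data, _ = sock.recvfrom(len(payload))
        return data
    finally:
        sock.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print(send_datagram_to_self())
```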

Exiting process decrements a counter as expected

Previously, when Kernel Samepage Merging (KSM) or page migration was in use, an exiting process could fail to decrement a counter related to anonymous virtual memory areas. As a consequence, the counter imbalance triggered a kernel panic. The provided patch fixes this bug, and the kernel panic no longer occurs in the aforementioned scenario. (BZ#1126228)

VGA output speed in UEFI boot mode improved

Previously, the VGA console was very slow in UEFI boot mode, which resulted in a large difference in boot time for servers with many CPUs or I/O devices. As a consequence, printing a large amount of debug output during the boot phase was extremely slow, making it difficult to analyze issues that occur during boot time. In addition, the VGA output slowdown continued during OS runtime, which could lead to a system hang. The provided fix improves the VGA output speed in UEFI boot mode, preventing the aforementioned problems. (BZ#1290686)

ndo_set_multicast_list field is again present in network drivers

When creating a VLAN interface on top of a netxen_nic physical interface after changing its MAC address, ping over VLAN to a remote VLAN previously failed. The provided patch adds back the use of the ndo_set_multicast_list field in network drivers, and the ping now succeeds as expected. (BZ#1213207)

fio no longer corrupts XFS

After adjusting the extent size with the xfs_io utility and running the fio tool with the configuration file provided, the XFS file system previously became corrupted. The provided patch extends the size hints, and fio no longer corrupts XFS. (BZ#1211110)

NFS mount now reports correctly

When the firewall on the NFS server was configured to reject all packets on port 2049 and the share was mounted on the NFS client, the following error was returned:
connection timed out
The provided fix corrects the error message, which now reads:
connection refused
(BZ#1206555)

Automatic signing is now enabled

When setting a security type with the sec= mount option and no signing had been specified with the trailing i, automatic signing was not previously enabled. For example, in DFS mounts where the DFS node requires signing but the client had disabled it using sec=, the user could not mount the DFS node if the node required signing to be enabled. The provided fix sets MAY_SIGN flags for all security types, thus fixing this bug. (BZ#1197875)

Writing a large file using direct I/O now proceeds successfully

Previously, writing a large file using direct I/O in 16 MB chunks sometimes caused a pathological allocation pattern where 16 MB chunks of large free extent were allocated to a file in a reversed order. The provided patch avoids the backward allocation, and writing a large file using direct I/O now proceeds successfully. (BZ#1302777)

Fix for shrinker return value prevents system hang

The shrink_dcache_memory shrinker is prone to overflow, reporting the following line in the log:
negative objects to delete
As a consequence, the system previously hung. The provided patch tests for this overflow sign extension from any shrinker return value, and refuses to set the max_pass variable larger than the INT_MAX preprocessor macro. As a result, the aforementioned hang no longer occurs. (BZ#1159675)
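
The overflow and the clamp can be illustrated with a short sketch. It is not the kernel patch: the function names are hypothetical, the sketch assumes a 32-bit int and a 64-bit unsigned long, and it models the fix as a simple clamp to INT_MAX.

```python
INT_MAX = 2**31 - 1      # assumption: 32-bit int
ULONG_MASK = 2**64 - 1   # assumption: 64-bit unsigned long


def as_unsigned_long(value):
    """Model sign extension of a negative int stored in an unsigned long."""
    return value & ULONG_MASK


def clamp_max_pass(shrinker_return):
    """Never let max_pass exceed INT_MAX, whatever the shrinker returned."""
    max_pass = as_unsigned_long(shrinker_return)
    return min(max_pass, INT_MAX)


if __name__ == "__main__":
    print(as_unsigned_long(-1))   # a huge value after sign extension
    print(clamp_max_pass(-1))     # limited to INT_MAX
    print(clamp_max_pass(1000))   # ordinary values pass through unchanged
```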

perf has been updated

To support a greater range of hardware and incorporate numerous bug fixes, perf has been updated. Notable enhancements include:
  • Added support for additional model numbers of 5th Generation Intel Core i7 processors.
  • Added support for Intel Xeon v5 mobile and desktop processors.
  • Enabled support for the uncore subsystem for Intel Xeon v3 and v4 processors.
  • Enabled support for the uncore subsystem for Intel Xeon Processor D-1500. (BZ#1189317)

Configuring settings for multiple WWPNs is now easier

This enhancement update adds support for tag and untag commands in targetcli. Instead of configuring LUN mapping using the numeric WWPN, for example 20:00:00:1b:21:59:12:36, it is now possible to give one or more WWPNs a descriptive name with the tag command, and then use the tag to configure LUN mappings. See help tag and help untag commands within the acls configuration node for more information. (BZ#882092)

Systems with iscsi_firmware are able to boot

A previous regression in dracut caused systems with iSCSI offloading or iSCSI Boot Firmware Table (iBFT) to stop booting in some cases. Consequently, freshly installed Red Hat Enterprise Linux 6.8 systems with iscsi_firmware on the kernel command line could be unable to boot. This update fixes the bug, and systems in the described scenario are able to boot as expected. (BZ#1322209)