L1TF - L1 Terminal Fault Attack - CVE-2018-3620 & CVE-2018-3646

Public Date: August 10, 2018, 11:58
Updated: November 3, 2024, 20:02
Status: Resolved
Impact: Important


Red Hat has been made aware of a new microprocessor hardware implementation (microarchitecture) issue, similar to Spectre and Meltdown, which has been reported to affect x86 microprocessors manufactured by Intel. Unprivileged attackers can use this flaw to bypass memory security restrictions and gain access to data stored in memory that would otherwise be inaccessible. There are three pieces to this vulnerability. The first affects only Intel “SGX” secure enclaves and is mitigated through microcode updates, independently of the operating system. The other two pieces require software-level mitigations performed by operating systems and hypervisors. Full mitigation against potential attack by untrusted guest virtual machines in a virtualized environment requires specific action by a system administrator.

CVE-2018-3620 is the CVE identifier assigned to the operating system vulnerability for this issue. CVE-2018-3646 is the CVE identifier assigned to the virtualization aspect of the flaw. This issue is referred to as L1 Terminal Fault (L1TF) by the larger industry and as “Foreshadow” by the security researchers who discovered it.

The L1 Terminal Fault vulnerability allows a malicious actor to bypass memory access security controls ordinarily imposed and managed by the operating system or hypervisor. An attacker can use this vulnerability to read any physical memory location that is cached in the L1 data cache of the processor. Normally, operating system and hypervisor managed “page tables” provide information to the processor about which memory locations should be accessible to an application, the operating system kernel itself, and guest virtual machine instances. These page tables are formed from page table entries (PTEs) that include a “present” bit indicating validity. In exploiting L1TF, an attacker abuses the Intel processor logic that recognizes valid PTEs.

The L1 data cache (typically 32KB in size) is the first level of a fast on-chip processor memory hierarchy that contains copies of data also held in the external (to the processor) main memory chips. Caches are populated as memory is used by programs, and are typically separated into multiple levels. A small and fast highest level (L1) cache is closest to the processor functional units that perform the actual calculations within a program, while progressively larger and slower caches are located conceptually further away. The L1 is shared between two peer hyperthreads within an Intel processor core. Each core also has a slightly larger L2 cache. The L3 (also called an LLC or Last Level Cache) is shared by all of the cores within the processor and is much larger (e.g. 32MB). Data moves from memory, to the L3, and toward the L1 when it is used.
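
On Linux, the size of each cache level for a given processor can be inspected with the lscpu(1) command, for example:

    # Show the per-level cache sizes reported by the processor (sizes vary by model):
    lscpu | grep -i cache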

Accesses to the cache memory within the processor chip are orders of magnitude faster than going out to main memory, so caching is used to significantly enhance performance. The data cache is also populated as a side-effect of operations performed during speculative or out-of-order execution. As a result of this relative difference in performance between the caches and main memory, it is possible for malicious software code to infer cache activity. This is known as side-channel analysis, and was popularized by Meltdown and Spectre. In those vulnerabilities, as in L1TF, specific software sequences, known as “gadgets”, can be created that exploit a vulnerable processor to cause observable cache activity during speculation.

L1TF is similar to Meltdown in that it exploits how vulnerable processors implement a form of speculation, in this case during the searching of page tables (known as a page table walk). The processor is designed to yield the most aggressive performance possible, so it speculates that page table entries are valid and permit access to the underlying memory location before completing the necessary validity checks. The processor preemptively searches its L1 data cache for any physical address matching bits in the page table entry, forwarding any match to dependent speculative operations. Shortly afterwards, the processor detects that the page table entry is not valid and signals an internal “terminal fault”. The processor then unwinds (throws away) the previously speculated results, but the observable impact upon the cache remains.

In the case of virtualized guest instances, the L1TF vulnerability manifests because of an aspect of the implementation of a technology within Intel processors known as “Extended Page Tables” (EPT). This hardware performance feature allows hypervisors (such as KVM) to delegate part of the management of page tables to guest virtual machines. Each memory access is subject to two translations: first by the guest page tables, and then by the host page tables. This saves the overhead of the repeated hypervisor assistance that was required prior to EPT. In vulnerable implementations, a malicious guest is able to create a “not present” page table entry that shortcuts the normal two stages of translation, allowing the guest to read host hypervisor or other guest physical memory if a copy exists in the L1 data cache.

There are two attack patterns that end users will need to protect against: a malicious user on a system reading data belonging to others on the same physical system, and a malicious guest OS or container accessing information from other guests or the host. This vulnerability is similar to CVE-2017-5754 (aka “Meltdown”), but exploits an interaction between the Memory Management Unit (MMU) and the L1 data cache under speculation when performing virtual-to-physical memory address translation. Existing mitigations for previous microarchitectural vulnerabilities (“Meltdown,” aka “Variant 3”) are not sufficient to protect against this new vulnerability.

Red Hat strongly recommends that customers take corrective actions, including manually enabling specific kernel parameters or potentially disabling features like Intel Hyper-Threading, after the available updates have been applied. More details can be found in the Mitigations section of this article.

Background Information

Modern operating systems implement a ‘virtual memory’ scheme to efficiently share the available main memory across multiple tasks/processes. Physical systems have a fixed amount of main memory, which forms the physical address space. This address space is divided into smaller managed units known as pages (e.g. 4KB). The operating system creates a virtual address space for each running program (known as a process). Each virtual address is translated into an underlying physical address by a special piece of hardware within the processor known as the Memory Management Unit (MMU). When a program is scheduled for execution, its instructions and data are mapped into virtual addresses, and the program uses virtual addresses to reference memory locations. The processor’s MMU uses a technique known as Paging to perform the translation between virtual addresses and their underlying mapped physical memory addresses.
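
For example, on Linux the virtual address space of a process can be observed directly through the proc filesystem:

    # Each line shows a virtual address range mapped into the current process:
    head -5 /proc/self/maps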

The Paging technique translates each virtual address to a physical address by using hierarchical paging structures known as page tables. These translate portions of the virtual address using bits from the address as indexes into components of the page tables. Page tables are built from fixed size records/entries which hold a physical address pointing to either another paging structure entry or the mapped memory page. Along with the physical address, a paging structure entry also holds various attribute bits about the physical address. These include a bit indicating whether the page is present (P flag) in the physical memory or not (swapped out). A page table entry marked as not present is supposed to be ignored by the processor MMU logic.

Page Structure Entry:

The Intel Software Developers Manual (Volume 3, Chapter 4) defines the hardware paging structures used by Intel processors, including the 64-bit Page Table Entry (PTE) used to store the physical address during a virtual to physical address translation.

During address translation, the processor walks through the page table structures, ultimately arriving at a Page Table Entry that contains a possible physical translation for a particular virtual address. This address translation process terminates either when the virtual address resolves to a mapped page frame in physical memory, or when a paging structure entry indicates that the required page frame is not present (P flag (bit zero) = 0) in main memory or has reserved bits set. Accessing the physical address from such an entry results in a page-fault exception (aka “terminal fault”).

Acknowledgements

Red Hat would like to thank Intel and industry partners for reporting this issue and collaborating on the mitigations.

Additional References

For more information about this class of issues, please refer to Intel’s Website

Foreshadow website

Video: Everything you need to know about L1TF in 3 minutes

Video: Red Hat Technical Explainer Video - 10 minutes

Understanding L1 Terminal Fault Blog

Managing Risk in the Modern World Blog


Additional Red Hat Product References

Is CPU Microcode available to address CVE-2018-3620 and CVE-2018-3646?

Managing Hyper-Threading

The "tuned-adm" command hangs when changing to the "cpu-partitioning" profile with CPUs disabled

Considerations for OpenStack and L1TF


Impacted Products

Red Hat Product Security has rated CVE-2018-3620 & CVE-2018-3646 as having a security impact of Important.

The following Red Hat product versions are impacted:

  • Red Hat Enterprise Linux 5

  • Red Hat Enterprise Linux 6

  • Red Hat Enterprise Linux 7

  • Red Hat Atomic Host

  • Red Hat Enterprise MRG 2

  • Red Hat OpenShift Online v3

  • Red Hat Enterprise Linux OpenStack Platform 7.0 (Kilo) for RHEL7

  • Red Hat Enterprise Linux OpenStack Platform 7.0 (Kilo) director for RHEL7

  • Red Hat OpenStack Platform 8.0 (Liberty)

  • Red Hat OpenStack Platform 8.0 (Liberty) director

  • Red Hat OpenStack Platform 9.0 (Mitaka)

  • Red Hat OpenStack Platform 9.0 (Mitaka) director

  • Red Hat OpenStack Platform 10.0 (Newton)

  • Red Hat OpenStack Platform 11.0 (Ocata)

  • Red Hat OpenStack Platform 12.0 (Pike)

  • Red Hat OpenStack Platform 13.0 (Queens)

  • Red Hat Virtualization (RHEV-H/RHV-H)


    While Red Hat's Linux Containers are not directly impacted by kernel issues, their security relies upon the integrity of the host kernel environment. Red Hat recommends that you use the most recent versions of your container images. The Container Health Index, part of the Red Hat Container Catalog, can be used to verify the security status of Red Hat containers. To protect the containers in use, you will need to ensure that the container host (such as Red Hat Enterprise Linux or Atomic Host) has been updated against these attacks. Red Hat has released an updated Atomic Host for this use case.

    Attack Description and Impact

    Traditional Host Attack Vector:

    When a processor supports speculative execution of instructions, a speculative load from a virtual address which cannot be resolved to a physical address leads to a page-fault exception during translation. Before this page-fault exception is delivered, speculative execution of the load instruction uses the physical address from the paging structure entry that is not present (P flag = 0) or has reserved bits set to access physical memory. If that physical memory location is cached in the L1 data cache, the speculative load reads data from an inappropriate physical memory location. Subsequent speculative operations then use this data and can have a measurable impact upon the caches, which an attacker can use to read unauthorized data.

    An unprivileged system user or process could exploit the L1TF vulnerability to read data from arbitrary physical memory locations of the kernel and/or other processes running on the system, provided that the data is first loaded into the L1 data cache. Since loads into the L1 data cache are an intrinsic part of processor operation, a malicious attacker only has to wait for a program containing secrets of interest to load those secrets while performing normal operations. Such secrets can include cryptographic keys and other sensitive data.


    Virtualization Attack Vector:

    Virtualized guest environments compound the impact from exploiting this vulnerability. In virtualized environments, guests often run an operating system which manages memory as if it were running on a bare metal host machine. What the guest operating system perceives as physical memory is in reality a virtual address space on the host, created by the Virtual Machine Monitor (VMM, e.g. KVM), also known as a hypervisor. The hypervisor introduces a layer of memory virtualization to ensure that it has control over the true host physical memory and so that it can provide isolated virtual address spaces to numerous guests while managing their access to the host physical memory. This means that virtualized environments have two levels of memory translations.

    To facilitate efficient translation of the Guest physical addresses to Host physical addresses, hardware processors have introduced the Extended Page Table (EPT) feature.

    EPT is similar to the hierarchical paging structures used by the host operating system to translate virtual addresses to physical memory addresses. While translating guest physical addresses to host physical addresses, EPT Violation exceptions cause VM exits (just as page-fault exceptions do on the host), indicating that the given guest physical address cannot be resolved to a host physical address, or that the guest does not have permission to access the given host memory.

    As on the host, speculative execution of instructions by a guest user can leverage these EPT-violation-induced VM exits to read data from host physical memory via cache side-channel attacks. Since the guest operating system controls the guest physical address bits in the EPT paging entry, a guest user can target specific host physical addresses and read data via the L1 data cache, including arbitrary host kernel memory and/or memory of other guests/processes running on the host.


    Simultaneous Multithreading Attack Vector:

    Simultaneous multithreading (SMT) allows modern processors to improve system performance by executing more than one instruction stream simultaneously on separate logical processors; i.e., the CPU creates two or more logical processors, each with its own set of data registers, control registers, segment registers, etc., which form its architectural state. These logical processors execute separate instruction streams while sharing the same execution engine, processor caches, TLB, system bus interface, etc.

    Because two logical processors share the processor cache, any cache manipulation performed by one logical processor is visible to the other; i.e., an application or guest running on one logical processor could leverage L1TF and speculative execution of instructions to read arbitrary memory contents of other processes and the kernel from the shared data cache.

    Because Intel Hyper-Threading executes both threads simultaneously, flushing the cache on a logical processor when switching the virtual machine instances running on it leaves a small window of time during which secrets might be reloaded by the peer thread and become visible to a malicious attacker running code on the other logical processor. There is currently no standard way to guarantee that this cannot happen.

    As a consequence, virtualized environments should not schedule two different virtual machines to share two logical processors from a single core. In addition, when a full Linux stack is running on the host and able to load its own secrets in response to events such as interrupts, it is not safe to enable Hyper-Threading while running untrusted guests on a logical processor.

    Performance impact

    Mitigating L1TF in bare metal environments that do not use virtualization has negligible performance impact and does not require specific action beyond installing the updates.

    Mitigating L1TF in virtualized environments requires that the hypervisor check whether to flush the L1 data cache (L1D) of potential secrets from other virtual machines (or the host) whenever it begins running virtual machine code. Depending upon the workload, it may also be necessary to disable Intel Hyper-threading, in order to prevent a malicious virtual machine running on one hyperthread from attacking another on its peer thread.

    The L1 data cache makes data available to the CPU much faster than main memory, and (almost) all data loaded from memory must pass through it. Once the L1 data cache is flushed, the CPU needs to spend hundreds of cycles repopulating those entries from the outer cache levels. This is expensive in terms of data access time: if an L1 data cache access costs 1x CPU cycles, the same access from the L2 or L3 could cost around 4x, and an access from main memory 60-70x or more. Since data contained within the L1 is also contained within the outer caches, it will typically be refilled from the L2 at only modest impact to performance. This overhead also increases external interrupt and other event latency.

    Similarly, Hyper-Threading (an SMT implementation) simultaneously executes more than one instruction stream to gain system performance. Disabling these threads halves the number of logical CPUs seen by the system, significantly reducing its overall compute throughput and performance.

    Note that the actual performance impact of these measures is subject to the workload and can vary from one workload to another. More details about workload-specific performance impact can be found in Performance considerations for L1 Terminal Fault.

    Diagnose your vulnerability

    Use the detection script to determine if your system is currently vulnerable to this flaw. To verify the legitimacy of the script, you can also download the detached GPG signature; the signing key is on our Product Security openPGP Keys page. The current version of the script is 1.6.
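
    For illustration, the signature can be checked and the script run as follows (the filenames here are placeholders; substitute the names of the files you actually downloaded):

    # Placeholder filenames; adjust to match your downloads:
    gpg --verify cve-2018-3620.sh.asc cve-2018-3620.sh    # check the detached signature
    bash cve-2018-3620.sh                                 # run the detection script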

    Additionally, two Ansible playbooks are provided below. The first, CVE-2018-3620-fix_disable_ht.yml, can be applied to systems which have not been updated to address L1TF, to disable Hyper-Threading without restarting the system. The playbook will make an effort to determine if the system is acting as a hypervisor and has any running VMs, but be aware that offlining Hyper-Threading can cause unexpected behavior in applications with specific core/thread affinity. If the system's kernel doesn't include the necessary features, the playbook will fail with a message to that effect. In that case, it is recommended that you update to the latest kernel and use the second playbook, which takes advantage of L1TF-aware kernel features.

    To use the CVE-2018-3620-fix_disable_ht.yml playbook, simply call ansible-playbook with the hosts you'd like to change specified in the HOSTS extra var:
    ansible-playbook -e "HOSTS=webservers,db01,portal" CVE-2018-3620-fix_disable_ht.yml

    The second playbook, CVE-2018-3620-apply_settings.yml, will help you enable mitigations provided by an L1TF-aware kernel. There are three settings you can specify:

    • FLUSH=1 will change the behavior of the kernel and the kvm-intel kernel module to flush the L1 data cache on every VM enter, not just those where the kernel detects that a flush is necessary. While there are performance implications, this will provide a stronger guarantee that information isn't shared between VMs, and should prevent information leaks such as ASLR derandomization. This setting should only be necessary if you are running a hypervisor with untrusted guest systems.
    • NOSMT=1 will disable Hyper-Threading, both for the live system and at boot time. As with the first playbook, an attempt is made to determine if there are active guests to prevent offlining threads which have set affinity.
    • FORCE=1 will, when combined with NOSMT=1, prevent SMT from being re-enabled at runtime via the /sys/devices/cpu/smt/control interface. If FORCE is not specified, users with sufficient permissions can re-enable or disable SMT at runtime.

    Additionally, the playbook provides a RESET=1 argument which will remove the above mitigations, and return the system to its default behavior of enabled SMT and conditional L1 data cache flushes on VM enter. A reboot is necessary after using RESET=1.

    To use CVE-2018-3620-apply_settings.yml, specify the features you want to set in extra vars, as well as the systems you'd like to target in the HOSTS variable. For example:

    To turn off SMT, but allow runtime changes, and to leave the default behavior of conditional L1 data cache flushes:
    ansible-playbook -e "HOSTS=webservers NOSMT=1" CVE-2018-3620-apply_settings.yml

    To apply all mitigations and prevent runtime changes:
    ansible-playbook -e "HOSTS=vmserver01 FLUSH=1 NOSMT=1 FORCE=1" CVE-2018-3620-apply_settings.yml

    To reset the L1TF settings to their default state:
    ansible-playbook -e "HOSTS=webservers RESET=1" CVE-2018-3620-apply_settings.yml

    Download Mitigation Playbooks

    CVE-2018-3620-fix_disable_ht.yml  Detached GPG signature

    CVE-2018-3620-apply_settings.yml  Detached GPG signature



    Take Action

    Red Hat customers running affected versions of these Red Hat products are strongly recommended to update them as soon as errata are available. Customers are urged to apply the available updates immediately and to enable the mitigations they deem appropriate.
     
    The order in which the patches are applied is not important, but after updating firmware and hypervisors, every system and virtual machine will need to be powered off and restarted for the new hardware capabilities to be recognized.

    Updates for Affected Products

    Product | Package | Advisory/Update
    Red Hat Enterprise Linux 7 (z-stream) | kernel | RHSA-2018:2384
    Red Hat Enterprise Linux 7 | kernel-rt | RHSA-2018:2395
    Red Hat Enterprise Linux 7 | microcode_ctl | RHEA-2018:2299
    Red Hat Enterprise Linux 7.4 Extended Update Support [2] | kernel | RHSA-2018:2387
    Red Hat Enterprise Linux 7.4 Extended Update Support [2] | microcode_ctl | RHEA-2018:2298
    Red Hat Enterprise Linux 7.3 Extended Update Support [2] | kernel | RHSA-2018:2388
    Red Hat Enterprise Linux 7.3 Extended Update Support [2] | microcode_ctl | RHEA-2018:2296
    Red Hat Enterprise Linux 7.2 Update Services for SAP Solutions, & Advanced Update Support [3],[4] | kernel | RHSA-2018:2389
    Red Hat Enterprise Linux 7.2 Update Services for SAP Solutions, & Advanced Update Support [3],[4] | microcode_ctl | RHEA-2018:2301
    Red Hat Enterprise Linux 6 (z-stream) | kernel | RHSA-2018:2390
    Red Hat Enterprise Linux 6 | microcode_ctl | RHEA-2018:2300
    Red Hat Enterprise Linux 6.7 Extended Update Support [2] | kernel | RHSA-2018:2391
    Red Hat Enterprise Linux 6.7 Extended Update Support [2] | microcode_ctl | RHEA-2018:2304
    Red Hat Enterprise Linux 6.6 Advanced Update Support [3],[4] | kernel | RHSA-2018:2392
    Red Hat Enterprise Linux 6.6 Advanced Update Support [3],[4] | microcode_ctl | RHEA-2018:2302
    Red Hat Enterprise Linux 6.5 Advanced Update Support [3] | kernel | RHSA-2018:2393
    Red Hat Enterprise Linux 6.5 Advanced Update Support [3] | microcode_ctl | RHEA-2018:2303
    Red Hat Enterprise Linux 6.4 Advanced Update Support [3] | kernel | RHSA-2018:2394
    Red Hat Enterprise Linux 6.4 Advanced Update Support [3] | microcode_ctl | RHEA-2018:2297
    Red Hat Enterprise Linux 5 Extended Lifecycle Support [1] | kernel | RHSA-2018:2602
    Red Hat Enterprise Linux 5 Extended Lifecycle Support [1] | microcode_ctl | RHEA-2018:2305
    Red Hat Enterprise Linux 5.9 Advanced Update Support [3] | kernel | RHSA-2018:2603
    Red Hat Enterprise Linux 5.9 Advanced Update Support [3] | microcode_ctl | RHEA-2018:2295
    RHEL Atomic Host [6] | kernel | respun 17 August
    Red Hat Enterprise MRG 2 | kernel-rt | RHSA-2018:2396
    Red Hat Virtualization 4 | redhat-virtualization-host | RHSA-2018:2403
    Red Hat Virtualization 4 | rhvm-appliance | RHSA-2018:2402
    Red Hat Virtualization 3 Extended Lifecycle Support [1] | rhev-hypervisor7 | RHSA-2018:2404
    Red Hat Enterprise Linux OpenStack Platform 7.0 (Kilo) director for RHEL7 [7] | director images | respin pending
    Red Hat OpenStack Platform 8.0 (Liberty) [7] | director images | respin pending
    Red Hat OpenStack Platform 9.0 (Mitaka) [7] | director images | respin pending
    Red Hat OpenStack Platform 10.0 (Newton) [7] | director images | respin pending
    Red Hat OpenStack Platform 11.0 (Ocata) [7] | director images | respin pending
    Red Hat OpenStack Platform 12.0 (Pike) [7] | director images | respin pending
    Red Hat OpenStack Platform 12.0 (Pike) [7] | containers | respin pending
    Red Hat OpenStack Platform 13.0 (Queens) [7] | director images | images respun
    Red Hat OpenStack Platform 13.0 (Queens) [7] | containers | images respun


    [1] An active ELS subscription is required for access to this patch.  Please contact Red Hat sales or your specific sales representative for more information if your account does not have an active ELS subscription.

    [2] An active EUS subscription is required for access to this patch.  Please contact Red Hat sales or your specific sales representative for more information if your account does not have an active EUS subscription.

    What is the Red Hat Enterprise Linux Extended Update Support Subscription?

    [3] An active AUS subscription is required for access to this patch in RHEL AUS.

    What is Advanced mission critical Update Support (AUS)?

    [4] An active TUS subscription is required for access to this patch in RHEL TUS.

    [5] Subscribers should contact their hardware OEMs to get the most up-to-date versions of CPU microcode/firmware.

    [6] For details on how to update Red Hat Enterprise Atomic Host, please see Deploying a specific version of Red Hat Enterprise Atomic Host.

    [7] Considerations for OpenStack and L1TF


    Mitigation

    Current mitigations include applying vendor software updates combined with hardware OEMs’ CPU microcode/firmware. All Red Hat customers should apply vendor solutions to patch their CPUs and system BIOS (as needed), and update the kernel as soon as patches are available. All mitigations are enabled by default except disabling Hyper-Threading, which customers must take explicit manual steps to turn off. Customers are advised to take a risk-based approach in reviewing and reacting to this issue: assess the security risk against the range of mitigations and their performance impact to create a mitigation strategy. Systems that require high degrees of security and trust should be addressed first, and should be isolated from untrusted systems until treatments can be applied to reduce the risk of exploit. Customers desiring to completely mitigate this issue will need to consider more securely managing, and possibly disabling, Hyper-Threading to close off all attack vectors.

    Mitigation/Remediation:

    Complete mitigation of the L1 Terminal Fault requires three changes: Page Table Inversion (a small change to the kernel, provided and enabled by default in updated kernels), flushing the L1 data cache when switching between virtual machines, and possibly disabling SMT. Each of these changes independently provides some protection against different parts of an attack.

    Page Table Inversion (which is NOT the same as the Page Table Isolation added in mitigating Meltdown) is performed by default when manipulating not present page table entries in updated kernels provided by Red Hat.

    Flushing the L1 Data Cache is optional and can be implemented by updated microcode or by the updated kernel. It is currently enabled by default when running virtualized guest machines, but can be controlled by the user through kernel parameters.

    SMT Disable is optionally used to secure an untrusted shared environment running virtual machines. This is implemented through software changes to the kernel to add a new control interface. It is also possible to disable Intel Hyper-threading in the BIOS firmware, but this is much more cumbersome and is not the recommended approach.

    Remediation requires this three-pronged approach because the flaw has three contributing factors:

    1. A not-present (P flag = 0) page table entry (PTE), or one which has reserved bits set, holds an uninitialized physical address, and that physical address is cached in the L1 data cache.
    2. Updates caused by speculative operations to the L1 data cache are not cleared, so they can be observed by a subsequent process.
    3. Simultaneous multi-threads (SMT) share the execution engine and the on-chip processor caches, such as the L1 data cache and the TLB.

    Page Table Inversion:

    The processor speculatively accesses the physical address from a not-present PTE, and if the contents of that physical address are cached in the L1 data cache, the speculative memory access succeeds. Conversely, if the physical address’ contents are not cached, the access cannot leak data. The Page Table Inversion mitigation updates the physical address in not-present PTEs by setting the high-order address bits so that the entry points to a physical address that is not in memory and/or is uncacheable; thus its contents cannot be present in the L1 data cache.

    The updated kernel packages show the current mitigation status of the system via the sysfs interface, as shown:

    # cat /sys/devices/system/cpu/vulnerabilities/l1tf
    Mitigation: PTE Inversion; VMX: SMT vulnerable, L1D conditional cache flushes
    #

    Flush L1 Data Cache:

    When updated microcode is installed on the system, Intel x86 processors offer an IA32_FLUSH_CMD Model-Specific Register (MSR) which can be used to invalidate the L1 data cache. Support for the IA32_FLUSH_CMD MSR is indicated by the flush_l1d flag in /proc/cpuinfo, or via the lscpu(1) command, which lists CPU flags:

    # lscpu
    Flags:    … ssbd ibrs ibpb stibp spec_ctrl intel_stibp flush_l1d
    #

    If the flush_l1d flag is not present, confirm that the updated microcode is installed on the system and that the system has been rebooted.

    If the updated microcode is not installed, the flush_l1d flag will not be available; the updated kernel will still flush the L1 data cache in software, although this is slower than the hardware flush_l1d interface.
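
    For example, a quick check for the hardware flush interface:

    # Absence of the flag means the kernel falls back to the slower software flush:
    grep -q flush_l1d /proc/cpuinfo && echo "hardware L1D flush available" || echo "software flush fallback"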

    The updated kernel packages introduce the following kernel and KVM module parameters to control the L1TF mitigations:

    l1tf=[full | full,force | flush | flush,nosmt | flush,nowarn | off]

    This kernel boot parameter controls mitigation of the L1TF issue on affected CPUs. It can take one of the following values:

    full:  enables all mitigations for the L1TF issue.

    • Enables L1 data cache flush on every VM entry operation.
    • Disables Hyper-Threading (SMT).

    Both L1 data cache flush and SMT can be controlled at run time, after boot, via the sysfs interface.

    full,force: enables all mitigations for the L1TF issue. 

    • Enables L1 data cache flush on every VM entry operation.
    • Disables Hyper-Threading (SMT).

    The force option prevents users from disabling the above L1TF mitigations via the sysfs interface; users cannot re-enable SMT at run time.

    flush: only enable L1 data cache flush

    • Enables the default L1 data cache flush mitigation, which is to conditionally flush the L1 data cache.
    • Does not disable Hyper-Threading (SMT).

    Both can still be controlled via the sysfs interface. The hypervisor (KVM) issues a warning if a guest is started with an insecure configuration, such as SMT enabled or the L1 data cache flush disabled.

    flush,nosmt: enables the conditional L1 data cache flush and disables Hyper-Threading (SMT)

    • Enable conditional L1 data cache flush mitigation. 
    • Disable Hyper-Threading (SMT)

    Both can still be controlled via the sysfs interface. The hypervisor (KVM) issues a warning if a guest is started with an insecure configuration, such as SMT enabled or the L1 data cache flush disabled.

    flush,nowarn: similar to the flush option above, except the KVM module warnings about guests running with insecure configurations are suppressed.

    off: disables the hypervisor mitigations; i.e., KVM will not flush the L1 data cache on VM entry.

    The default value of the l1tf boot parameter is the flush mitigation option described above.
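
    As a sketch (using grubby, the standard RHEL bootloader tool; adjust for your configuration), the boot parameter could be set as follows:

    # Add the l1tf parameter to all installed kernels; takes effect at next boot:
    grubby --update-kernel=ALL --args="l1tf=full,force"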

    kvm-intel.vmentry_l1d_flush: a KVM module parameter controlling the L1 data cache flush operation on each VM entry. It accepts the following values:

       always     L1D cache flush on every VMENTER.
       cond       Conditional L1D cache flush.
       never      Disable the L1D cache flush mitigation.

    The cond option tries to avoid L1D cache flushes on VMENTER when the code executed between VMEXIT and VMENTER is considered safe, i.e. does not bring any interesting information into the L1D which might be exploited.

    # cat /sys/module/kvm_intel/parameters/vmentry_l1d_flush
    cond
    #

    This parameter is set to cond by default; the L1 data cache is then flushed only on selected VM entries, which helps reduce the performance impact of the flush.
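
    For example, on L1TF-aware kernels the policy can be changed at run time through the same sysfs file, or set persistently via a modprobe option (a sketch; the file name kvm-l1tf.conf is arbitrary):

    # Switch to unconditional flushing at run time:
    echo always > /sys/module/kvm_intel/parameters/vmentry_l1d_flush
    # Or make the setting persistent for future loads of the kvm-intel module:
    echo "options kvm-intel vmentry_l1d_flush=always" > /etc/modprobe.d/kvm-l1tf.conf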

    Control SMT:

    When simultaneous multithreading (SMT), or Hyper-Threading technology, is in use, unrelated threads running on the same core and sharing processor cache resources can read each other’s data from the L1 data cache by leveraging the L1TF issue. If you are running an untrusted shared environment, consider disabling SMT as part of your mitigation strategy. Generally, SMT can be enabled or disabled from the system BIOS.

    The updated kernel packages introduce the new kernel command-line parameter nosmt.

    nosmt: disables simultaneous multithreading (SMT). It can be re-enabled at run time via the sysfs interface.

    nosmt=force: disables simultaneous multithreading (SMT). It cannot be re-enabled at run time.
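
    As with the l1tf parameter above, a sketch of setting this at boot via grubby and verifying it after a reboot:

    # Force SMT off on all installed kernels:
    grubby --update-kernel=ALL --args="nosmt=force"
    # After rebooting, confirm the parameter is present:
    cat /proc/cmdline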

    NOTE: There are controls in the kernel sysfs interface to manage multithreading at run time; however, their use can lead to erroneous behavior when SMT is re-enabled. It is therefore strongly recommended to use either the BIOS settings or the kernel boot parameters listed above to enable or disable SMT. The runtime flags are described here only as a reference:

    /sys/devices/system/cpu/smt/active
    /sys/devices/system/cpu/smt/control

    The active file above is a read-only interface. When it contains ‘1’, SMT is enabled and sibling threads are online. Booting the kernel with the ‘nosmt’ parameter essentially writes ‘0’ to this file.

    The control file noted above is a read/write interface to control SMT. It can have the following values:

    on           : SMT is enabled. Sibling threads are brought online if they were offline.
    off          : SMT is disabled. Sibling threads are taken offline if they were online.
    forceoff     : SMT is forcibly disabled and cannot be re-enabled at run time.
    notsupported : The processor does not support Hyper-Threading (SMT).
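
    For reference only (the BIOS settings or boot parameters above remain the recommended approach), the runtime interface can be queried and, permissions allowing, written:

    # Read the current SMT state:
    cat /sys/devices/system/cpu/smt/active
    cat /sys/devices/system/cpu/smt/control
    # Disable SMT at run time (not possible if control reads 'forceoff'):
    echo off > /sys/devices/system/cpu/smt/control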

    Disable EPT:

    Disabling Extended Page Tables (EPT) for KVM guests mitigates the L1TF issue because it makes the VMM/hypervisor manage the address translations for the guest. If EPT is disabled, there is no need to disable Hyper-Threading (SMT) or flush the L1 data cache as described above.

    Please note: disabling EPT will significantly hamper system performance, and may not be a viable option.

    EPT can be disabled in the hypervisor via the kvm-intel.ept parameter. It is enabled by default.
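
    As an illustration (a sketch; the file name kvm-ept.conf is arbitrary), EPT could be disabled persistently via a modprobe option:

    # Disable EPT; takes effect after the kvm-intel module is reloaded or the host rebooted:
    echo "options kvm-intel ept=0" > /etc/modprobe.d/kvm-ept.conf
    # Verify once the module is loaded:
    cat /sys/module/kvm_intel/parameters/ept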

    Red Hat recommends that customers apply the microcode/firmware update provided by their hardware or CPU vendor and also install these updated kernels as soon as possible. Software updates can be applied independently from the hardware microcode, but will not take full effect until the CPU firmware has been updated.

    Tuned

    The new L1TF kernels provide controls that allow the user to disable Hyper-Threading either at runtime or at kernel boot time. If either of those two methods is used, and the system is running any one of the following four tuned profiles:

          cpu-partitioning    
          realtime
          realtime-virtual-host
          realtime-virtual-guest

    then an updated tuned will be needed.  This only impacts RHEL 7.4z and 7.5z. An updated tuned is not needed if Hyper-Threading is disabled in the BIOS.
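
    To check whether one of the affected profiles is active, for example:

    # Show the currently active tuned profile:
    tuned-adm active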
