TLBleed - side-channel attack over shared TLBs

Solution In Progress

Environment

  • Red Hat Ansible Engine
  • Red Hat CloudForms
  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
  • Red Hat JBoss Enterprise Application Platform 5
  • Red Hat JBoss Enterprise Application Platform 6
  • Red Hat JBoss Enterprise Application Platform 7
  • Red Hat JBoss Core Services
  • Red Hat JBoss Web Server 2
  • Red Hat JBoss Web Server 3
  • Red Hat OpenStack Platform
  • Red Hat OpenShift Container Platform
  • Red Hat Satellite
  • Red Hat Subscription Asset Manager
  • Red Hat Virtualization

Issue

  • I've read articles on the Internet that describe a problem called "TLBleed"
  • I have security concerns around side-channel attacks
  • I've heard about problems with cryptographic packages

Resolution

Red Hat Product Security has rated this issue as having a security impact of Moderate. All Red Hat products are being evaluated for impact, and Red Hat will issue updates as soon as they are available. Customers running affected versions of Red Hat products are strongly advised to apply these updates as soon as the errata are available.

Mitigations

Short term mitigations can be split into the following two categories:

  • Prevent attacker code from running on co-resident hyperthreads. This can be achieved by following industry-standard guidance, which has always been not to share hyperthreads between two unrelated containers or VMs. For use cases where customers do not provide direct local system access but do run user containers or VMs, this may be sufficient.
  • In the case of untrusted local users on the same underlying physical or virtual system, it may be necessary to further tune thread pinning, or even to disable Intel Hyper-Threading.
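As an illustrative sketch of the pinning approach, a sensitive process can restrict its own CPU affinity so that it never runs alongside untrusted code. The CPU numbers below are hypothetical and must be checked against the actual sibling topology of the machine (for example, via /sys/devices/system/cpu/cpu*/topology/thread_siblings_list):

```python
import os

# Hypothetical layout: CPUs 0 and 4 are sibling hyperthreads of one core.
# Pinning this process to CPU 0 (and keeping CPU 4 free of untrusted
# work) prevents an attacker from sharing the core's TLBs with it.
os.sched_setaffinity(0, {0})               # first argument 0 means "this process"
print(sorted(os.sched_getaffinity(0)))     # confirm the new affinity
```

The same effect can be achieved from the shell with taskset, or for groups of processes via cpuset cgroups.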

There are two primary recommended methods of disabling Hyper-Threading on the Red Hat Enterprise Linux platforms.

  • Disable at the BIOS level.
  • Disable the CPUs via the sysfs tunables.
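A minimal sketch of the sysfs approach, assuming the standard Linux topology files: read each core's thread_siblings_list, keep one thread per core, and take the rest offline by writing 0 to the corresponding online file (root required; the helper below only computes which CPUs to offline):

```python
def siblings_to_offline(sibling_lists):
    """Given thread_siblings_list contents (e.g. "0,4" or "2-3"),
    return the set of CPUs to offline so that only one thread per
    core remains online."""
    offline = set()
    for s in sibling_lists:
        cpus = []
        for part in s.strip().split(","):
            if "-" in part:
                lo, hi = part.split("-")
                cpus.extend(range(int(lo), int(hi) + 1))
            else:
                cpus.append(int(part))
        offline.update(sorted(cpus)[1:])   # keep the lowest-numbered thread
    return offline

# To apply (as root), for each CPU number n in the returned set:
#   echo 0 > /sys/devices/system/cpu/cpu<n>/online
print(siblings_to_offline(["0,4", "1,5", "2,6", "3,7"]))
```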

For more information about Hyper-Threading in Red Hat Enterprise Linux, refer to the following article:
Reviewing and Disabling Hyper-Threading in Red Hat Enterprise Linux

Root Cause

Overview

A side-channel attack was found in certain implementations of contemporary computer architectures where an attacker can use temporal analysis in order to observe and subsequently reconstruct sensitive or secret data, such as the encryption keys used by an application or library. Such temporal analysis can be used in traditional cache side-channel attacks, but it can also be used to attack other shared microarchitectural resources, such as the Translation Lookaside Buffers (TLBs) that are often shared between sibling Simultaneous Multi-Threading (SMT) threads.

In the example “TLBleed” attack, the side-channel exists in the L1 data Translation Lookaside Buffer (L1 dTLB) used by Intel Hyper-Threads. A similar side-channel may exist in other microprocessor implementations, such as those from AMD, IBM, and Arm, although such analysis is still in progress and will depend upon the specific implementation choices used (in particular, the displacement or eviction algorithm used to replace TLB entries).

This can allow an attacker who is able to execute user-privileged code to use this side channel to determine private keys used in cryptographic functions. It could also (theoretically) be used to mount other timing attacks, such as against kernel threads running on co-resident SMT threads. In theory this could be used to extract information from the kernel, or from other software (such as the qemu process that backs VMs), although no specific attack has yet been demonstrated beyond the example attack on a particular user-space encryption library. This should be considered a novel form of timing-analysis side channel that is likely to spawn further attacks.

Background

Side-channel attacks are not new and many of the new attacks build on existing methods. In particular, timing side-channel attacks have been common as a means for attacks against security/encryption software for many years. For this reason, it is standard industry practice to create “constant time” operations when building encryption software resistant to such attacks.

Indeed, many microprocessors are beginning to provide “constant time” variants of popular instructions, in which the instruction latencies themselves cannot be used to infer the operands (for example, by not using bypassing structures within the CPU to “shortcut” the end of certain complex calculations). In addition, vendors such as Intel provide guidance, and even compiler intrinsics, that can be used in encryption libraries to achieve “constant time” operation. Nonetheless, it is common for software developers not to consider timing side-channels, especially outside of specialist cryptographic libraries.
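To illustrate what “constant time” means in practice, compare a naive secret comparison, which returns at the first mismatching byte and so leaks the mismatch position through its running time, with Python's hmac.compare_digest, which examines every byte regardless. This is a sketch of the general principle, not code from any particular affected library:

```python
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    # Timing leak: returns at the first mismatching byte, so the running
    # time reveals how long a matching prefix the attacker has guessed.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    # Examines every byte; the running time does not depend on where
    # (or whether) the inputs differ.
    return hmac.compare_digest(a, b)
```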

As an example, an earlier NCC Group paper outlined a similar attack, but one that required shared memory between attacker and victim in order to implement the timing measurement. This new attack mechanism does not require shared memory; instead it leverages a different shared resource, the Translation Lookaside Buffer within the CPU. Any competitively shared resource can potentially be used for a timing attack.

Modern microprocessors use a structure known as a “Translation Lookaside Buffer” (TLB) to cache translations from virtual memory addresses (VAs) to physical memory addresses (PAs) during processor load and store operations (reads and writes to and from memory). When the CPU needs to access a value in memory, or on the stack, it uses software-managed “page tables” to translate between the virtual addresses used by an application (or the Operating System kernel) and the physical addresses of the underlying memory subsystem wherein the data actually resides. Because walking the page tables to translate a virtual address is a slow operation, TLB caches are used.
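Concretely, with 4KB pages the low 12 bits of a virtual address are the offset within the page, and the remaining upper bits form the virtual page number (VPN); a TLB entry caches the VPN-to-physical-frame mapping. A simple sketch:

```python
PAGE_SHIFT = 12                 # 4KB pages: 2**12 bytes per page
PAGE_SIZE = 1 << PAGE_SHIFT

def split_va(va: int):
    """Split a virtual address into (virtual page number, page offset).
    A TLB entry caches the VPN -> physical frame translation; the
    offset passes through unchanged."""
    return va >> PAGE_SHIFT, va & (PAGE_SIZE - 1)

vpn, offset = split_va(0x1234)  # VPN 1, offset 0x234
```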

TLBs cache translations at the “page” level of granularity: 4KB on Intel x86 machines, and 64KB on IBM POWER and Arm (in the configuration used by Red Hat). Other page sizes can be in use (so-called “huge” or even “gigantic” pages) formed from contiguous ranges of the smaller size. These larger sizes are common optimizations used to reduce pressure on the TLB structures, and they can be of some benefit in mitigation. For the moment, consider only the fundamental 4KB page.
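The pressure argument is easy to quantify: the “reach” of a TLB is its entry count multiplied by the page size, so larger pages multiply coverage without adding entries. The entry count below is hypothetical, chosen only for illustration:

```python
def tlb_reach_bytes(entries: int, page_size: int) -> int:
    # Total memory whose translations fit in the TLB at once.
    return entries * page_size

KB, MB = 1024, 1024 * 1024
# A hypothetical 64-entry data TLB:
print(tlb_reach_bytes(64, 4 * KB) // KB)   # 256 -> 256KB reach with 4KB pages
print(tlb_reach_bytes(64, 2 * MB) // MB)   # 128 -> 128MB reach with 2MB huge pages
```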

TLBs are “competitively shared” microarchitectural resources in some designs. In particular, Intel implements SMT (Simultaneous Multi-Threading) in its “Intel Hyper-Threading” technology. Sibling Hyper-Threads share many compute resources but provide a lightweight context that allows the Operating System to present the illusion of an additional core-like “thread” upon which an application or kernel thread can be scheduled, sharing resources with another running application or kernel thread. Because such threads compete for the underlying resources, the Operating System scheduler applies some intelligence in how it schedules work onto co-resident sibling threads.

The Intel Hyper-Threading design provides separate TLBs for instruction and data memory. A dedicated instruction TLB (iTLB) per sibling thread caches the translations used for running code, while the sibling threads share an L1 dTLB (data TLB) for their accesses to virtual memory, similar to (but not the same as) the way they share an L1 data cache (L1D$). The shared L1 dTLB is “competitive” in the sense that memory translations used by one thread will displace those used by the peer thread, much as data caches use a pseudo-LRU replacement algorithm to evict the “Least Recently Used” entries when competing threads perform loads and stores.

In the Intel case, a unified, non-inclusive L2 TLB provides additional translations and will also be searched. The Intel design uses a set-associative structure for the TLB entries with a computable indexing function, similar to the one used in other caches, in which bits of the virtual address are hashed (and XORed) to select the TLB set used to look up a translation. Such a design is more easily exploited because an attacker on one thread can predetermine which virtual memory address (VA) accesses on the sibling thread will displace entries shared with the co-resident attacker thread; by monitoring evictions of TLB entries to which they have access, the attacker can infer which other VA entries are being used by the sibling thread.
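Such indexing functions can be modeled simply. The parameters below are hypothetical stand-ins for the reverse-engineered values (real set counts and hash layouts vary by CPU model), but they show why a computable index is dangerous: the attacker can run the function in reverse to choose pages that collide with any target set. Published analysis describes linearly indexed L1 dTLBs and XOR-hashed L2 sTLBs on some Intel parts:

```python
L1_SETS = 16          # hypothetical set count for illustration only

def l1_dtlb_set(vpn: int) -> int:
    # Linear indexing: consecutive virtual pages map to consecutive sets.
    return vpn % L1_SETS

def stlb_set(vpn: int, field_bits: int = 7) -> int:
    # XOR hash: two adjacent bit fields of the VPN are XORed together,
    # as reported for some Intel L2 sTLB designs.
    mask = (1 << field_bits) - 1
    return (vpn & mask) ^ ((vpn >> field_bits) & mask)

# Any two pages whose VPNs are congruent mod 16 collide in this L1 model:
assert l1_dtlb_set(0x101) == l1_dtlb_set(0x11)
```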

Other microprocessors implement their data side TLBs differently. At this time, the TLBleed author has been unable to reproduce the Intel Hyper-Threading attack against the AMD implementation in the EPYC processor.

It should be noted that “TLBleed” uses a novel temporal analysis against the TLBs, rather than the purely spatial analysis performed when attacking TLBs in the past. In common cache side-channel attacks (such as those used in “Spectre” exploits), the measurement is formed by monitoring which data cache entries are evicted by the vulnerable target code. In this exploit, simply using the displacement of TLB entries would yield only page-level (or worse, huge-page-level) granularity, which is insufficient. Instead, the exploit relies upon timing when the translations are used. This timing trace can be fed into a statistical model that applies machine-learning techniques to derive, from the timing of TLB hits, what the data being operated upon actually was. Thus, only code that is not constant-time is vulnerable.
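The eviction-monitoring primitive that TLBleed builds on can be sketched with a toy model. The simulation below is entirely hypothetical (a direct-mapped 16-set TLB, two “threads” taking turns): the attacker primes two sets, a non-constant-time “victim” touches a secret-dependent page, and the attacker reads the secret bit back from which probe misses. The real attack goes further, recording when each hit occurs over time and feeding that trace to a machine-learning classifier:

```python
SETS = 16

class TinyTLB:
    """Toy direct-mapped TLB shared by two simulated sibling threads."""
    def __init__(self):
        self.entry = [None] * SETS

    def access(self, vpn: int) -> bool:
        """Access a virtual page; return True on a TLB hit."""
        s = vpn % SETS
        hit = self.entry[s] == vpn
        self.entry[s] = vpn            # fill/replace on miss
        return hit

def victim_step(tlb: TinyTLB, secret_bit: int) -> None:
    # A non-constant-time routine: which page it touches depends on a
    # secret bit (pages 0x101 and 0x202 land in sets 1 and 2).
    tlb.access(0x101 if secret_bit else 0x202)

def attacker_recover_bit(tlb: TinyTLB, run_victim) -> int:
    A1, A0 = 0x11, 0x22                # attacker pages in sets 1 and 2
    tlb.access(A1); tlb.access(A0)     # prime both sets
    run_victim()                       # victim runs on the sibling thread
    hit1 = tlb.access(A1)              # probe: a miss means the victim
    tlb.access(A0)                     # evicted us from that set
    return 0 if hit1 else 1

tlb = TinyTLB()
secret = [1, 0, 1, 1, 0, 0, 1]
leaked = [attacker_recover_bit(tlb, lambda b=b: victim_step(tlb, b))
          for b in secret]
print(leaked == secret)    # True: the shared TLB leaked every bit
```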

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.