DRAM-Related Faults (Rowhammer, SPOILER, RAMBleed, TRRespass including Blacksmith, Half-Double)

Dynamic random access memory (DRAM) provides the bulk of low-latency random-access data storage in a computer system. It takes the form of hardware chips, usually mounted on memory modules or soldered directly onto the motherboard. A DRAM chip holds millions of memory cells, typically arranged as a two-dimensional array. A memory cell is made of a capacitor and a transistor: the capacitor holds an electric charge, and the transistor controls access to it. Each memory cell holds one bit of information, indicated by its electric charge. A charged capacitor typically represents the value 1, and a discharged capacitor represents 0. A memory cell is accessed by activating the wordline of the row that contains it. If the same row of memory is accessed repeatedly, its wordline is activated and deactivated each time. Such repeated activation and deactivation of a wordline may introduce DRAM faults.

What are DRAM faults?

DRAM-related faults can be roughly classified along the following lines:

  1. Data is read from an incorrect address.
  2. Data is written to an incorrect address.
  3. Data is corrupted while it is read (that is, wrong data is returned, but the data in memory is unchanged).
  4. Data is corrupted while it is written (that is, it is written to the right address, but the wrong bits are stored).
  5. Data is corrupted spontaneously, without a write to that memory address.

The "dynamic" part of DRAM refers to the fact that a periodic refresh is needed to avoid spontaneous corruption (item 5 in the list above). The system will perform this refresh automatically, in the background. The refresh operation needs energy and blocks access to the memory regions being refreshed, which is why there is an incentive to minimize refresh.
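The cost of refresh can be estimated with some simple arithmetic. The following is a minimal sketch, not a description of any particular device: the timing values are illustrative assumptions in the range commonly quoted for DDR3 and DDR4 parts, and the function names are made up for this example.

```python
# Illustrative timing values. These are assumptions for the sake of the
# example and are not taken from any specific datasheet.
REFRESH_WINDOW_S = 64e-3   # every row must be refreshed within this window
REFRESH_COMMANDS = 8192    # refresh commands issued per window
REFRESH_BUSY_S = 350e-9    # time the device is blocked per refresh command
ROW_CYCLE_S = 45e-9        # minimum time between two row activations in a bank


def refresh_interval(window_s, commands):
    """Average time between two refresh commands."""
    return window_s / commands


def refresh_overhead(busy_s, interval_s):
    """Fraction of time the memory is unavailable because of refresh."""
    return busy_s / interval_s


def max_activations_per_window(window_s, row_cycle_s):
    """Upper bound on row activations that fit into one refresh window."""
    return int(window_s / row_cycle_s)


if __name__ == "__main__":
    interval = refresh_interval(REFRESH_WINDOW_S, REFRESH_COMMANDS)
    overhead = refresh_overhead(REFRESH_BUSY_S, interval)
    activations = max_activations_per_window(REFRESH_WINDOW_S, ROW_CYCLE_S)
    print(f"refresh interval: {interval * 1e6:.4f} microseconds")
    print(f"refresh overhead: {overhead:.1%}")
    print(f"activations per refresh window (upper bound): {activations}")
```

Under these assumptions, a few percent of memory time goes to refresh, and more than a million row activations can happen between two refreshes of the same row. Refreshing more often shrinks the second number at the cost of the first.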

Systems come with varying levels of protection against DRAM faults. During hardware design, simulations are used to make sure that the refresh operation is sufficient to prevent data corruption, and that the memory bus does not corrupt addresses and data in transit (items 1 through 4 in the list). Current consumer systems typically do not protect stored data at all because there is no redundancy, relying solely on a well-designed memory refresh. Many server systems use DRAM with error-correction codes (ECC), greatly increasing resiliency against spontaneous corruption. Some server systems have further protections.
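To illustrate how redundancy lets ECC memory repair a flipped bit, here is a minimal sketch of a Hamming(7,4) code. It is a teaching example only: server ECC memory typically uses wider codes (for example, 8 check bits per 64 data bits) with additional double-error detection, and the function names below are made up for this example.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    # Layout: position 1=p1, 2=p2, 3=d1, 4=p4, 5=d2, 6=d3, 7=d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data bits, syndrome); repairs at most one flipped bit."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 0 means no error was detected
    if syndrome:
        c[syndrome - 1] ^= 1  # the syndrome is the position of the bad bit
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    stored = hamming74_encode(data)

    # A single spontaneous bit flip is located and repaired.
    single = list(stored)
    single[5] ^= 1
    print(hamming74_decode(single)[0] == data)

    # Two flips in one codeword exceed what this code can handle:
    # the decoder "repairs" the wrong bit and returns wrong data.
    double = list(stored)
    double[0] ^= 1
    double[1] ^= 1
    print(hamming74_decode(double)[0] == data)
```

The second case shows why ECC increases resiliency without making corruption impossible: enough flipped bits in one codeword can defeat the code.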

How can DRAM faults be triggered?

DRAM faults can happen spontaneously, like any hardware failure. If DRAM modules are defective or there is some other hardware defect, such as a faulty power supply that operates outside its specification, DRAM faults can happen at a rate that is quite noticeable and affects system stability.

It has been known for a long time that certain memory access patterns are more likely to expose defective DRAM modules. The memtest86+ RAM tester provided as part of Red Hat Enterprise Linux uses such test patterns to expose hardware defects in the memory subsystem.

How does the Rowhammer attack work?

Rowhammer refers to a particular technique for stressing DRAM using the CLFLUSH family of CPU instructions (CLFLUSH, CLFLUSHOPT, and CLWB). These are unprivileged instructions provided by the i386 and x86_64 architectures. They can be abused to increase DRAM traffic to specific locations, which makes them particularly interesting for writing DRAM fault inducers. Because the CLFLUSH instruction does not require special privileges, these programs can run as ordinary processes, including inside hypervisor guests.

The kind of corruption induced by Rowhammer corresponds to item 5 in the initial list. Stored data is altered without an explicit memory write, and not necessarily at the addresses being accessed. This happens because the access pattern invalidates designed-in assumptions about the required DRAM refresh rate: the memory refresh does not happen often enough to preserve the stored data reliably when DRAM is accessed in this way.

The CLFLUSH instruction provides a particularly effective memory stresser that a large variety of systems have difficulty withstanding. CPUs provide other means of bypassing caches as well, such as non-temporal memory access. A particularly clever stress tester might achieve a very similar effect without any of these, by carefully choosing memory accesses that evict cache lines as desired.
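The relationship between access rate and refresh rate described in this section can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions, not a model of real hardware: each activation disturbs only the two directly neighboring rows, a row's bits flip once it has absorbed a fixed number of disturbances, and all rows are refreshed at once. The function name, the threshold, and the row count are made up for this example.

```python
def hammer(aggressors, activations, refresh_interval,
           rows=16, flip_threshold=1000):
    """Return the set of rows that suffer a bit flip in the toy model.

    aggressors:       rows that are activated in turn (cache bypassed)
    activations:      total number of row activations performed
    refresh_interval: activations between two refreshes of all rows
    """
    disturb = [0] * rows  # disturbances absorbed since the last refresh
    flipped = set()
    for i in range(activations):
        row = aggressors[i % len(aggressors)]
        disturb[row] = 0  # activating a row restores its own charge
        for victim in (row - 1, row + 1):
            if 0 <= victim < rows:
                disturb[victim] += 1
                if disturb[victim] >= flip_threshold:
                    flipped.add(victim)
        if (i + 1) % refresh_interval == 0:
            # Simplification: real DRAM refreshes rows in small groups.
            disturb = [0] * rows
    return flipped


if __name__ == "__main__":
    # Rows 4 and 6 are hammered alternately; row 5 sits between them.
    print(hammer([4, 6], activations=3000, refresh_interval=1500))
    # Same access pattern, but refresh happens twice as often.
    print(hammer([4, 6], activations=3000, refresh_interval=750))
```

In the first call the row between the two aggressors reaches the threshold before the next refresh arrives. In the second call the more frequent refresh restores it in time, so no row is reported.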

Blacksmith (CVE-2021-42114)

Blacksmith is an extension of the TRRespass family of attacks (described in the next section), using more complicated access patterns to induce the same kind of bit flips. According to the published research, it is more capable of bypassing the hardware mitigations in various DRAM models (such as DDR4).

TRRespass (CVE-2020-10255)

As a mitigation against such memory stressers, system vendors have built a Target Row Refresh (TRR) mechanism into DRAMs. TRR essentially refreshes the memory rows in which an attacker is trying to induce bit flips, thus avoiding the spontaneous cell corruption caused by the Rowhammer effect.

The latest research in this area indicates that TRR, as published, is not a single mitigation, and certainly not one that fixes all DRAMs or all Rowhammer variants. The TRR protection mechanism may be implemented in the memory controller or inside the DRAM chip. Research indicates that on many new machines with the latest DDR4 DRAM chips, the TRR mechanism is either not present or not enabled by default. Server-class machines have TRR enabled by default, whereas retail consumer machines may not have it. The mitigations also do not seem to cover the entire memory address space; coverage may be limited to part of it.

TRR protection mechanisms remain quite opaque, and there is no documentation about how they protect DRAM against the numerous Rowhammer variants. To overcome this opaqueness, the 'TRRespass' tool was built. It is a many-sided Rowhammer fuzzer that can help find new memory access patterns which may induce spontaneous bit corruption.
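Why a many-sided pattern can succeed where a double-sided one fails can be illustrated with a toy model. The sketch below assumes a hypothetical TRR implementation that can only track a small, fixed number of aggressor rows at a time. That assumption, the function name, and all parameters are made up for this example; real TRR implementations are undocumented and differ between vendors. The model covers a single refresh window, so the regular refresh is left out.

```python
def hammer_with_toy_trr(aggressors, activations, rows=16,
                        flip_threshold=1000, tracker_size=2, trr_period=100):
    """Return the rows that flip despite a capacity-limited toy TRR.

    The hypothetical tracker remembers the first `tracker_size` distinct
    rows activated in each period of `trr_period` activations. At the end
    of the period it refreshes their direct neighbors and forgets them.
    """
    disturb = [0] * rows  # disturbances absorbed since the last refresh
    flipped = set()
    tracker = []
    for i in range(activations):
        row = aggressors[i % len(aggressors)]
        if row not in tracker and len(tracker) < tracker_size:
            tracker.append(row)
        disturb[row] = 0  # activating a row restores its own charge
        for victim in (row - 1, row + 1):
            if 0 <= victim < rows:
                disturb[victim] += 1
                if disturb[victim] >= flip_threshold:
                    flipped.add(victim)
        if (i + 1) % trr_period == 0:
            # TRR action: refresh the neighbors of every tracked aggressor.
            for tracked in tracker:
                for victim in (tracked - 1, tracked + 1):
                    if 0 <= victim < rows:
                        disturb[victim] = 0
            tracker.clear()
    return flipped


if __name__ == "__main__":
    # Double-sided: both aggressors fit into the tracker.
    print(hammer_with_toy_trr([4, 6], activations=4000))
    # Many-sided: five aggressors overflow a tracker with two entries.
    print(hammer_with_toy_trr([2, 4, 6, 8, 10], activations=4000))
```

In this model the double-sided pattern is fully neutralized, while the many-sided pattern leaves some victim rows unprotected because their aggressors never make it into the tracker. A fuzzer automates the search for such patterns against an unknown mitigation.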

SPOILER: how does it relate to the Rowhammer attack?

The SPOILER vulnerability is a micro-architectural leak that allows an attacker to learn about virtual-to-physical page mappings from an unprivileged user space process. It leverages the data dependency of speculative load and store operations in the memory order buffer, and uses the rdtscp and mfence instructions to measure the timing discrepancies that reveal the memory layout. This allows detection of ranges of contiguous physical memory pages, which makes Rowhammer much easier and more effective: an attack takes seconds instead of weeks.

The SPOILER vulnerability is specific to Intel CPUs and manifests itself starting from the 1st generation of Intel Core processors. This vulnerability is different and separate from the Spectre vulnerabilities. It can potentially be exploited by malicious JavaScript code executed by a web browser, or by other untrusted code running on a system.

RAMBleed

A vulnerability named RAMBleed (CVE-2019-0174) was discovered in contemporary, industry-wide DRAM implementations. It allows an unprivileged attacker to read out certain memory belonging to other processes by leveraging the Rowhammer bit-flipping effect. The data read may otherwise be inaccessible and could include secret information. RAMBleed is a side-channel read vulnerability: the Rowhammer-induced bit flips allow attackers to deduce the values of bits in memory belonging to other processes. The attacker surrounds victim data pages with carefully constructed attacker pages and hammers them. This induces data-dependent bit flips in one of the attacker-controlled pages, from which the victim data can be reconstructed.

Unlike previous Rowhammer attacks, RAMBleed does not require the use of huge pages and hugeTLB; the proof-of-concept code can read memory at a rate of about 3-4 bits per second. The attack is not architecture-specific, and many if not all DRAM implementations vulnerable to Rowhammer are also vulnerable to RAMBleed.

There are a few commonly proposed hardware-based mitigations against Rowhammer that have the potential to also mitigate RAMBleed. These are Target Row Refresh (TRR), increased DRAM refresh rates (for example, a doubled refresh rate, which halves the refresh interval), and the use of ECC memory. The extent to which these strategies actually mitigate the problem varies and is specific to the hardware platform. Vendors are anticipated to provide suitable platform-specific guidance.

To date, the attack has not been demonstrated on server platforms due to the added complexity introduced by the widespread use of ECC memory. However, researchers have documented that it is theoretically possible to circumvent ECC, creating persistent bit flips even in the presence of ECC memory, and to perform the attack against servers, likely across VMs as well. While ECC memory does complicate the Rowhammering process, it does not prevent Rowhammer and thus does not prevent RAMBleed either.

Half-Double

Earlier Rowhammer attacks operated at a distance of one row: they flipped a target bit by accessing the DRAM rows directly adjacent to the row that holds it. The Half-Double attack showed that accesses to rows that are not directly adjacent to the target row can also influence the target cell. This allows attackers to use additional rows in their attack and can make existing hardware mitigations ineffective. The Half-Double attack exploits the physical properties of the silicon and will become more viable as memory chip density increases.

JEDEC has published two documents about DRAM-level and system-level mitigation techniques for Half-Double attacks (JEP300-1 and JEP301-1).

What is the impact of DRAM faults?

Memory is a fundamental system component; everything depends on its correct operation. Consequently, most DRAM faults have the potential to undermine the correct operation of the system. This includes enforcement of security boundaries and protection of critical data, such as private key material.

If such faults happen, both local and remote attackers may benefit from them. As usual, local attackers are in a better position to trigger DRAM faults explicitly.

Are sandboxing solutions affected?

Java and JavaScript sandboxes do not provide access to the CLFLUSH instruction, which means that this particular variant of DRAM stressing will not work. Current versions of both languages, however, provide sufficiently flexible data structures, so it may be possible to induce DRAM faults by other means.

SELinux and containers do not offer protection because they do not intercept the entire instruction stream before execution, so they cannot block instructions such as CLFLUSH.

What about hypervisors?

Hypervisors such as RHEV do not currently provide protection against DRAM faults and their abuse by guests.

Is the Chrome browser able to protect users?

  • CVE-2015-0565: https://nvd.nist.gov/vuln/detail/CVE-2015-0565

Google has publicly claimed that they addressed a vulnerability related to DRAM fault injection in their Chrome browser, specifically in the Native Client functionality.

The Native Client component of Chrome uses a trusted compiler approach to execute native machine code downloaded from untrusted sources. The instruction stream is scanned and audited for dangerous instructions which would allow escaping the sandbox. Previous Chrome versions permitted the CLFLUSH instruction, which enabled CLFLUSH-based DRAM fault inducers to run. Native Client in current Chrome versions does not permit the CLFLUSH instruction, which prevents this approach from working.

As explained above, CLFLUSH is not necessary for stressing DRAM, so the protection in Chrome may be incomplete. It only applies to the Native Client vector, and there might be other methods to trigger DRAM faults with a web browser, for example using JavaScript.

Is a software-based solution possible for this kind of problem?

It might be theoretically possible to map memory to hypervisor guests or operating system processes in such a way that Rowhammer-style DRAM fault injection only affects data within the same security domain (guest or process). It is unlikely that this is feasible in practice for several reasons:

  • It would be necessary to disable read-only page sharing across security boundaries (such as kernel same-page merging, or shared read-only mappings between processes), greatly reducing density of system loads.
  • The mechanism is highly dependent on the memory configuration, and would only be effective for very specific system configurations, depending on CPU silicon and microcode revision, system firmware version, mainboard revision, and so on.

Even today, on systems with such facilities, it is possible to monitor the occurrence of MCE events (using mcelog), and especially EDAC (Error Detection And Correction) counters, using the edac-util command from the edac-utils package. These tools can be used to spot the early signs of DRAM-related system degradation, before exploitable DRAM faults happen.
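As a complement to edac-util, the EDAC counters can also be read directly from sysfs. The following is a minimal sketch, assuming the kernel exposes one directory per memory controller under /sys/devices/system/edac/mc, each with a ce_count (corrected errors) and a ue_count (uncorrected errors) file. The function name and the returned dictionary layout are made up for this example, and systems without EDAC support will simply report nothing.

```python
from pathlib import Path


def read_edac_counters(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"corrected": n, "uncorrected": m}} from sysfs."""
    base = Path(root)
    counters = {}
    if not base.is_dir():
        return counters  # no EDAC driver loaded, or not a Linux system
    for mc_dir in sorted(base.glob("mc[0-9]*")):
        try:
            corrected = int((mc_dir / "ce_count").read_text().strip())
            uncorrected = int((mc_dir / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers with missing or unreadable counters
        counters[mc_dir.name] = {
            "corrected": corrected,
            "uncorrected": uncorrected,
        }
    return counters


if __name__ == "__main__":
    for name, counts in read_edac_counters().items():
        print(f"{name}: corrected={counts['corrected']} "
              f"uncorrected={counts['uncorrected']}")
```

A corrected-error count that keeps rising is the kind of early sign mentioned above, and a reason to test or replace the affected memory.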

What should customers do to deal with DRAM faults?

Red Hat considers DRAM faults an issue that hardware vendors need to address, with sufficient design reserves and technologies like ECC memory.

Red Hat recommends running memtest86+ if DRAM defects are suspected, for example after the kernel has logged MCE events.

System vendors may have additional information about how DRAM faults affect their hardware platforms, and Red Hat customers are advised to contact them for additional information, as required.

References

  • https://comsec.ethz.ch/wp-content/files/blacksmith_sp22.pdf
  • https://comsec.ethz.ch/research/dram/blacksmith/
  • https://users.ece.cmu.edu/~yoonguk/papers/kim-isca14.pdf
  • https://www.vusec.net/projects/trrespass
  • https://download.vusec.net/papers/trrespass_sp20.pdf
  • https://rambleed.com/
  • https://arxiv.org/pdf/1805.04956.pdf
  • https://arxiv.org/pdf/1912.03076.pdf
  • https://www.vusec.net/projects/eccploit/
  • https://www.vusec.net/projects/throwhammer/
  • https://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html
  • http://aod.teletogether.com/sec/20140519/SAMSUNG_Investors_Forum_2014_session_1.pdf
  • https://www.idginsiderpro.com/article/3529519/rowhammer-memory-attacks-close-in-on-the-real-world.html