What are DRAM faults?
Dynamic random access memory (DRAM) provides the bulk of low-latency random-access data storage in a computer.
DRAM-related faults can be roughly classified along the following lines:
- Data is read from an incorrect address.
- Data is written to an incorrect address.
- Data is corrupted while it is read (that is, wrong data is returned, but the data in memory is unchanged).
- Data is corrupted while it is written (that is, it is written to the right address, but the wrong bits are stored).
- Data is corrupted spontaneously, without a write to that memory address.
The “dynamic” part of DRAM refers to the fact that a periodic refresh is needed to avoid spontaneous corruption (item 5 in the list above). The system will perform this refresh automatically, in the background. The refresh operation needs energy and blocks access to the memory regions being refreshed, which is why there is an incentive to minimize refresh.
Systems come with varying levels of protection against DRAM faults. During hardware design, simulations are used to make sure that the refresh operation is sufficient to prevent data corruption, and that the memory bus does not corrupt addresses and data in transit (items 1 to 4 in the list). Current consumer systems typically do not protect stored data at all because there is no redundancy, and rely solely on a well-designed memory refresh. Many server systems use DRAM with error-correction codes (ECC), greatly increasing resiliency against spontaneous corruption. Some server systems have further protections.
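To illustrate how redundancy lets ECC memory repair a spontaneous bit flip, here is a minimal sketch of a single-error-correcting, double-error-detecting (SECDED) code on 4 data bits. It is an illustration only: real ECC DIMMs typically protect 64-bit words with 8 check bits, and the function names `encode` and `decode` are hypothetical, not part of any memory controller interface.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                 # index 0 holds the overall parity bit
    code[3], code[5], code[6], code[7] = d
    # Hamming parity bits sit at positions 1, 2 and 4.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2    # makes the total number of set bits even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos        # XOR of the positions of all set bits
    parity_error = sum(code) % 2 == 1
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        # Exactly one bit flipped; syndrome 0 means the overall parity bit itself.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with intact overall parity: two bits flipped.
        return None, "uncorrectable"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

A single flipped bit is repaired transparently, while two flipped bits in the same codeword are only detected. This is why ECC raises the bar for spontaneous corruption without making it impossible.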
How can DRAM faults be triggered?
DRAM faults can happen spontaneously, like any hardware failure. Sometimes this is attributed to “cosmic rays”.
If DRAM modules are defective, or there is some other hardware defect such as a faulty power supply operating outside its specification, DRAM faults can occur often enough to be noticeable and to affect system stability.
It has been known for a long time that certain memory access patterns are more likely to expose defective DRAM modules. The memtest86+ RAM tester provided as part of Red Hat Enterprise Linux uses such test patterns to expose hardware defects in the memory subsystem.
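The idea of a pattern-based memory test can be sketched as follows. This is a simplified illustration, not the algorithm memtest86+ uses: it runs a "walking ones" pattern over a simulated memory object, because a user-space Python program cannot address physical DRAM directly. The class `StuckBitMemory` and the functions `walking_ones` and `pattern_test` are hypothetical names introduced for this example.

```python
class StuckBitMemory:
    """Simulated memory in which one bit of one cell is stuck at 0."""

    def __init__(self, size, bad_offset, bad_bit):
        self._cells = bytearray(size)
        self._bad_offset = bad_offset
        self._mask = ~(1 << bad_bit) & 0xFF

    def __len__(self):
        return len(self._cells)

    def __setitem__(self, offset, value):
        if offset == self._bad_offset:
            value &= self._mask    # the defective bit never stores a 1
        self._cells[offset] = value

    def __getitem__(self, offset):
        return self._cells[offset]


def walking_ones():
    """Byte patterns with exactly one bit set: 0x01, 0x02, ..., 0x80."""
    return [1 << bit for bit in range(8)]


def pattern_test(mem, patterns):
    """Write each pattern to every cell, read it back, and collect mismatches."""
    failures = []
    for pattern in patterns:
        for offset in range(len(mem)):
            mem[offset] = pattern
        for offset in range(len(mem)):
            actual = mem[offset]
            if actual != pattern:
                failures.append((offset, pattern, actual))
    return failures
```

With a healthy `bytearray(64)` the test reports no failures. With `StuckBitMemory(64, 10, 3)` it reports one failure, at offset 10, for the pattern that sets bit 3.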
How does the RowHammer attack work?
“Rowhammer” refers to a particular technique for stressing DRAM using the CLFLUSH/CLFLUSHOPT/CLWB family of CPU instructions. These instructions are unprivileged and are provided by the i386 and x86_64 architectures. They can be abused to increase DRAM traffic to specific locations, which makes them particularly interesting for writing DRAM fault inducers. Such programs can run as ordinary processes or in hypervisor guests, because the CLFLUSH instruction does not require special privileges.
The kind of corruption induced by “rowhammer” corresponds to item 5 in the initial list. Stored data is altered, without an explicit memory write, and not necessarily at the addresses being accessed. Reportedly, this happens because the access pattern invalidates designed-in assumptions about the required DRAM refresh rate: the memory refresh does not happen often enough to preserve the stored data reliably when DRAM is accessed in this way.
The CLFLUSH instruction makes it possible to create a particularly effective memory stresser that is difficult for a large variety of systems to withstand. CPUs provide other means (such as non-temporal memory access) which can bypass caches as well. A particularly clever stress tester might achieve a very similar effect by carefully choosing memory accesses, evicting cache lines as desired.
What is the SPOILER vulnerability and how does it relate to the RowHammer attack?
The SPOILER vulnerability is a micro-architectural leak that allows an attacker to determine virtual-to-physical page mappings from unprivileged user-space processes. It leverages the data dependency of speculative load and store operations in the Memory Order Buffer and uses mfence instructions to measure the timing discrepancies that reveal memory layout. This makes it possible to detect ranges of contiguous physical memory pages, which makes RowHammer much easier and more effective: an attack can take seconds instead of weeks.
What is the impact of DRAM faults?
Memory is a fundamental system component. Everything depends on its correct operation, and if it does not work correctly, all bets are off. Consequently, most DRAM faults have the potential to undermine the correct operation of the system. This includes enforcement of security boundaries and protection of critical data, such as private key material.
If such faults happen, both local and remote attackers may benefit from them. As usual, local attackers are in a better position to trigger DRAM faults explicitly.
Are sandboxing solutions affected?
SELinux and containers do not offer protection because they do not intercept the entire instruction stream before execution, so they cannot block instructions such as CLFLUSH.
What about hypervisors?
Hypervisors such as RHEV do not currently provide protection against DRAM faults and their abuse by guests.
Is the Chrome browser able to protect users?
Google has publicly claimed that they addressed a vulnerability related to DRAM fault injection in their Chrome browser, specifically in the Native Client functionality.
The Native Client component of Chrome uses a trusted compiler approach to execute native machine code downloaded from untrusted sources. The instruction stream is scanned and audited for dangerous instructions which would allow escaping the sandbox. Previous Chrome versions permitted the CLFLUSH instruction, which enabled CLFLUSH-based DRAM fault inducers to run. Native Client in current Chrome versions does not permit the CLFLUSH instruction, which prevents this approach from working.
Is a software-based solution possible for this kind of problem?
It might be theoretically possible to map memory to hypervisor guests or operating system processes in such a way that “rowhammer”-style DRAM fault injection only affects data within the same security domain (guest or process). It is unlikely that this is feasible in practice for several reasons:
- It would be necessary to disable read-only page sharing across security boundaries (such as Kernel Samepage Merging, or shared read-only mappings between processes), greatly reducing the achievable density of system loads.
- The mechanism is highly dependent on the memory configuration, and would only be effective for very specific system configurations, depending on CPU silicon and microcode revision, system firmware version, mainboard revision, and so on.
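The placement idea described above can be illustrated with a toy model. It assumes, purely for illustration, that DRAM rows are numbered linearly, that hammering a row disturbs only its immediate neighbors, and that the owner of each row is known. None of these assumptions holds in general on real hardware, which is part of why the approach is impractical. The function `cross_domain_neighbors` is a hypothetical helper.

```python
def cross_domain_neighbors(row_owner):
    """Return pairs of adjacent rows that belong to different security domains.

    row_owner[i] names the domain (guest or process) that owns row i,
    or is None for an unused guard row.
    """
    conflicts = []
    for row in range(len(row_owner) - 1):
        first, second = row_owner[row], row_owner[row + 1]
        if first is not None and second is not None and first != second:
            conflicts.append((row, row + 1))
    return conflicts
```

For the layout `["guest1", "guest1", "guest2"]` the function reports the pair `(1, 2)`. Inserting a guard row, as in `["guest1", "guest1", None, "guest2"]`, removes the conflict at the cost of an unused row.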
Even today, on systems with such facilities, it is possible to monitor the occurrence of MCE events (using mcelog), and especially EDAC (Error Detection And Correction) counters, using the edac-util command from the edac-utils package. These tools can be used to spot the early signs of DRAM-related system degradation, before exploitable DRAM faults happen.
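As a rough sketch of this kind of monitoring, the following reads the corrected and uncorrected error counters that the Linux EDAC subsystem exposes through sysfs. It assumes an EDAC driver is loaded for the memory controller and that the counters appear as `ce_count` and `ue_count` under `/sys/devices/system/edac/mc/mcN/`. The function name `read_edac_counts` is hypothetical, and the sketch is not a description of how edac-util itself is implemented.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {'ce_count': int or None, 'ue_count': int or None}}."""
    counts = {}
    for controller in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((controller / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[controller.name] = entry
    return counts
```

A corrected-error count (`ce_count`) that keeps rising over time is the kind of early warning sign mentioned above. On systems without EDAC support the function simply returns an empty dictionary.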
What should customers do to deal with DRAM faults?
Red Hat considers DRAM faults an issue that hardware vendors need to address, with sufficient design reserves and technologies like ECC memory.
Red Hat recommends running memtest86+ if DRAM defects are suspected, for example after the kernel has logged MCE events. See this article for information about relevant log messages.
System vendors may have additional information about how DRAM faults affect their hardware platforms, and Red Hat customers are advised to contact them for additional information, as required.
A vulnerability named RAMBleed (CVE-2019-0174) was discovered in contemporary, industry-wide DRAM implementations. It potentially allows an unprivileged attacker to read certain memory belonging to other processes by leveraging the Rowhammer bit-flipping effect. The data read may otherwise be inaccessible, and could include secret information. RAMBleed is a side-channel read vulnerability: the Rowhammer-induced bit flips allow attackers to deduce the values of bits in memory belonging to other processes. Surrounding victim data pages with carefully constructed attacker pages, on which the hammering is performed, can induce data-dependent bit flips in one of the attacker-controlled pages and so allow the victim data to be reconstructed.
Unlike previous Rowhammer attacks, RAMBleed does not require the use of huge pages or hugeTLB, and the proof-of-concept code can read memory at a rate of about 3-4 bits per second. The attack is not architecture-specific, and many if not all DRAM implementations vulnerable to Rowhammer are vulnerable to RAMBleed.
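To put the quoted read rate in perspective, this small calculation estimates how long it would take to read a secret of a given size at 3-4 bits per second. The 2048-bit secret size is an assumption chosen for illustration, and `readout_seconds` is a hypothetical helper.

```python
def readout_seconds(secret_bits, bits_per_second):
    """Time needed to read secret_bits at a constant rate."""
    return secret_bits / bits_per_second


fastest = readout_seconds(2048, 4)   # 512.0 seconds
slowest = readout_seconds(2048, 3)   # roughly 683 seconds
```

Under these assumptions, reading 2048 bits would take on the order of ten minutes, so the low bit rate does not by itself make the attack impractical for small secrets.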
There are a few commonly proposed hardware-based mitigations against Rowhammer that have the potential to also mitigate RAMBleed. These are Targeted Row Refresh (TRR), shorter DRAM refresh intervals (a doubled DRAM refresh rate), and the use of ECC memory. The extent to which these strategies actually mitigate the problem varies and is specific to the hardware platform. Vendors are expected to provide suitable platform-specific guidance.
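The effect of doubling the refresh rate can be estimated with simple arithmetic: it halves the number of row activations an attacker can issue before the neighboring rows are refreshed. The sketch below assumes a 64 ms refresh window, which is common for DDR3 and DDR4 at normal operating temperatures, and an illustrative row cycle time of 50 ns. Both figures are assumptions, actual values are module-specific, and `max_activations` is a hypothetical helper.

```python
def max_activations(refresh_window_ms, row_cycle_ns):
    """Upper bound on row activations that fit into one refresh window."""
    window_ns = refresh_window_ms * 1_000_000
    return window_ns // row_cycle_ns


normal = max_activations(64, 50)    # 1_280_000 activations
doubled = max_activations(32, 50)   # 640_000 activations
```

Halving the window from 64 ms to 32 ms halves the activation budget. Whether that is enough depends on how many activations a given module tolerates before bits flip, which is why the effectiveness of this mitigation is platform-specific.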
To date, the attack has not been demonstrated on server platforms, in part because of the added complexity introduced by the widespread use of ECC memory. However, researchers have documented that it is theoretically possible to circumvent ECC, creating persistent bit flips even in the presence of ECC memory, and to perform the attack against servers and likely across VMs as well. While ECC memory complicates Rowhammer, it does not prevent it, and therefore does not prevent RAMBleed either.