### **Memory Systems** and Memory-Centric Computing Systems

Lecture 2b: RowHammer

Prof. Onur Mutlu

omutlu@gmail.com

https://people.inf.ethz.ch/omutlu

13 June 2019

TU Wien Fast Course 2019

Carnegie Mellon

# Four Key Directions

- Fundamentally Secure/Reliable/Safe Architectures
- Fundamentally Energy-Efficient Architectures
  - Memory-centric (Data-centric) Architectures
- Fundamentally Low-Latency Architectures
- Architectures for Genomics, Medicine, Health

### The Story of RowHammer

- One can predictably induce bit flips in commodity DRAM chips
  - More than 80% of the tested DRAM chips are vulnerable
- First example of how a simple hardware failure mechanism can create a widespread system security vulnerability

Forget Software—Now Hackers Are Exploiting Physics (Andy Greenberg, Security, 08.31.16)

# Maslow's (Human) Hierarchy of Needs

- Self-fulfillment needs
  - Self-actualization: achieving one's full potential, including creative activities
- Psychological needs
  - Esteem needs: prestige and feeling of accomplishment
  - Belongingness and love needs: intimate relationships, friends
- Basic needs
  - Safety needs: security, safety
  - Physiological needs: food, water, warmth, rest

We need to start with reliability and security...

- Maslow, "A Theory of Human Motivation," Psychological Review, 1943.
- Maslow, "Motivation and Personality," Book, 1954-1970.

# How Reliable/Secure/Safe is This Bridge?

# Collapse of the "Galloping Gertie"

### How Secure Are These People?
Security is about preventing unforeseen consequences

### The DRAM Scaling Problem

- DRAM stores charge in a capacitor (charge-based memory)
  - Capacitor must be large enough for reliable sensing
  - Access transistor should be large enough for low leakage and high retention time
  - Scaling beyond 40-35nm (2013) is challenging [ITRS, 2009]
- DRAM capacity, cost, and energy/power hard to scale

### As Memory Scales, It Becomes Unreliable

- Data from all of Facebook's servers worldwide
- Meza+, "Revisiting Memory Errors in Large-Scale Production Data Centers," DSN'15.

# Large-Scale Failure Analysis of DRAM Chips

- Analysis and modeling of memory errors found in all of Facebook's server fleet
- Justin Meza, Qiang Wu, Sanjeev Kumar, and Onur Mutlu, "Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field," Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Rio de Janeiro, Brazil, June 2015. [Slides (pptx) (pdf)] [DRAM Error Model]

Justin Meza, Qiang Wu\*, Sanjeev Kumar\*, Onur Mutlu (Carnegie Mellon University; \* Facebook, Inc.)
### Infrastructures to Understand Such Issues

- Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors (Kim et al., ISCA 2014)
- Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case (Lee et al., HPCA 2015)
- AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems (Qureshi et al., DSN 2015)
- An Experimental Study of Data Retention Behavior in Modern DRAM Devices: Implications for Retention Time Profiling Mechanisms (Liu et al., ISCA 2013)
- The Efficacy of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study (Khan et al., SIGMETRICS 2014)

### SoftMC: Open Source DRAM Infrastructure

- Hasan Hassan et al., "SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies," HPCA 2017.
- Flexible
- Easy to Use (C++ API)
- Open-source: github.com/CMU-SAFARI/SoftMC

### SoftMC

https://github.com/CMU-SAFARI/SoftMC

# SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies

Hasan Hassan, Nandita Vijaykumar, Samira Khan, Saugata Ghose, Kevin Chang, Gennady Pekhimenko, Donghyuk Lee, Oguz Ergin, Onur Mutlu

<sup>1</sup>ETH Zürich <sup>2</sup>TOBB University of Economics & Technology <sup>3</sup>Carnegie Mellon University <sup>4</sup>University of Virginia <sup>5</sup>Microsoft Research <sup>6</sup>NVIDIA Research

### Data Retention in Memory [Liu et al., ISCA 2013]

Retention Time Profile of DRAM looks like this:

[Figure: DRAM rows grouped by retention time: 64-128ms, 128-256ms, >256ms]

- **Stored value pattern** dependent
- **Time** dependent

### Analysis of Data Retention Failures [ISCA'13]

Jamie Liu, Ben Jaiyen, Yoongu Kim, Chris Wilkerson, and Onur Mutlu, "An Experimental Study of Data Retention Behavior in Modern DRAM Devices: Implications for Retention Time Profiling Mechanisms," Proceedings of the 40th International Symposium on Computer Architecture (ISCA), Tel-Aviv, Israel, June 2013. Slides (ppt) Slides (pdf)

### An Experimental Study of Data Retention Behavior in Modern DRAM Devices: Implications for Retention Time Profiling Mechanisms

- Jamie Liu\*, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, jamiel@alumni.cmu.edu
- Ben Jaiyen\*, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, bjaiyen@alumni.cmu.edu
- Yoongu Kim, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, yoonguk@ece.cmu.edu
- Chris Wilkerson, Intel Corporation, 2200 Mission College Blvd., Santa Clara, CA 95054, chris.wilkerson@intel.com
- Onur Mutlu, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, onur@cmu.edu

### Mitigation of Retention Issues [SIGMETRICS'14]

Samira Khan, Donghyuk Lee, Yoongu Kim, Alaa Alameldeen, Chris Wilkerson, and Onur Mutlu, "The Efficacy of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study," Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (**SIGMETRICS**), Austin, TX, June 2014. [Slides (pptx) (pdf)] [Poster (pptx) (pdf)] [Full data sets]

### The Efficacy of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study

- Samira Khan<sup>†</sup>\*, samirakhan@cmu.edu
- Donghyuk Lee<sup>†</sup>, donghyuk1@cmu.edu
- Yoongu Kim<sup>†</sup>, yoongukim@cmu.edu
- Alaa R. Alameldeen\*, alaa.r.alameldeen@intel.com
- Chris Wilkerson\*, chris.wilkerson@intel.com
- Onur Mutlu<sup>†</sup>, onur@cmu.edu

<sup>†</sup>Carnegie Mellon University \*Intel Labs

### A Curious Discovery [Kim et al., ISCA 2014]

# One can predictably induce errors in most DRAM memory chips

#### DRAM RowHammer

# A simple hardware failure mechanism can create a widespread system security vulnerability

Forget Software—Now Hackers Are Exploiting Physics (Andy Greenberg, Security, 08.31.16)

#### Modern DRAM is Prone to Disturbance Errors

Repeatedly reading a row enough times (before memory gets refreshed) induces disturbance errors in adjacent rows in most real DRAM chips you can buy today

### Most DRAM Modules Are Vulnerable

- A company: up to $1.0 \times 10^7$ errors
- B company: up to $2.7 \times 10^6$ errors
- C company: up to $3.3 \times 10^5$ errors

### Recent DRAM Is More Vulnerable

All modules from 2012-2013 are vulnerable

### Why Is This Happening?

- DRAM cells are too close to each other!
  - They are not electrically isolated from each other
- Access to one cell affects the value in nearby cells
  - due to electrical interference between
    - the cells
    - wires used for accessing the cells
  - Also called cell-to-cell coupling/interference
- Example: When we activate (apply high voltage to) a row, an adjacent row gets slightly activated as well
  - Vulnerable cells in that slightly-activated row lose a little bit of charge
  - If row hammer happens enough times, charge in such cells gets drained

### Higher-Level Implications

This simple circuit level failure mechanism has enormous implications on upper layers of the transformation hierarchy

- Problem
- Algorithm
- Program/Language
- Runtime System (VM, OS, MM)
- ISA (Architecture)
- Microarchitecture
- Logic
- Devices
- Electrons

```
loop:
  mov (X), %eax    # read address X, which activates its DRAM row
  mov (Y), %ebx    # read address Y, which lies in another row
  clflush (X)      # flush X from the cache so the next read goes to DRAM
  clflush (Y)      # flush Y from the cache as well
  mfence           # make sure the flushes complete before the next iteration
  jmp loop         # repeat
```

1. Avoid cache hits
   - Flush X from cache
2. Avoid *row hits* to X
   - Read Y in another row

# Observed Errors in Real Systems

| CPU Architecture | Errors | Access-Rate |
|---------------------------|--------|-------------|
| Intel Haswell (2013) | 22.9K | 12.3M/sec |
| Intel Ivy Bridge (2012) | 20.7K | 11.7M/sec |
| Intel Sandy Bridge (2011) | 16.1K | 11.6M/sec |
| AMD Piledriver (2012) | 59 | 6.1M/sec |

#### A real reliability & security issue

### One Can Take Over an Otherwise-Secure System

### Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors

Abstract. Memory isolation is a key property of a reliable and secure computing system — an access to one memory address should not have unintended side effects on data stored in other addresses.
However, as DRAM process technology ...

# Project Zero

News and updates from the Project Zero team at Google

Monday, March 9, 2015

Exploiting the DRAM rowhammer bug to gain kernel privileges

- Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors (Kim et al., ISCA 2014)
- Exploiting the DRAM rowhammer bug to gain kernel privileges (Seaborn, 2015)

# RowHammer Security Attack Example

- "Rowhammer" is a problem with some recent DRAM devices in which repeatedly accessing a row of memory can cause bit flips in adjacent rows (Kim et al., ISCA 2014).
  - Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors (Kim et al., ISCA 2014)
- We tested a selection of laptops and found that a subset of them exhibited the problem.
- We built two working privilege escalation exploits that use this effect.
  - Exploiting the DRAM rowhammer bug to gain kernel privileges (Seaborn+, 2015)
- One exploit uses rowhammer-induced bit flips to gain kernel privileges on x86-64 Linux when run as an unprivileged userland process.
- When run on a machine vulnerable to the rowhammer problem, the process was able to induce bit flips in page table entries (PTEs).
- It was able to use this to gain write access to its own page table, and hence gain read-write access to all of physical memory.

### Security Implications

It's like breaking into an apartment by repeatedly slamming a neighbor's door until the vibrations open the door you were after

### Selected Readings on RowHammer (I)

- Our first detailed study: Rowhammer analysis and solutions (June 2014)
  - Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," Proceedings of the 41st International Symposium on Computer Architecture (**ISCA**), Minneapolis, MN, June 2014.
    [Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Source Code and Data]
- Our Source Code to Induce Errors in Modern DRAM Chips (June 2014)
  - https://github.com/CMU-SAFARI/rowhammer
- Google Project Zero's Attack to Take Over a System (March 2015)
  - Exploiting the DRAM rowhammer bug to gain kernel privileges (Seaborn+, 2015)
  - https://github.com/google/rowhammer-test
  - Double-sided Rowhammer

### Selected Readings on RowHammer (II)

- Remote RowHammer Attacks via JavaScript (July 2015)
  - http://arxiv.org/abs/1507.06955
  - https://github.com/IAIK/rowhammerjs
  - Gruss et al., DIMVA 2016.
  - CLFLUSH-free Rowhammer
  - "A fully automated attack that requires nothing but a website with JavaScript to trigger faults on remote hardware."
  - "We can gain unrestricted access to systems of website visitors."
- ANVIL: Software-Based Protection Against Next-Generation Rowhammer Attacks (March 2016)
  - http://dl.acm.org/citation.cfm?doid=2872362.2872390
  - Aweke et al., ASPLOS 2016
  - CLFLUSH-free Rowhammer
  - Software based monitoring for rowhammer detection

# Selected Readings on RowHammer (III)

- Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector (May 2016)
  - https://www.ieee-security.org/TC/SP2016/papers/0824a987.pdf
  - Bosman et al., IEEE S&P 2016.
  - Exploits Rowhammer and Memory Deduplication to overtake a browser
  - "We report on the first reliable remote exploit for the Rowhammer vulnerability running entirely in Microsoft Edge."
  - "[an attacker] ... can reliably "own" a system with all defenses up, even if the software is entirely free of bugs."
- CAn't Touch This: Software-only Mitigation against Rowhammer Attacks targeting Kernel Memory (August 2017)
  - https://www.usenix.org/system/files/conference/usenixsecurity17/sec17brasser.pdf
  - Brasser et al., USENIX Security 2017.
  - Partitions physical memory into security domains, user vs. kernel; limits rowhammer-induced bit flips to the user domain.
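
The partitioning idea in the last entry can be illustrated with a toy model. The sketch below is not the mechanism from Brasser et al.; it assumes a single bank with linearly numbered rows, assumes that hammering row `r` can disturb only rows `r-1` and `r+1`, and uses a hypothetical `partition` helper that leaves one unused guard row between the kernel and user domains. All function names are made up for illustration.

```python
def partition(num_rows: int, kernel_rows: int) -> dict[int, str]:
    """Hypothetical layout: kernel rows first, one unused guard row, then user rows."""
    layout = {}
    for row in range(num_rows):
        if row < kernel_rows:
            layout[row] = "kernel"
        elif row == kernel_rows:
            layout[row] = "guard"  # left unallocated so the two domains never touch
        else:
            layout[row] = "user"
    return layout


def possible_victims(aggressor: int, num_rows: int) -> list[int]:
    """Assumed disturbance model: only the immediate neighbors can flip."""
    return [r for r in (aggressor - 1, aggressor + 1) if 0 <= r < num_rows]


def user_can_flip_kernel(layout: dict[int, str]) -> bool:
    """True if some user-owned row has a kernel-owned row among its victims."""
    num_rows = len(layout)
    return any(
        layout[victim] == "kernel"
        for row, owner in layout.items() if owner == "user"
        for victim in possible_victims(row, num_rows)
    )


layout = partition(num_rows=16, kernel_rows=4)
print(user_can_flip_kernel(layout))
```

Under these assumptions no user-owned row sits next to a kernel-owned row, so user-induced flips stay in the user or guard rows. The characterization results in this lecture report up to 9 victim rows per aggressor, so a single guard row is a simplification; a real scheme would need a margin that matches the actual disturbance range and the DRAM address mapping.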
# Selected Readings on RowHammer (IV)

- A New Approach for Rowhammer Attacks (May 2016)
  - https://ieeexplore.ieee.org/document/7495576
  - Qiao et al., HOST 2016
  - CLFLUSH-free RowHammer
  - "Libc functions memset and memcpy are found capable of rowhammer."
  - Triggers RowHammer with malicious inputs but benign code
- One Bit Flips, One Cloud Flops: Cross-VM Row Hammer Attacks and Privilege Escalation (August 2016)
  - https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_xiao.pdf
  - Xiao et al., USENIX Security 2016.
  - "Technique that allows a malicious guest VM to have read and write accesses to arbitrary physical pages on a shared machine."
  - Graph-based algorithm to reverse engineer mapping of physical addresses in DRAM

# Selected Readings on RowHammer (V)

- Curious Case of RowHammer: Flipping Secret Exponent Bits using Timing Analysis (August 2016)
  - https://link.springer.com/content/pdf/10.1007%2F978-3-662-53140-2_29.pdf
  - Bhattacharya et al., CHES 2016
  - Combines timing analysis to perform rowhammer on cryptographic keys stored in memory
- DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks (August 2016)
  - https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_pessl.pdf
  - Pessl et al., USENIX Security 2016
  - Shows RowHammer failures on DDR4 devices despite TRR solution
  - Reverse engineers address mapping functions to improve existing RowHammer attacks

### Selected Readings on RowHammer (VI)

- Flip Feng Shui: Hammering a Needle in the Software Stack (August 2016)
  - https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_razavi.pdf
  - Razavi et al., USENIX Security 2016.
  - Combines memory deduplication and RowHammer
  - "A malicious VM can gain unauthorized access to a co-hosted VM running OpenSSH."
  - Breaks OpenSSH public key authentication
- Drammer: Deterministic Rowhammer Attacks on Mobile Platforms (October 2016)
  - http://dl.acm.org/citation.cfm?id=2976749.2978406
  - Van Der Veen et al., ACM CCS 2016
  - Can take over an ARM-based Android system deterministically
  - Exploits predictable physical memory allocator behavior
    - Can deterministically place security-sensitive data (e.g., page table) in an attacker-chosen, vulnerable location in memory

# Selected Readings on RowHammer (VII)

- When Good Protections go Bad: Exploiting anti-DoS Measures to Accelerate Rowhammer Attacks (May 2017)
  - https://web.eecs.umich.edu/~misiker/resources/HOST-2017-Misiker.pdf
  - Aga et al., HOST 2017
  - "A virtual-memory based cache-flush free attack that is sufficiently fast to rowhammer with double rate refresh."
  - Enabled by Cache Allocation Technology
- SGX-Bomb: Locking Down the Processor via Rowhammer Attack (October 2017)
  - https://dl.acm.org/citation.cfm?id=3152709
  - Jang et al., SysTEX 2017
  - "Launches the Rowhammer attack against enclave memory to trigger the processor lockdown."
  - Running unknown enclave programs on the cloud can shut down servers shared with other clients.
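
The "double rate refresh" quoted in the Aga et al. entry refers to refreshing rows twice as often, which halves the time an attacker has to hammer a row before its neighbors are refreshed. As a rough back-of-the-envelope sketch, assuming a 64 ms default refresh window, a row cycle time of 49.125 ns (one of the t<sub>RC</sub> values listed for the tested modules in this lecture), and an idealized attacker that does nothing but activate rows in one bank, the bound on activations per window can be estimated as follows. The function name and the model are illustrative assumptions, not taken from any of the cited papers.

```python
def max_activations(refresh_window_ms: float, t_rc_ns: float) -> int:
    """Idealized upper bound on row activations in one bank per refresh window."""
    window_ns = refresh_window_ms * 1_000_000  # 1 ms = 1,000,000 ns
    return int(window_ns // t_rc_ns)


T_RC_NS = 49.125  # assumed row cycle time in nanoseconds

normal = max_activations(64, T_RC_NS)   # assumed default refresh window
doubled = max_activations(32, T_RC_NS)  # double-rate refresh halves the window

print(f"64 ms window: at most {normal:,} activations")
print(f"32 ms window: at most {doubled:,} activations")
# When alternating between two aggressor rows, each row gets about half of the budget.
print(f"per aggressor (64 ms, two rows): about {normal // 2:,}")
```

Halving the window halves the activation budget, which is why an attack has to be "sufficiently fast" to still induce bit flips under double-rate refresh. Real systems fall below this bound because refresh commands and other traffic also occupy the bank.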
# Selected Readings on RowHammer (VIII)

- Another Flip in the Wall of Rowhammer Defenses (May 2018)
  - https://arxiv.org/pdf/1710.00551.pdf
  - Gruss et al., IEEE S&P 2018
  - A new type of Rowhammer attack which only hammers one single address, which can be done without knowledge of physical addresses and DRAM mappings
  - Defeats static analysis and performance counter analysis defenses by running inside an SGX enclave
- GuardION: Practical Mitigation of DMA-Based Rowhammer Attacks on ARM (June 2018)
  - https://link.springer.com/chapter/10.1007/978-3-319-93411-2_5
  - Van Der Veen et al., DIMVA 2018
  - Presents RAMPAGE, a DMA-based RowHammer attack against the latest Android OS

# Selected Readings on RowHammer (IX)

- Grand Pwning Unit: Accelerating Microarchitectural Attacks with the GPU (May 2018)
  - https://www.vusec.net/wp-content/uploads/2018/05/glitch.pdf
  - Frigo et al., IEEE S&P 2018.
  - The first end-to-end remote Rowhammer exploit on mobile platforms that use our GPU-based primitives in orchestration to compromise browsers on mobile devices in under two minutes.
- Throwhammer: Rowhammer Attacks over the Network and Defenses (July 2018)
  - https://www.cs.vu.nl/~herbertb/download/papers/throwhammer_atc18.pdf
  - Tatar et al., USENIX ATC 2018.
  - "[We] show that an attacker can trigger and exploit Rowhammer bit flips directly from a remote machine by only sending network packets."

# Selected Readings on RowHammer (X)

- Nethammer: Inducing Rowhammer Faults through Network Requests (July 2018)
  - https://arxiv.org/pdf/1805.04956.pdf
  - Lipp et al., arxiv.org 2018.
  - "Nethammer is the first truly remote Rowhammer attack, without a single attacker-controlled line of code on the targeted system."
- ZebRAM: Comprehensive and Compatible Software Protection Against Rowhammer Attacks (October 2018)
  - https://www.usenix.org/system/files/osdi18-konoth.pdf
  - Konoth et al., OSDI 2018
  - A new pure-software protection mechanism against RowHammer.
# Selected Readings on RowHammer (XI.A)

- PassMark Software, memtest86, since 2014
  - https://www.memtest86.com/troubleshooting.htm#hammer

#### Why am I only getting errors during Test 13 Hammer Test?

The Hammer Test is designed to detect RAM modules that are susceptible to disturbance errors caused by charge leakage. This phenomenon is characterized in the research paper Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors by Yoongu Kim et al. According to the research, a significant number of RAM modules manufactured 2010 or newer are affected by this defect. In simple terms, susceptible RAM modules can be subjected to disturbance errors when repeatedly accessing addresses in the same memory bank but different rows in a short period of time. Errors occur when the repeated access causes charge loss in a memory cell, before the cell contents can be refreshed at the next DRAM refresh interval.

Starting from MemTest86 v6.2, the user may see a warning indicating that the RAM may be vulnerable to high frequency row hammer bit flips. This warning appears when errors are detected during the first pass (maximum hammer rate) but no errors are detected during the second pass (lower hammer rate). See MemTest86 Test Algorithms for a description of the two passes that are performed during the Hammer Test (Test 13). When performing the second pass, address pairs are hammered only at the rate deemed as the maximum allowable by memory vendors (200K accesses per 64ms). Once this rate is exceeded, the integrity of memory contents may no longer be guaranteed. If errors are detected in both passes, errors are reported as normal.

The errors detected during Test 13, albeit exposed only in extreme memory access cases, are most certainly real errors. During typical home PC usage (e.g., web browsing, word processing, etc.), it is less likely that the memory usage pattern will fall into the extreme case that makes it vulnerable to disturbance errors.
It may be of greater concern if you were running highly sensitive equipment such as medical equipment, aircraft control systems, or bank database servers. It is impossible to predict with any accuracy if these errors will occur in real life applications. One would need to do a major scientific study of 1000s of computers and their usage patterns, then do a forensic analysis of each application to study how it makes use of the RAM while it executes. To date, we have only seen 1-bit errors as a result of running the Hammer Test.

# Selected Readings on RowHammer (XI.B)

- PassMark Software, memtest86, since 2014
  - https://www.memtest86.com/troubleshooting.htm#hammer

#### **Detection and mitigation of row hammer errors**

The ability of MemTest86 to detect and report on row hammer errors depends on several factors and what mitigations are in place. To generate errors, adjacent memory rows must be repeatedly accessed. But hardware features such as multiple channels, interleaving, scrambling, Channel Hashing, NUMA & XOR schemes make it nearly impossible (for an arbitrary CPU & RAM stick) to know which memory addresses correspond to which rows in the RAM.

Various mitigations might also be in place. Different BIOS firmware might set the refresh interval to different values (tREFI). The shorter the interval, the more resistant the RAM will be to errors. But shorter intervals result in higher power consumption and increased processing overhead.

Some CPUs also support pseudo target row refresh (pTRR) that can be used in combination with pTRR-compliant RAM. This field allows the RAM stick to indicate the MAC (Maximum Active Count) level that the RAM can support. A typical value might be 200,000 row activations.

Some CPUs also support the Joint Electron Design Engineering Council (JEDEC) Targeted Row Refresh (TRR) algorithm. The TRR is an improved version of the previously implemented pTRR algorithm and does not inflict any performance drop or additional power usage.
As a result, the row hammer test implemented in MemTest86 may not be the worst case possible, and vulnerabilities in the underlying RAM might be undetectable due to the mitigations in place in the BIOS and CPU.

# Security Implications (ISCA 2014)

- Breach of memory protection
  - OS page (4KB) fits inside DRAM row (8KB)
  - Adjacent DRAM row → Different OS page
- Vulnerability: disturbance attack
  - By accessing its own page, a program could corrupt pages belonging to another program
- We constructed a proof-of-concept
  - Using only user-level instructions

# More Security Implications (I)

"We can gain unrestricted access to systems of website visitors."

www.iaik.tugraz.at

Not there yet, but ... ROOT privileges for web apps!

Daniel Gruss (@lavados), Clémentine Maurice (@BloodyTangerine), December 28, 2015 — 32c3, Hamburg, Germany

Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript (DIMVA'16)

# More Security Implications (II)

"Can gain control of a smart phone deterministically"

Hammer And Root

Millions of Androids

Drammer: Deterministic Rowhammer Attacks on Mobile Platforms, CCS'16

### More Security Implications (III)

Using an integrated GPU in a mobile system to remotely escalate privilege via the WebGL interface

"GRAND PWNING UNIT"

# Drive-by Rowhammer attack uses GPU to compromise an Android phone

JavaScript based GLitch pwns browsers by flipping bits inside memory chips.
**DAN GOODIN - 5/3/2018, 12:00 PM**

# Grand Pwning Unit: Accelerating Microarchitectural Attacks with the GPU

- Pietro Frigo, Vrije Universiteit Amsterdam, p.frigo@vu.nl
- Cristiano Giuffrida, Vrije Universiteit Amsterdam, giuffrida@cs.vu.nl
- Herbert Bos, Vrije Universiteit Amsterdam, herbertb@cs.vu.nl
- Kaveh Razavi, Vrije Universiteit Amsterdam, kaveh@cs.vu.nl

### More Security Implications (IV)

Rowhammer over RDMA (I)

THROWHAMMER

# Packets over a LAN are all it takes to trigger serious Rowhammer bit flips

The bar for exploiting potentially serious DDR weakness keeps getting lower.

**DAN GOODIN - 5/10/2018, 5:26 PM**

#### Throwhammer: Rowhammer Attacks over the Network and Defenses

- Andrei Tatar, VU Amsterdam
- Radhesh Krishnan, VU Amsterdam
- Herbert Bos, VU Amsterdam
- Elias Athanasopoulos, *University of Cyprus*
- Kaveh Razavi, VU Amsterdam
- Cristiano Giuffrida, VU Amsterdam

### More Security Implications (V)

Rowhammer over RDMA (II)

Nethammer—Exploiting DRAM Rowhammer Bug Through Network Requests

# Nethammer: Inducing Rowhammer Faults through Network Requests

- Moritz Lipp, Graz University of Technology
- Daniel Gruss, Graz University of Technology
- Misiker Tadesse Aga, University of Michigan
- Clémentine Maurice, Univ Rennes, CNRS, IRISA
- Lukas Lamster, Graz University of Technology
- Michael Schwarz, Graz University of Technology
- Lukas Raab, Graz University of Technology

### More Security Implications (VI)

Rowhammer on MLC NAND Flash (based on [Cai+, HPCA 2017])

# Rowhammer RAM attack adapted to hit flash storage

Project Zero's two-year-old dog learns a new trick

By Richard Chirgwin, 17 Aug 2017 at 04:27

From random block corruption to privilege escalation: A filesystem attack vector for rowhammer-like attacks

Anil Kurmus, Nikolas Ioannou, Matthias Neugschwandtner, Thomas Parnell, Nikolaos Papandreou (IBM Research – Zurich)

# More Security Implications?
# Understanding RowHammer ### Root Causes of Disturbance Errors - Cause 1: Electromagnetic coupling - Toggling the wordline voltage briefly increases the voltage of adjacent wordlines - Slightly opens adjacent rows → Charge leakage - Cause 2: Conductive bridges - Cause 3: Hot-carrier injection Confirmed by at least one manufacturer ### Experimental DRAM Testing Infrastructure Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors (Kim et al., ISCA 2014) Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case (Lee et al., HPCA 2015) <u>AVATAR: A Variable-Retention-Time (VRT)</u> <u>Aware Refresh for DRAM Systems</u> (Qureshi et al., DSN 2015) An Experimental Study of Data Retention Behavior in Modern DRAM Devices: Implications for Retention Time Profiling Mechanisms (Liu et al., ISCA 2013) The Efficacy of Error Mitigation Techniques for DRAM Retention Failures: A Comparative Experimental Study (Khan et al., SIGMETRICS 2014) # Experimental DRAM Testing Infrastructure # Tested DRAM Modules (129 total) | Manufacturer | Module | Date* (yy-ww) | $Timing^{\dagger}$ | | Organization | | Chip | | | Victims-per-Module | | | RIth (ms) | |-----------------------------|---------------------------------------|----------------|--------------------|----------------------|--------------|-------|------------------------|----------|-----------------------------|----------------------------------------|--------------------------------------------|--------------------------------------------|-------------| | | | | Freq (MT/s) | t <sub>RC</sub> (ns) | Size (GB) | Chips | Size (Gb) <sup>‡</sup> | Pins | Die Version <sup>§</sup> | Average | Minimum | Maximum | Min | | A<br>Total of<br>43 Modules | $A_1$ | 10-08 | 1066 | 50.625 | 0.5 | 4 | 1 | ×16 | В | 0 | 0 | 0 | - | | | $A_2$ | 10-20 | 1066 | 50.625 | 1 | 8 | 1 | ×8 | F | 0 | 0 | 0 | - | | | A <sub>3-5</sub> | 10-20 | 1066 | 50.625 | 0.5 | 4 | 1 | ×16 | В | 0 | 0 | 0 | - | | | A 
<sub>6-7</sub> | 11-24 | 1066 | 49.125 | 1 | 4 | 2 | ×16 | $\mathcal{D}$ | $7.8 \times 10^{1}$ | $5.2 \times 10^{1}$ | $1.0 \times 10^2$ | 21.3 | | | A <sub>8-12</sub> | 11-26 | 1066 | 49.125 | 1 | 4 | 2 | ×16 | $\mathcal{D}$ | $2.4 \times 10^{2}$ | $5.4 \times 10^{1}$ | $4.4 \times 10^{2}$ | 16.4 | | | A <sub>13-14</sub> | 11-50 | 1066 | 49.125 | 1 | 4 | 2 2 | ×16 | $\mathcal{D}$ | $8.8 \times 10^{1}$ | 1.7 × 10 <sup>1</sup> | $1.6 \times 10^{2}$ | 26.2 | | | A <sub>15-16</sub> | 12-22 | 1600 | 50.625 | 1 | 4 | | ×16 | M | 9.5 | | $1.0 \times 10^{1}$<br>$2.0 \times 10^{2}$ | 34.4 | | | A <sub>17-18</sub> | 12-26 | 1600 | 49.125 | 2 2 | 8 | 2 2 | ×8 | K | $1.2 \times 10^2$<br>$8.6 \times 10^6$ | $3.7 \times 10^{1}$<br>$7.0 \times 10^{6}$ | $1.0 \times 10^{7}$ | 21.3<br>8.2 | | | A <sub>19-30</sub> | 12-40<br>13-02 | 1600<br>1600 | 48.125<br>48.125 | 2 | 8 | 2 | ×8<br>×8 | _ | $1.8 \times 10^{6}$ | $1.0 \times 10^{6}$ | $3.5 \times 10^6$ | 11.5 | | | A <sub>31-34</sub> | 13-02 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | _ | $4.0 \times 10^{1}$ | $1.0 \times 10^{1}$ $1.9 \times 10^{1}$ | | 21.3 | | | A <sub>35-36</sub> | 13-14 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | ĸ | $1.7 \times 10^6$ | $1.4 \times 10^{6}$ | $2.0 \times 10^{6}$ | 9.8 | | | Α <sub>37-38</sub> | 13-28 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | K | $5.7 \times 10^4$ | | | 16.4 | | | A <sub>39-40</sub> | 14-04 | 1600 | 49.125 | 2 | 8 | 2 | ×8 | _ | $2.7 \times 10^{5}$ | $2.7 \times 10^5$ | | 18.0 | | | Α <sub>41</sub> | 14-04 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | K | 0.5 | 0 | 1 | 62.3 | | | A <sub>42-43</sub> | | | | | | | | | | | | | | B<br>Total of<br>54 Modules | B | 08-49 | 1066 | 50.625 | 1 | 8 | 1 | ×8 | $\mathcal{D}$ $\mathcal{E}$ | 0 | 0 | 0 | - | | | B <sub>2</sub> | 09-49 | 1066 | 50.625 | 1 | 8 | 1 | ×8 | E<br>F | 0 | 0 | 0 | _ | | | B <sub>3</sub><br>B <sub>4</sub> | 10-19<br>10-31 | 1066<br>1333 | 50.625<br>49.125 | 1 2 | 8 | 1 2 | ×8<br>×8 | C | 0 | 0 | 0 | - | | | B <sub>4</sub> | 11-13 | 1333 | 49.125 | 2 | 8 
| 2 | ×8 | C | 0 | 0 | 0 | _ | | | B <sub>6</sub> | 11-15 | 1066 | 50.625 | 1 | 8 | 1 | ×8 | F | 0 | 0 | 0 | _ | | | B <sub>7</sub> | 11-19 | 1066 | 50.625 | 1 | 8 | 1 | ×8 | F | 0 | 0 | 0 | _ | | | B <sub>8</sub> | 11-25 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | c | 0 | 0 | 0 | - | | | B <sub>9</sub> | 11-37 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | $\mathcal{D}$ | $1.9 \times 10^{6}$ | $1.9 \times 10^{6}$ | $1.9 \times 10^{6}$ | 11.5 | | | B | 11-46 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | $\mathcal{D}$ | $2.2 \times 10^{6}$ | $1.5 \times 10^{6}$ | | 11.5 | | | B <sub>10-12</sub><br>B <sub>13</sub> | 11-49 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | c | 0 | 0 | 0 | - | | | B <sub>14</sub> | 12-01 | 1866 | 47.125 | 2 | 8 | 2 | ×8 | $\mathcal{D}$ | $9.1 \times 10^{5}$ | $9.1 \times 10^{5}$ | | 9.8 | | | B <sub>15-31</sub> | 12-10 | 1866 | 47.125 | 2 | 8 | 2 | ×8 | $\mathcal{D}$ | $9.8 \times 10^{5}$ | $7.8 \times 10^{5}$ | | 11.5 | | | B <sub>32</sub> | 12-25 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | ε | | $7.4 \times 10^{5}$ | | 11.5 | | | B <sub>33-42</sub> | 12-28 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | ε | | $1.9 \times 10^{5}$ | | 11.5 | | | B <sub>43-47</sub> | 12-31 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | ε | | $2.9 \times 10^{5}$ | | 13.1 | | | B <sub>48-51</sub> | 13-19 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | ε | | $7.4 \times 10^4$ | | 14.7 | | | B <sub>52-53</sub> | 13-40 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | $\mathcal{D}$ | 2.6 × 10 <sup>4</sup> | $2.3 \times 10^4$ | | 21.3 | | | B <sub>54</sub> | 14-07 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | $\mathcal{D}$ | | $7.5 \times 10^{3}$ | | 26.2 | | | C <sub>1</sub> | 10-18 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | $\mathcal{A}$ | 0 | 0 | 0 | | | | C <sub>2</sub> | 10-18 | 1066 | 50.625 | 2 | 8 | 2 | ×8 | $\mathcal{A}$ | 0 | 0 | 0 | _ | | | C <sub>3</sub> | 10-20 | 1066 | 50.625 | 2 | 8 | 2 | ×8 | $\mathcal{A}$ | 0 | 0 | 0 | _ | | | C <sub>3</sub> | 10-22 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | В | $8.9 \times 10^{2}$ | $6.0 \times 10^{2}$ | $1.2 \times 10^{3}$ | 29.5 | | | C 
<sub>4-5</sub> | 10-26 | 1333 | 49.125 | 1 | 8 | 1 | ×8 | $\tau$ | 0 | 0.0 × 10 | 0 | - | | | C <sub>6</sub><br>C <sub>7</sub> | 10-43 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | B | | $4.0 \times 10^{2}$ | | 29.5 | | | C <sub>8</sub> | 11-12 | 1333 | 46.25 | 2 | 8 | 2 | ×8 | В | $6.9 \times 10^{2}$ | | | 21.3 | | | C <sub>9</sub> | 11-12 | 1333 | 46.25 | 2 | 8 | 2 | ×8 | В | | $9.2 \times 10^{2}$ | | 27.9 | | | C <sub>10</sub> | 11-19 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | В | 3 | 3 | 3 | 39.3 | | | C <sub>11</sub> | 11-42 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | В | $1.6 \times 10^{2}$ | $1.6 \times 10^{2}$ | $1.6 \times 10^{2}$ | 39.3 | | С | C <sub>12</sub> | 11-48 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | C | | $7.1 \times 10^4$ | | 19.7 | | Total of | C <sub>13</sub> | 12-08 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | c | | $3.9 \times 10^4$ | | 21.3 | | Total of<br>32 Modules | C <sub>14-15</sub> | 12-12 | 1333 | 49.125 | 2 | 8 | 2 | ×8 | c | | $2.1 \times 10^4$ | | 21.3 | | 32 Modules | C <sub>16-18</sub> | 12-20 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | c | | $1.2 \times 10^{3}$ | | 27.9 | | | C <sub>19</sub> | 12-23 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | ε | | $1.4 \times 10^{5}$ | | 18.0 | | | C <sub>20</sub> | 12-24 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | c | $6.5 \times 10^4$ | | | 21.3 | | | C <sub>21</sub> | 12-26 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | C | $2.3 \times 10^{4}$ | $2.3 \times 10^4$ | | 24.6 | | | C <sub>22</sub> | 12-32 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | c | | $1.7 \times 10^4$ | | 22.9 | | | C <sub>23-24</sub> | 12-37 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | c | | $1.1 \times 10^4$ | | 18.0 | | | C <sub>25-30</sub> | 12-41 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | c | $2.0 \times 10^4$ | $1.1 \times 10^{4}$ | | 19.7 | | | C <sub>31</sub> | 13-11 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | c | | $3.3 \times 10^{5}$ | | 14.7 | | | C <sub>32</sub> | 13-35 | 1600 | 48.125 | 2 | 8 | 2 | ×8 | c | | $3.7 \times 10^4$ | | 21.3 | | | | | | | | | | | | | | | | <sup>\*</sup> We report the manufacture date marked on the 
chip packages, which is more accurate than other dates that can be gleaned from a module.

† We report timing constraints stored in the module's on-board ROM [33], which is read by the system BIOS to calibrate the memory controller.

‡ The maximum DRAM chip size supported by our testing platform is 2Gb.

<sup>§</sup> We report DRAM die versions marked on the chip packages, which typically progress in the following manner: $\mathcal{M} \to \mathcal{A} \to \mathcal{B} \to \mathcal{C} \to \cdots$.

Table 3. Sample population of 129 DDR3 DRAM modules, categorized by manufacturer and sorted by manufacture date

#### RowHammer Characterization Results

1. Most Modules Are at Risk
2. Errors vs. Vintage
3. Error = Charge Loss
4. Adjacency: Aggressor & Victim
5. Sensitivity Studies
6. Other Results in Paper
7. Solution Space

# 4. Adjacency: Aggressor & Victim

Note: For three modules with the most errors (only first bank)

Most aggressors & victims are adjacent

# 1 Access Interval (Aggressor)

Note: For three modules with the most errors (only first bank)

Less frequent accesses → Fewer errors

# 2 Refresh Interval

Note: Using three modules with the most errors (only first bank)

More frequent refreshes → Fewer errors

# 3 Data Pattern

[Figure: Solid and ~Solid data patterns]

Errors affected by data stored in other cells

# 6. Other Results (in Paper)

- Victim Cells ≠ Weak Cells (i.e., leaky cells)
  - Almost no overlap between them
- Errors not strongly affected by temperature
  - Default temperature: 50°C
  - At 30°C and 70°C, number of errors changes <15%
- Errors are repeatable
  - Across ten iterations of testing, >70% of victim cells had errors in every iteration

# 6. Other Results (in Paper) cont'd

- As many as 4 errors per cache-line
  - Simple ECC (e.g., SECDED) cannot prevent all errors
- Number of cells & rows affected by aggressor
  - Victim cells per aggressor: ≤110
  - Victim rows per aggressor: ≤9
- Cells affected by two aggressors on either side
  - Very small fraction of victim cells (<100) have an error when either one of the aggressors is toggled

### More on RowHammer Analysis

Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors" Proceedings of the 41st International Symposium on Computer Architecture (ISCA), Minneapolis, MN, June 2014. [Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Source Code and Data]

### Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors

Yoongu Kim<sup>1</sup> Ross Daly\* Jeremie Kim<sup>1</sup> Chris Fallin\* Ji Hye Lee<sup>1</sup> Donghyuk Lee<sup>1</sup> Chris Wilkerson<sup>2</sup> Konrad Lai Onur Mutlu<sup>1</sup>

<sup>1</sup>Carnegie Mellon University <sup>2</sup>Intel Labs

### Retrospective on RowHammer & Future

Onur Mutlu, "The RowHammer Problem and Other Issues We May Face as Memory Becomes Denser" Invited Paper in Proceedings of the Design, Automation, and Test in Europe Conference (DATE), Lausanne, Switzerland, March 2017. [Slides (pptx) (pdf)]

#### The RowHammer Problem and Other Issues We May Face as Memory Becomes Denser

Onur Mutlu, ETH Zürich, onur.mutlu@inf.ethz.ch, https://people.inf.ethz.ch/omutlu

### A More Recent RowHammer Retrospective

Onur Mutlu and Jeremie Kim, "RowHammer: A Retrospective" IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) Special Issue on Top Picks in Hardware and Embedded Security, 2019. [Preliminary arXiv version]

# RowHammer: A Retrospective

Onur Mutlu<sup>§‡</sup> Jeremie S.
Kim<sup>‡§</sup> <sup>§</sup>ETH Zürich <sup>‡</sup>Carnegie Mellon University

# RowHammer Solutions

## Two Types of RowHammer Solutions

- Immediate
  - To protect the vulnerable DRAM chips in the field
  - Limited possibilities
- Longer-term
  - To protect future DRAM chips
  - Wider range of protection mechanisms
- Our ISCA 2014 paper proposes both types of solutions
  - Seven solutions in total
  - PARA proposed as best solution → already employed in the field

## Some Potential Solutions

- Make better DRAM chips (Cost)
- Refresh frequently (Power, Performance)
- Sophisticated ECC (Cost, Power)
- Access counters (Cost, Power, Complexity)

## Naive Solutions

1. Throttle accesses to same row
   - Limit access-interval: ≥500ns
   - Limit number of accesses: $\leq 128\text{K}\ (= 64\,\text{ms} / 500\,\text{ns})$
2. Refresh more frequently
   - Shorten refresh-interval by $\sim 7\times$

Both naive solutions introduce significant overhead in performance and power

## Apple's Patch for RowHammer

https://support.apple.com/en-gb/HT204934

Available for: OS X Mountain Lion v10.8.5, OS X Mavericks v10.9.5

Impact: A malicious application may induce memory corruption to escalate privileges

Description: A disturbance error, also known as Rowhammer, exists with some DDR3 RAM that could have led to memory corruption. This issue was mitigated by increasing memory refresh rates.
CVE-ID CVE-2015-3693: Mark Seaborn and Thomas Dullien of Google, working from original research by Yoongu Kim et al (2014)

HP, Lenovo, and other vendors released similar patches

## Our Solution to RowHammer

- PARA: <u>Probabilistic Adjacent Row Activation</u>
- Key Idea
  - After closing a row, we activate (i.e., refresh) one of its neighbors with a low probability: p = 0.005
- Reliability Guarantee
  - When p=0.005, errors in one year: $9.4 \times 10^{-14}$
  - By adjusting the value of p, we can vary the strength of protection against errors

## Advantages of PARA

- PARA refreshes rows infrequently
  - Low power
  - Low performance-overhead
    - Average slowdown: 0.20% (for 29 benchmarks)
    - Maximum slowdown: 0.75%
- PARA is stateless
  - Low cost
  - Low complexity
- PARA is an effective and low-overhead solution to prevent disturbance errors

## Requirements for PARA

- If implemented in DRAM chip
  - Enough slack in timing parameters
  - Plenty of slack today:
    - Lee et al., "Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common Case," HPCA 2015.
    - Chang et al., "Understanding Latency Variation in Modern DRAM Chips," SIGMETRICS 2016.
    - Lee et al., "Design-Induced Latency Variation in Modern DRAM Chips," SIGMETRICS 2017.
    - Chang et al., "Understanding Reduced-Voltage Operation in Modern DRAM Devices," SIGMETRICS 2017.
    - Ghose et al., "What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study," SIGMETRICS 2018.
- If implemented in memory controller
  - Better coordination between memory controller and DRAM
  - Memory controller should know which rows are physically adjacent

## Probabilistic Activation in Real Life (I)

## Probabilistic Activation in Real Life (II)

## Seven RowHammer Solutions Proposed

Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors" Proceedings of the 41st International Symposium on Computer Architecture (ISCA), Minneapolis, MN, June 2014. [Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)] [Source Code and Data]

# Main Memory Needs Intelligent Controllers for Security

## Industry Is Writing Papers About It, Too

## DRAM Process Scaling Challenges

### Refresh

- Difficult to build high-aspect ratio cell capacitors decreasing cell capacitance
- Leakage current of cell access transistors increasing

### tWR

- Contact resistance between the cell capacitor and access transistor increasing
- On-current of the cell access transistor decreasing
- Bit-line resistance increasing

### VRT

- Occurring more frequently with cell capacitance decreasing

## Call for Intelligent Memory Controllers

THE MEMORY FORUM 2014

## Co-Architecting Controllers and DRAM to Enhance DRAM Process Scaling

Uksong Kang, Hak-soo Yu, Churoo Park, \*Hongzhong Zheng, \*\*John Halbert, \*\*Kuljit Bains, SeongJin Jang, and Joo Sun Choi
Samsung Electronics, Hwasung, Korea / \*Samsung Electronics, San Jose / \*\*Intel

## Aside: Intelligent Controller for NAND Flash

[Figure: NAND Daughter Board]

[DATE 2012, ICCD 2012, DATE 2013, ITJ 2013, ICCD 2013, SIGMETRICS 2014, HPCA 2015, DSN 2015, MSST 2015, JSAC 2016, HPCA 2017, DFRWS 2017, PIEEE 2017, HPCA 2018, SIGMETRICS 2018]

## Aside: Intelligent Controller for NAND Flash

Proceedings of the IEEE, Sept. 2017

## Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives

This paper reviews the most recent advances in solid-state drive (SSD) error characterization, mitigation, and data recovery techniques to improve both SSD's reliability and lifetime.

By Yu Cai, Saugata Ghose, Erich F. Haratsch, Yixin Luo, and Onur Mutlu

https://arxiv.org/pdf/1706.08642

## Main Memory Needs Intelligent Controllers

## Challenge and Opportunity for Future

# Fundamentally Secure, Reliable, Safe Computing Architectures

## Solution Direction: Principled Designs

Design fundamentally secure computing architectures

Predict and prevent such safety issues

## Understand and Model with Experiments (DRAM)

## Understand and Model with Experiments (Flash)

[Figure: NAND Daughter Board]

[DATE 2012, ICCD 2012, DATE 2013, ITJ 2013, ICCD 2013, SIGMETRICS 2014, HPCA 2015, DSN 2015, MSST 2015, JSAC 2016, HPCA 2017, DFRWS 2017, PIEEE 2017, HPCA 2018, SIGMETRICS 2018]

## Recall: Collapse of the "Galloping Gertie"

## Another Example (1994)

## Yet Another Example (2007)

## A More Recent Example (2018)

## In-Field Patch-ability (Intelligent Memory) Can Avoid Such Failures

## Final Thoughts on RowHammer

## Some Thoughts on RowHammer

- A simple hardware failure mechanism can create a widespread system security vulnerability
- How to exploit and fix the vulnerability requires a strong understanding across the transformation layers
  - And, a strong understanding of tools available to you
- Fixing needs to happen for two types of chips
  - Existing chips (already in the field)
  - Future chips
- Mechanisms for fixing are different between the two types

## Aside: Byzantine Failures

- This class of failures is known as Byzantine failures
- Characterized by
  - Undetected erroneous computation
  - Opposite of "fail fast (with an error or no result)"
- "erroneous" can be "malicious" (intent is the only distinction)
- Very difficult to detect and confine Byzantine failures
- Do all you can to avoid them
- Lamport et al., "The Byzantine Generals Problem," ACM TOPLAS 1982.

## RowHammer, Revisited

- One can predictably induce bit flips in commodity DRAM chips
  - >80% of the tested DRAM chips are vulnerable
- First example of how a simple hardware failure mechanism can create a widespread system security vulnerability

Andy Greenberg, "Forget Software—Now Hackers Are Exploiting Physics," Security, 08.31.16

## RowHammer: Retrospective

- New mindset that has enabled a renewed interest in HW security attack research:
  - Real (memory) chips are vulnerable, in a simple and widespread manner → this causes real security problems
  - Hardware reliability → security connection is now mainstream discourse
- Many new RowHammer attacks...
  - Tens of papers in top security venues
  - More to come as RowHammer is getting worse (DDR4 & beyond)
- Many new RowHammer solutions...
  - Apple security release; Memtest86 updated
  - Many solution proposals in top venues (latest in ISCA 2019)
  - Principled system-DRAM co-design (in original RowHammer paper)
  - More to come...

## Perhaps Most Importantly...
- RowHammer enabled a shift of mindset in mainstream security researchers
  - General-purpose hardware is fallible, in a widespread manner
  - Its problems are exploitable
- This mindset has enabled many systems security researchers to examine hardware in more depth
  - And understand HW's inner workings and vulnerabilities
- It is no coincidence that two of the groups that discovered Meltdown and Spectre heavily worked on RowHammer attacks before
  - More to come...

## Summary: RowHammer

- DRAM reliability is decreasing
- Reliability issues open up security vulnerabilities
  - Very hard to defend against
- RowHammer is a prime example
  - First example of how a simple hardware failure mechanism can create a widespread system security vulnerability
  - Its implications on system security research are tremendous & exciting
- Bad news: RowHammer is getting worse.
- Good news: We have a lot more to do.
  - We are now fully aware hardware is easily fallible.
  - We are developing both attacks and solutions.
  - We are developing principled models, methodologies, solutions.

## For More on RowHammer...

Onur Mutlu and Jeremie Kim, "RowHammer: A Retrospective" IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) Special Issue on Top Picks in Hardware and Embedded Security, 2019. [Preliminary arXiv version]
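
## Aside: A Minimal PARA Sketch

The PARA key idea from the solutions part of this lecture (after closing a row, activate one of its neighbors with low probability p = 0.005) can be illustrated with a few lines of Python. This is only a sketch under stated assumptions, not the mechanism as implemented in the ISCA 2014 paper or in any product. The function names, the row count of 32,768, and the mapping "physical neighbors are row ± 1" are hypothetical. The helper `miss_probability` uses a simplified model in which each of two neighbors is picked with probability p/2 independently on every close. It does not reproduce the one-year error estimate quoted on the slide, which depends on parameters not given here.

```python
import random


def para_neighbor_to_refresh(closed_row, num_rows, p=0.005, rng=random):
    """Pick the neighbor row PARA would activate after `closed_row` closes.

    Returns a row index, or None when no extra activation is issued.
    Assumption: physically adjacent rows are closed_row - 1 and closed_row + 1.
    """
    if rng.random() >= p:  # with probability 1 - p, do nothing
        return None
    neighbors = [r for r in (closed_row - 1, closed_row + 1) if 0 <= r < num_rows]
    return rng.choice(neighbors) if neighbors else None


def miss_probability(p, activations):
    """Chance that one given neighbor is never picked over `activations` closes.

    Assumption: an interior row with two neighbors, each picked with
    probability p / 2 independently on every close.
    """
    return (1.0 - p / 2.0) ** activations


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    closes = 128_000  # the 128K access bound quoted on the Naive Solutions slide
    extra = sum(
        para_neighbor_to_refresh(1000, 32_768, rng=rng) is not None
        for _ in range(closes)
    )
    print(f"extra activations: {extra} of {closes} closes")
    print(f"miss probability: {miss_probability(0.005, closes):.3e}")
```

The sketch keeps no per-row counters, which mirrors the "PARA is stateless" point: the only inputs are the row that just closed and a random draw. A controller-side version would additionally need the physical row adjacency information mentioned under Requirements for PARA.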