# **Digital Design & Computer Arch.** Lecture 22: Memory Overview, Organization & Technology

Prof. Onur Mutlu

ETH Zürich Spring 2022 19 May 2022

# Extra Assignment 3: Amdahl's Law

### Paper review

- G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967.

### Optional Assignment – for 1% extra credit

- Write a 1-page review
- Upload PDF file to Moodle (deadline: June 15)
- Strongly recommended that you follow my guidelines for (paper) review

### Readings for This Lecture and Next

- Memory Hierarchy and Caches
- Required
  - H&H Chapters 8.1-8.3
  - Refresh: P&P Chapter 3.5
  - Kim & Mutlu, "Memory Systems," Computing Handbook, 2014.
    - https://people.inf.ethz.ch/omutlu/pub/memory-systems-introduction\_computing-handbook14.pdf
- Recommended
  - An early cache paper by Maurice Wilkes
    - Wilkes, "Slave Memories and Dynamic Storage Allocation," IEEE Trans. On Electronic Computers, 1965.

# We Are **Done** With This...

- Dataflow (at the ISA level)
- Superscalar Execution
- VLIW
- Systolic Arrays
- Decoupled Access Execute
- SIMD Processing (Vector and Array processors)
- Graphics Processing Units (GPUs)

Problem → Algorithm → Program/Language → System Software → SW/HW Interface → Micro-architecture → Logic → Devices → Electrons

### Approaches to (Instruction-Level) Concurrency

- Pipelining
- Fine-Grained Multithreading
- Out-of-order Execution
- Dataflow (at the ISA level)
- Superscalar Execution
- VLIW
- Systolic Arrays
- Decoupled Access Execute
- SIMD Processing (Vector and Array processors, GPUs)

Now you are very familiar with many processing paradigms

### Food for thought: tradeoffs of these different processing paradigms

# Tradeoffs of Processing Paradigms



#### Food for thought: what determines the widespread success of a paradigm

# Let Us Now Take A Step Back

# A Computing System

- Three key components
- Computation
- Communication
- Storage/memory



Burks, Goldstein, von Neumann, "Preliminary discussion of the logical design of an electronic computing instrument," 1946.

#### **Computing System**



Image source: https://lbsitbytes2010.wordpress.com/2013/03/29/john-von-neumann-roll-no-15/

# Recall: What is A Computer?

- We will cover all three components



# Memory Is Critically Important

# Memory in a Modern System



#### AMD Barcelona, 2006

# A Large Fraction of Modern Chips is Memory



Apple M1, 2021

Source: https://www.anandtech.com/show/16252/mac-mini-apple-m1-tested



#### Apple M1 Ultra System (2022)

https://www.gsmarena.com/apple\_announces\_m1\_ultra\_with\_20core\_cpu\_and\_64core\_gpu-news-53481.php



#### Intel Pentium Pro, 1995



https://download.intel.com/newsroom/kits/40thanniversary/gallery/images/Pentium\_4\_6xx-die.jpg

Intel Pentium 4, 2000



Core Count: 8 cores/16 threads

L1 Caches: 32 KB per core

L2 Caches: 512 KB per core

L3 Cache: 32 MB shared

#### AMD Ryzen 5000, 2020



https://www.it-techblog.de/ibm-power10-prozessor-mehr-speicher-mehr-tempo-mehr-sicherheit/09/2020/



Cores: 128 Streaming Multiprocessors

L1 Cache or Scratchpad: 192 KB per SM (can be used as L1 cache and/or scratchpad)

L2 Cache: 40 MB shared

# Cerebras's Wafer Scale Engine (2019)



- The largest ML accelerator chip
- 400,000 cores
- 18 GB of on-chip memory
- 9 PB/s memory bandwidth



Cerebras WSE: 1.2 trillion transistors, 46,225 mm<sup>2</sup>

Largest GPU: 21.1 billion transistors, 815 mm<sup>2</sup>

https://www.anandtech.com/show/14758/hot-chips-31-live-blogs-cerebras-wafer-scale-deep-learning

https://www.cerebras.net/cerebras-wafer-scale-engine-why-we-need-big-chips-for-deep-learning2

# Cerebras's Wafer Scale Engine-2 (2021)



- The largest ML accelerator chip
- 850,000 cores
- 40 GB of on-chip memory
- 20 PB/s memory bandwidth

Cerebras WSE-2: 2.6 trillion transistors, 46,225 mm<sup>2</sup>

Largest GPU (NVIDIA Ampere GA100): 54.2 billion transistors, 826 mm<sup>2</sup>

https://cerebras.net/product/#overview

### Memory System: Most of the Platform



#### Most of the system is dedicated to storing and moving data

#### Yet, system is still bottlenecked by memory

# Memory is Critical for Performance

- We have seen it many times in this course
- Load-related stalls in pipelining
  - Even with magic "1-cycle" memory assumption
- Load/store handling in OoO execution processors
- OoO execution and memory latency tolerance
- VLIW stalls due to long-latency memory operations
- VLIW memory bank disambiguation
- Many memory banks needed in SIMD processors
  - SIMD vector processing performance example
- GPU register files and memory systems
- Fine-grained multithreading to tolerate memory latency



# Computing is Bottlenecked by Data

# Computation is Bottlenecked by Memory

- Important workloads are all data intensive
  - ML/AI, Genomics, Data Analytics, Databases, Graph Analytics, ...
- They require rapid and efficient processing of large amounts of data

- Data is increasing
  - We can generate much more than we can process

# Application Perspective

# Memory Is Critical for Performance (I)



#### **In-memory Databases**

[Mao+, EuroSys'12; Clapp+ (**Intel**), IISWC'15]



#### **In-Memory Data Analytics**

[Clapp+ (**Intel**), IISWC'15; Awan+, BDCloud'15]



**Graph/Tree Processing** [Xu+, IISWC'12; Umuroglu+, FPL'15]



**Datacenter Workloads** [Kanev+ (**Google**), ISCA'15]

### Memory → bottleneck



# Memory Is Critical for Performance (II)



Chrome

**Google's web browser** 



### **TensorFlow Mobile**

Google's machine learning framework



Google's video codec



### Memory → bottleneck




### Data is Key for Modern & Future Workloads



http://www.economist.com/news/21631808-so-much-genetic-data-so-many-uses-genes-unzipped





### Memory → bottleneck

| reference: | TTATCGCTTCCATGACGCAG |
|------------|----------------------|
| read1:     | ATCGCATCC            |
| read2:     | TATCGCATC            |
| read3:     | CATCCATGA            |
| read4:     | CGCTTCCAT            |
| read5:     | CCATGACGC            |
| read6:     | TTCCATGAC            |

### 3. Variant Calling

### 4. Scientific Discovery

# New Genome Sequencing Technologies

Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions

Damla Senol Cali, Jeremie S Kim, Saugata Ghose, Can Alkan, Onur Mutlu

Briefings in Bioinformatics, bby017, https://doi.org/10.1093/bib/bby017. Published: 02 April 2018



**Oxford Nanopore MinION** 

Senol Cali+, "Nanopore Sequencing Technology and Tools for Genome Assembly: Computational Analysis of the Current State, Bottlenecks and Future Directions," Briefings in Bioinformatics, 2018. [Open arxiv.org version]

#### Memory → bottleneck

### Future of Genome Sequencing & Analysis

Mohammed Alser, Zülal Bingöl, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu <u>"Accelerating Genome Analysis: A Primer on an Ongoing Journey"</u> IEEE Micro, August 2020.



Sept.-Oct. 2020, pp. 65-75, vol. 40 DOI Bookmark: 10.1109/MM.2020.3013728

#### FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications

July-Aug. 2021, pp. 39-48, vol. 41 DOI Bookmark: 10.1109/MM.2021.3088396

MinION from ONT

#### SmidgION from ONT

#### Accelerating Genome Analysis

 Mohammed Alser, Zulal Bingol, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, and Onur Mutlu,
 "Accelerating Genome Analysis: A Primer on an Ongoing Journey" *IEEE Micro* (*IEEE MICRO*), Vol. 40, No. 5, pages 65-75, September/October 2020.
 [Slides (pptx)(pdf)]
 [Talk Video (1 hour 2 minutes)]

### Accelerating Genome Analysis: A Primer on an Ongoing Journey

Mohammed Alser ETH Zürich

Zülal Bingöl Bilkent University

Damla Senol Cali Carnegie Mellon University

Jeremie Kim ETH Zurich and Carnegie Mellon University Saugata Ghose University of Illinois at Urbana–Champaign and Carnegie Mellon University

Can Alkan Bilkent University

**Onur Mutlu** ETH Zurich, Carnegie Mellon University, and Bilkent University

### FPGA-based Near-Memory Analytics

 Gagandeep Singh, Mohammed Alser, Damla Senol Cali, Dionysios Diamantopoulos, Juan Gómez-Luna, Henk Corporaal, and Onur Mutlu, "FPGA-based Near-Memory Acceleration of Modern Data-Intensive Applications" <u>IEEE Micro</u> (IEEE MICRO), 2021.

# FPGA-based Near-Memory Acceleration of Modern Data-Intensive Applications

Gagandeep Singh<sup>◊</sup> Mohammed Alser<sup>◊</sup> Damla Senol Cali<sup>⋈</sup> Dionysios Diamantopoulos<sup>∇</sup> Juan Gómez-Luna<sup>◊</sup> Henk Corporaal<sup>★</sup> Onur Mutlu<sup>◊⋈</sup>

<sup>◊</sup>ETH Zürich <sup>⋈</sup>Carnegie Mellon University <sup>★</sup>Eindhoven University of Technology <sup>∇</sup>IBM Research Europe

#### GenASM Acceleration Framework [MICRO 2020]

Damla Senol Cali, Gurpreet S. Kalsi, Zulal Bingol, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu, "GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis" *Proceedings of the <u>53rd International Symposium on Microarchitecture</u> (MICRO)*, Virtual, October 2020.
 [Lightning Talk Video (1.5 minutes)]
 [Lightning Talk Slides (pptx) (pdf)]
 [Slides (pptx) (pdf)]

#### GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis

Damla Senol Cali<sup>†⋈</sup> Gurpreet S. Kalsi<sup>⋈</sup> Zülal Bingöl<sup>▽</sup> Can Firtina<sup>◊</sup> Lavanya Subramanian<sup>‡</sup> Jeremie S. Kim<sup>◊†</sup> Rachata Ausavarungnirun<sup>⊙</sup> Mohammed Alser<sup>◊</sup> Juan Gomez-Luna<sup>◊</sup> Amirali Boroumand<sup>†</sup> Anant Nori<sup>⋈</sup> Allison Scibisz<sup>†</sup> Sreenivas Subramoney<sup>⋈</sup> Can Alkan<sup>▽</sup> Saugata Ghose<sup>\*†</sup> Onur Mutlu<sup>◊†▽</sup>

<sup>†</sup>Carnegie Mellon University <sup>⋈</sup>Processor Architecture Research Lab, Intel Labs <sup>▽</sup>Bilkent University <sup>◊</sup>ETH Zürich <sup>‡</sup>Facebook <sup>⊙</sup>King Mongkut's University of Technology North Bangkok <sup>\*</sup>University of Illinois at Urbana–Champaign

### In-Storage Genome Filtering [ASPLOS 2022]

 Nika Mansouri Ghiasi, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, Arvid Gollwitzer, Damla Senol Cali, Can Firtina, Haiyu Mao, Nour Almadhoun Alserr, Rachata Ausavarungnirun, Nandita Vijaykumar, Mohammed Alser, and Onur Mutlu, "GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis" Proceedings of the <u>27th International Conference on Architectural Support for Programming Languages and Operating Systems</u> (ASPLOS), Virtual, February-March 2022.
 [Talk Slides (pptx) (pdf)]
 [Lightning Talk Slides (pptx) (pdf)]
 [Lightning Talk Video (90 seconds)]
 [Talk Video (17 minutes)]

#### GenStore: A High-Performance In-Storage Processing System for Genome Sequence Analysis

Nika Mansouri Ghiasi<sup>1</sup> Jisung Park<sup>1</sup> Harun Mustafa<sup>1</sup> Jeremie Kim<sup>1</sup> Ataberk Olgun<sup>1</sup> Arvid Gollwitzer<sup>1</sup> Damla Senol Cali<sup>2</sup> Can Firtina<sup>1</sup> Haiyu Mao<sup>1</sup> Nour Almadhoun Alserr<sup>1</sup> Rachata Ausavarungnirun<sup>3</sup> Nandita Vijaykumar<sup>4</sup> Mohammed Alser<sup>1</sup> Onur Mutlu<sup>1</sup>

<sup>1</sup>ETH Zürich <sup>2</sup>Bionano Genomics <sup>3</sup>KMUTNB <sup>4</sup>University of Toronto

#### SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping

Damla Senol Cali<sup>1</sup> Konstantinos Kanellopoulos<sup>2</sup> Joël Lindegger<sup>2</sup> Zülal Bingöl<sup>3</sup> Gurpreet S. Kalsi<sup>4</sup> Ziyi Zuo<sup>5</sup> Can Firtina<sup>2</sup> Meryem Banu Cavlak<sup>2</sup> Jeremie Kim<sup>2</sup> Nika Mansouri Ghiasi<sup>2</sup> Gagandeep Singh<sup>2</sup> Juan Gómez-Luna<sup>2</sup> Nour Almadhoun Alserr<sup>2</sup> Mohammed Alser<sup>2</sup> Sreenivas Subramoney<sup>4</sup> Can Alkan<sup>3</sup> Saugata Ghose<sup>6</sup> Onur Mutlu<sup>2</sup>

<sup>1</sup>Bionano Genomics <sup>2</sup>ETH Zürich <sup>3</sup>Bilkent University <sup>4</sup>Intel Labs <sup>5</sup>Carnegie Mellon University <sup>6</sup>University of Illinois Urbana-Champaign

https://arxiv.org/pdf/2205.05883.pdf

#### More on Fast & Efficient Genome Analysis ...

Onur Mutlu, "Accelerating Genome Analysis: A Primer on an Ongoing Journey" *Invited Lecture at <u>Technion</u>*, Virtual, 26 January 2021. [Slides (pptx) (pdf)] [Talk Video (1 hour 37 minutes, including Q&A)] [Related Invited Paper (at IEEE Micro, 2020)]



### Detailed Lectures on Genome Analysis

- Computer Architecture, Fall 2020, Lecture 3a
  - Introduction to Genome Sequence Analysis (ETH Zürich, Fall 2020)
  - https://www.youtube.com/watch?v=CrRb32v7SJc&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=5
- Computer Architecture, Fall 2020, Lecture 8
  - **Intelligent Genome Analysis** (ETH Zürich, Fall 2020)
  - https://www.youtube.com/watch?v=ygmQpdDTL7o&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=14
- Computer Architecture, Fall 2020, Lecture 9a
  - **GenASM: Approx. String Matching Accelerator** (ETH Zürich, Fall 2020)
  - https://www.youtube.com/watch?v=XoLpzmN-Pas&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=15
- Accelerating Genomics Project Course, Fall 2020, Lecture 1
  - Accelerating Genomics (ETH Zürich, Fall 2020)
  - https://www.youtube.com/watch?v=rgjl8ZyLsAg&list=PL5Q2soXY2Zi9E2bBVAgCqLgwiDRQDTyId

#### https://www.youtube.com/onurmutlulectures

### Performance Perspective

### Memory Bottleneck

I expect that over the coming decade memory subsystem design will be the *only* important design issue for microprocessors.

#### "It's the Memory, Stupid!" (Richard Sites, MPR, 1996)



Mutlu+, "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors," HPCA 2003.

### The Performance Perspective

 Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt, <u>"Runahead Execution: An Alternative to Very Large Instruction Windows</u> <u>for Out-of-order Processors"</u>

Proceedings of the <u>9th International Symposium on High-Performance Computer</u> <u>Architecture</u> (**HPCA**), pages 129-140, Anaheim, CA, February 2003. <u>Slides (pdf)</u>

One of the 15 computer arch. papers of 2003 selected as Top Picks by IEEE Micro. HPCA Test of Time Award (awarded in 2021).

[Lecture Slides (pptx) (pdf)] [Lecture Video (1 hr 54 mins)] [Retrospective HPCA Test of Time Award Talk Slides (pptx) (pdf)] [Retrospective HPCA Test of Time Award Talk Video (14 minutes)]

#### **Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors**

Onur Mutlu § Jared Stark † Chris Wilkerson ‡ Yale N. Patt §

§ECE Department The University of Texas at Austin {onur,patt}@ece.utexas.edu †Microprocessor Research Intel Labs jared.w.stark@intel.com ‡Desktop Platforms Group Intel Corporation chris.wilkerson@intel.com

 Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt, "Runahead Execution: An Effective Alternative to Large <u>Instruction Windows"</u> <u>IEEE Micro, Special Issue: Micro's Top Picks from Microarchitecture</u>

<u>Conferences</u> (**MICRO TOP PICKS**), Vol. 23, No. 6, pages 20-25, November/December 2003.

## RUNAHEAD EXECUTION: AN EFFECTIVE ALTERNATIVE TO LARGE INSTRUCTION WINDOWS

#### **RICHARD SITES**

#### It's the Memory, Stupid!

When we started the Alpha architecture design in 1988, we estimated a 25-year lifetime and a relatively modest 32% per year compounded performance improvement of implementations over that lifetime (1,000× total). We guestimated about 10× would come from CPU clock improvement, 10× from multiple instruction issue, and 10× from multiple processors.

5, 1996, MICROPROCESSOR REPORT
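The arithmetic in the quote can be checked directly. The short sketch below (variable names are illustrative, not from the article) compounds 32% per year over 25 years and compares it with the product of the three 10× sources the quote lists.

```python
# Check the arithmetic in the quote: 32% per year, compounded over 25 years.
annual_improvement = 0.32
years = 25
total_speedup = (1 + annual_improvement) ** years

# The three sources of improvement named in the quote, 10x each.
clock, multi_issue, multi_processor = 10, 10, 10
combined = clock * multi_issue * multi_processor

print(f"compounded: {total_speedup:.0f}x, product of sources: {combined}x")
```

Both routes land at roughly 1,000×, which is the total the quote states.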

#### All of Google's Data Center Workloads (2015):



Kanev+, "Profiling a Warehouse-Scale Computer," ISCA 2015.

#### Figure 11: Half of cycles are spent stalled on caches.

#### An Informal Interview on Memory

Madeleine Gray and Onur Mutlu, "'It's the memory, stupid': A conversation with Onur Mutlu" *HiPEAC info 55*, *HiPEAC Newsletter*, October 2018. [Shorter Version in Newsletter] [Longer Online Version with References]

#### 'It's the memory, stupid': A conversation with Onur Mutlu

'We're beyond computation; we know how to do computation really well, we can optimize it, we can build all sorts of accelerators ... but the memory – how to feed the data, how to get the data into the accelerators – is a huge problem.'

This was how ETH Zürich and Carnegie Mellon Professor Onur Mutlu opened his course on memory systems and memory-centric computing systems at HiPEAC's summer school, ACACES18. A prolific publisher – he recently bagged the top spot on the International Symposium on Computer Architecture (ISCA) hall of fame – Onur is passionate about computation and communication that are efficient and secure by design. In advance of our Computing Systems Week focusing on data centres, storage, and networking, which takes place next week in Heraklion, HiPEAC picked his brains on all things data-based.

### Energy Perspective





### A memory access consumes ~100-1000X the energy of a complex addition





| 32-bit Operation | Energy (pJ) | Cost Relative to ADD (int) |
|------------------|-------------|----------------------------|
| ADD (int)        | 0.1         | 1                          |
| ADD (float)      | 0.9         | 9                          |
| Register File    | 1           | 10                         |
| MULT (int)       | 3.1         | 31                         |
| MULT (float)     | 3.7         | 37                         |
| SRAM Cache       | 5           | 50                         |
| DRAM             | 640         | 6400                       |

#### A memory access consumes ~6400X the energy of an integer addition

Han+, "EIE: Efficient Inference Engine on Compressed Deep Neural Network," ISCA 2016.
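To see how the per-operation energies translate into a whole computation, the sketch below recomputes the relative costs from the table and totals the energy of a made-up kernel. The kernel's operation counts are an assumption for illustration only; the per-operation energies are the ones listed in the table.

```python
# Energy per 32-bit operation in picojoules, as listed in the table.
ENERGY_PJ = {
    "ADD (int)": 0.1,
    "ADD (float)": 0.9,
    "Register File": 1.0,
    "MULT (int)": 3.1,
    "MULT (float)": 3.7,
    "SRAM Cache": 5.0,
    "DRAM": 640.0,
}

def relative_cost(operation, baseline="ADD (int)"):
    """Energy of one operation expressed as a multiple of the baseline."""
    return ENERGY_PJ[operation] / ENERGY_PJ[baseline]

def total_energy_pj(operation_counts):
    """Total energy for a mapping of operation name to count."""
    return sum(ENERGY_PJ[op] * count for op, count in operation_counts.items())

# Hypothetical kernel: 1000 integer additions that need 10 DRAM accesses.
kernel = {"ADD (int)": 1000, "DRAM": 10}
compute_pj = ENERGY_PJ["ADD (int)"] * kernel["ADD (int)"]
memory_pj = ENERGY_PJ["DRAM"] * kernel["DRAM"]

print(f"DRAM vs. int add: {relative_cost('DRAM'):.0f}x")
print(f"compute: {compute_pj:.0f} pJ, memory: {memory_pj:.0f} pJ")
```

Even with a hundred additions per DRAM access, the memory accesses dominate the energy of this hypothetical kernel.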

### Memory is Critical for Energy

 Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata Ausavarungnirun, Eric Shiu, Rahul Thakur, Daehyun Kim, Aki Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu, "Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks" Proceedings of the <u>23rd International Conference on Architectural Support for Programming</u> <u>Languages and Operating Systems</u> (ASPLOS), Williamsburg, VA, USA, March 2018.

## 62.7% of the total system energy is spent on data movement

#### Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks

Amirali Boroumand<sup>1</sup> Saugata Ghose<sup>1</sup> Youngsok Kim<sup>2</sup> Rachata Ausavarungnirun<sup>1</sup> Eric Shiu<sup>3</sup> Rahul Thakur<sup>3</sup> Daehyun Kim<sup>4,3</sup> Aki Kuusela<sup>3</sup> Allan Knies<sup>3</sup> Parthasarathy Ranganathan<sup>3</sup> Onur Mutlu<sup>5,1</sup>

#### Processing in Memory: Faster & Low Energy

#### In-Memory-Computing: faster and more energy efficient

10.03.2022 | Sustainability, Industry Projects By: Anna Julia Schlegel

Big Data applications require high computing performance while consuming as little power as possible. Current computer systems are reaching their limits in both areas. Professor Onur Mutlu is working on alternative systems and has just received the Intel 2021 Outstanding Researcher Award for his work.

You may have heard that Moore's law is coming to an end. This empirical observation states that computers double their performance approximately every 2 years. Alternative approaches to improve the efficiency of computing are therefore in great demand. Prof. Onur Mutlu, whose research interests include hardware/software co-design at ETH Zurich, is pursuing the approach of combining computing and memory. **Processing-in-memory (PIM) computing** makes Big Data applications such as genome analysis both substantially faster and more energy-efficient.

Recently, the Grenoble-based company UPMEM launched the first commercially available PIM architecture. Instead of a processor or CPUs (Central Processing Units), it contains DPUs (DRAM Processing Units), which are memory elements that also process the data. Mutlu and his research group have characterised, analysed, and tested the new system and compared it with a previous state-of-the-art system with CPUs. They have learned that the novel system makes computing up to 23 times faster and five times more energy efficient. The new system is most interesting for data-intensive applications - specific examples include gene analysis or weather forecast models. "Not bad for the first commercial version of a processing-in-memory system," Mutlu says, "compared to a processor-centric CPU system that has been optimised for decades."



The UPMEM Processing-In-Memory-System. (Source: Onur Mutlu)

https://ethz.ch/en/industry/industry/news/data/2022/03/mehr-daten-schneller-und-energiesparenderverarbeiten.html

#### Much faster and more energy-efficient

Mutlu and his colleagues have tested the novel system for applications in the fields of data analysis, databases, bioinformatics, image and video analysis, and neural networks, among others. The PIM system is best suited for workloads requiring little communication between DPUs (e.g. database and image applications) and primarily simple arithmetic operations (e.g. video analytics or data filtering). "We expect that as these systems evolve, they will become even faster and more energy efficient, and their applications will become even more diverse," Mutlu reckons.

### Tutorial on Processing in Memory

Onur Mutlu, <u>"Memory-Centric Computing"</u> *Education Class at <u>Embedded Systems Week (ESWEEK)</u>*, Virtual, 9 October 2021. [<u>Slides (pptx) (pdf)</u>] [<u>Abstract (pdf)</u>] [<u>Talk Video (2 hours, including Q&A)</u>] [<u>Invited Paper at DATE 2021</u>] [<u>"A Modern Primer on Processing in Memory" paper</u>]

https://www.youtube.com/watch?v=N1Ac1ov1JOM

Embedded Systems Week (ESWEEK) 2021 Lecture - Memory-Centric Computing (premiered Dec 6, 2021)

https://www.youtube.com/onurmutlulectures

### Reliability & Security Perspectives

### Memory is Critical for Reliability

- Data from all of Facebook's servers worldwide
- Meza+, "Revisiting Memory Errors in Large-Scale Production Data Centers," DSN'15.



Chip density (Gb)

### Large-Scale Failure Analysis of DRAM Chips

- Analysis and modeling of memory errors found in all of Facebook's server fleet
- Justin Meza, Qiang Wu, Sanjeev Kumar, and Onur Mutlu, "Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field" Proceedings of the <u>45th Annual IEEE/IFIP International Conference on</u> Dependable Systems and Networks (DSN), Rio de Janeiro, Brazil, June 2015. [Slides (pptx) (pdf)] [DRAM Error Model]

Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field

Justin Meza Qiang Wu\* Sanjeev Kumar\* Onur Mutlu

Carnegie Mellon University \* Facebook, Inc.

A Curious Discovery [Kim et al., ISCA 2014]

## One can predictably induce errors in most DRAM memory chips

### A simple hardware failure mechanism can create a widespread system security vulnerability



#### One Can Take Over an Otherwise-Secure System

#### Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors

Abstract. Memory isolation is a key property of a reliable and secure computing system — an access to one memory address should not have unintended side effects on data stored in other addresses. However, as DRAM process technology

### Project Zero

Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors (Kim et al., ISCA 2014)

News and updates from the Project Zero team at Google

Exploiting the DRAM rowhammer bug to gain kernel privileges (Seaborn+, 2015)

Monday, March 9, 2015

Exploiting the DRAM rowhammer bug to gain kernel privileges

#### A RowHammer Survey Across the Stack

Onur Mutlu and Jeremie Kim,
 "RowHammer: A Retrospective"
 *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) Special Issue on Top Picks in Hardware and Embedded Security*, 2019.
 [Preliminary arXiv version]
 [Slides from COSADE 2019 (pptx)]
 [Slides from VLSI-SOC 2020 (pptx) (pdf)]
 [Talk Video (1 hr 15 minutes, with Q&A)]

### RowHammer: A Retrospective

Onur Mutlu§‡ Jeremie S. Kim‡§

§ETH Zürich ‡Carnegie Mellon University

#### Memory is Critical for Security



#### Detailed Lectures on RowHammer

- Computer Architecture, Fall 2021, Lecture 5
  - RowHammer (ETH Zürich, Fall 2021)
  - https://www.youtube.com/watch?v=7wVKnPj3NVw&list=PL5Q2soXY2Zi-Mnk1PxjEIG32HAGILkTOF&index=5
- Computer Architecture, Fall 2021, Lecture 6
  - RowHammer and Secure & Reliable Memory (ETH Zürich, Fall 2021)
  - https://www.youtube.com/watch?v=HNd4skQrt6I&list=PL5Q2soXY2Zi-Mnk1PxjEIG32HAGILkTOF&index=6

#### https://www.youtube.com/onurmutlulectures

#### 10 Years of RowHammer in 20 Minutes

Onur Mutlu, "The Story of RowHammer" *Invited Talk at the <u>Workshop on Robust and Safe Software 2.0</u> (RSS2), held with <u>the 27th International Conference on Architectural Support for Programming Languages and Operating Systems</u> (ASPLOS)*, Virtual, 28 February 2022. [<u>Slides (pptx) (pdf)</u>]

# Memory Is Critical for Computing

- Performance
- Energy
- Reliability
- Security & Safety
- Cost
- Form Factor
- Quality of Service & Predictability

## Memory Fundamentals

# Memory Organization & Technology

#### Memory (Programmer's View)



#### Abstraction: Virtual vs. Physical Memory

- Programmer sees virtual memory
  - Can assume the memory is "infinite"
- Reality: Physical memory size is much smaller than what the programmer assumes
- The system (system software + hardware, cooperatively) maps virtual memory addresses to physical memory
  - The system automatically manages the physical memory space transparently to the programmer
- (+) Programmer does not need to know the physical size of memory nor manage it
  - → A small physical memory can appear as a huge one to the programmer
  - → Life is easier for the programmer
- (--) More complex system software and architecture

A classic example of the programmer/(micro)architect tradeoff
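A minimal sketch of the mapping idea, assuming a flat single-level page table and a 4 KiB page size. Both are illustrative assumptions, as are the names `PAGE_SIZE`, `translate`, and the table contents; real systems use more elaborate structures.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)

def translate(virtual_address, page_table):
    """Map a virtual address to a physical one via a flat page table.

    page_table maps virtual page number -> physical frame number.
    A missing entry models a page that is not resident in physical memory.
    """
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    if virtual_page not in page_table:
        raise LookupError(f"page fault: virtual page {virtual_page} is not resident")
    return page_table[virtual_page] * PAGE_SIZE + offset

# Hypothetical mapping: the program uses virtual pages 0, 1 and 7,
# which the system has placed in physical frames 5, 2 and 0.
page_table = {0: 5, 1: 2, 7: 0}

print(hex(translate(0x1ABC, page_table)))  # virtual page 1 -> physical frame 2
```

The programmer only ever sees the virtual addresses; the table (and the handling of missing entries) is the system's responsibility, which is where the extra complexity noted above comes from.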

#### (Physical) Memory System

- You need a larger level of storage to manage a small amount of physical memory automatically
  - → Physical memory has a backing store: disk
- We will first start with the physical memory system
- For now, ignore the virtual → physical indirection
- We will get back to it later, if time permits...

#### Idealism



- Enough functional units
- Zero latency compute

# Quick Overview of Memory Arrays

#### How Can We Store Data?

- Flip-Flops (or Latches)
  - Very fast, parallel access
  - Very expensive (one bit costs tens of transistors)
- Static RAM (we will describe them in a moment)
  - Relatively fast, only one data word at a time
  - Expensive (one bit costs 6+ transistors)
- Dynamic RAM (we will describe them in a moment)
  - Slower, one data word at a time, reading destroys content (refresh), needs special process for manufacturing
  - Cheap (one bit costs only one transistor plus one capacitor)
- Other storage technology (flash memory, hard disk, tape)
  - Much slower, access takes a long time, non-volatile
  - Very cheap (one transistor stores many bits or no transistors involved)
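To make the cost difference concrete, the sketch below counts the transistors needed to store a given capacity in each technology. The SRAM and DRAM figures follow the list above (6 transistors per bit as a lower bound; one transistor plus one capacitor); the flip-flop figure of 20 per bit is an assumed stand-in for "tens of transistors".

```python
# Transistors per stored bit. The flip-flop value is an assumption
# standing in for "tens of transistors"; SRAM uses the 6-transistor
# lower bound; DRAM needs one transistor (plus one capacitor, not counted).
TRANSISTORS_PER_BIT = {"flip-flop": 20, "SRAM": 6, "DRAM": 1}

def transistors_needed(capacity_bytes, technology):
    """Transistor count for the storage cells only (no decoders or sense amps)."""
    return capacity_bytes * 8 * TRANSISTORS_PER_BIT[technology]

capacity = 32 * 1024  # 32 KB, chosen only as an example size
for technology in TRANSISTORS_PER_BIT:
    print(f"{technology:>9}: {transistors_needed(capacity, technology):,} transistors")
```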

#### Array Organization of Memories

- Goal: Efficiently store large amounts of data
  - A memory array (stores data)
  - Address selection logic (selects one row of the array)
  - Readout circuitry (reads data out)




- All values can be accessed, but only M-bits at a time
- Access restriction allows more compact organization


#### Recall: A Bigger Memory Array (4 locations X 3 bits)



DDCA Lecture 6 https://www.youtube.com/watch?v=QcFP4kNdKt0&list=PL5Q2soXY2Zi97Ya5DEUpMpO2bbAoaG7c6&index=6

#### Memory Arrays

- Two-dimensional array of bit cells
  - Each bit cell stores one bit
- An array with N address bits and M data bits:
  - 2<sup>N</sup> rows and M columns
  - Depth: number of rows (can be number of "words")
  - Width: number of columns (can be the "word" size)
  - Array size: depth  $\times$  width =  $2^{N} \times M$



#### Memory Array Example

- 2<sup>2</sup> × 3-bit array
- Number of rows: 4
- Row size: 3 bits
- For example, the 3-bit data stored at row 10 is 100
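A minimal sketch of the depth/width bookkeeping, using the 2<sup>2</sup> × 3-bit example. The `MemoryArray` class and its method names are hypothetical; only the value stored at row 10 is taken from the slide, and every other row is left at a placeholder 0.

```python
class MemoryArray:
    """A 2^N x M array of bit cells, accessed one row (word) at a time."""

    def __init__(self, n_address_bits, m_data_bits):
        self.depth = 2 ** n_address_bits   # number of rows ("words")
        self.width = m_data_bits           # number of columns (word size)
        self.rows = [0] * self.depth       # placeholder contents

    def size_bits(self):
        return self.depth * self.width     # array size = depth x width

    def write(self, address, data):
        if not 0 <= address < self.depth:
            raise IndexError("address out of range")
        if not 0 <= data < 2 ** self.width:
            raise ValueError("data does not fit in one row")
        self.rows[address] = data

    def read(self, address):
        if not 0 <= address < self.depth:
            raise IndexError("address out of range")
        return self.rows[address]


array = MemoryArray(n_address_bits=2, m_data_bits=3)
array.write(0b10, 0b100)                   # the one value given on the slide
print(array.depth, array.width, array.size_bits())
print(format(array.read(0b10), "03b"))
```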



#### Larger and Wider Memory Array Example



#### Memory Array Organization (I)

- Storage nodes in one column connected to one bitline
- Address decoder activates only ONE wordline
- Content of one line of storage available at output



#### Memory Array Organization (II)

- Storage nodes in one column connected to one bitline
- Address decoder activates only ONE wordline
- Content of one line of storage available at output



#### How is Access Controlled?

- Access transistors (that are configured as switches) connect the bit storage to the bitline
- Access controlled by the wordline
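The organization above (one active wordline, one bitline per column, access transistors acting as switches) can be modeled roughly as follows. The function names are hypothetical and the cell contents are made up, except that row 10 holds 100 as in the earlier example.

```python
def decode(address, n_address_bits):
    """Address decoder: one wordline per row, exactly one of them active."""
    num_rows = 2 ** n_address_bits
    if not 0 <= address < num_rows:
        raise ValueError("address out of range")
    return [1 if row == address else 0 for row in range(num_rows)]


def read_bitlines(cells, wordlines):
    """Each bitline sees only the cell whose access transistor is switched on."""
    num_cols = len(cells[0])
    return [
        int(any(cells[row][col] and wordlines[row] for row in range(len(cells))))
        for col in range(num_cols)
    ]


# Made-up contents for a 4 x 3 array; only row 0b10 = 100 comes from the slides.
cells = [
    [0, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
]

wordlines = decode(0b10, n_address_bits=2)
print(wordlines)
print(read_bitlines(cells, wordlines))
```

Because the decoder output is one-hot, each bitline is driven by at most one cell, which is what lets a whole column share a single wire.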



#### Building Larger Memories

- Requires larger memory arrays
- Large  $\rightarrow$  slow
- How do we make the memory large without making it too slow?
- Idea: Divide the memory into smaller arrays and interconnect the arrays to input/output buses
  - Large memories are hierarchical array structures
  - DRAM: Channel  $\rightarrow$  Rank  $\rightarrow$  Bank  $\rightarrow$  Subarrays  $\rightarrow$  Mats

#### General Principle: Interleaving (Banking)

#### Interleaving (banking)

- Problem: a single monolithic large memory array takes a long time to access and does not enable multiple accesses in parallel
- Goal: Reduce the latency of memory array access and enable multiple accesses in parallel
- Idea: Divide a large array into multiple banks that can be accessed independently (in the same cycle or in consecutive cycles)
  - Each bank is smaller than the entire memory storage
  - Accesses to different banks can be overlapped
- A Key Issue: How do you map data to different banks? (i.e., how do you interleave data across banks?)
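One way to make the mapping question concrete is to compare two simple schemes. Both functions below are illustrative sketches with hypothetical names; real memory controllers may use other (for example, hashed) mappings.

```python
def low_order_interleave(address, num_banks):
    """Consecutive addresses go to consecutive banks."""
    bank = address % num_banks
    index = address // num_banks          # location within the bank
    return bank, index


def high_order_interleave(address, num_banks, words_per_bank):
    """Consecutive addresses stay in one bank until that bank is full."""
    bank = address // words_per_bank
    if bank >= num_banks:
        raise ValueError("address out of range")
    index = address % words_per_bank
    return bank, index


NUM_BANKS = 4
WORDS_PER_BANK = 4
addresses = range(8)

print([low_order_interleave(a, NUM_BANKS)[0] for a in addresses])
print([high_order_interleave(a, NUM_BANKS, WORDS_PER_BANK)[0] for a in addresses])
```

With low-order interleaving a sequential access stream touches a different bank on every access, so the accesses can overlap; with high-order interleaving the same stream keeps hitting one bank.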

#### Recall: Memory Banking

- Memory is divided into banks that can be accessed independently; banks share address and data buses (to minimize pin cost)
- Can start and complete one bank access per cycle
- Can sustain N concurrent accesses if all N go to different banks
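A rough timing sketch of this behavior, under assumptions that are not from the slides: accesses issue in order, the shared bus starts at most one access per cycle, and each bank stays busy for a fixed `bank_latency` cycles. The function name and parameters are hypothetical.

```python
def total_cycles(bank_sequence, num_banks, bank_latency):
    """Return the cycle in which the last access completes."""
    bank_free_at = [0] * num_banks   # first cycle in which each bank is idle again
    next_issue = 0                   # earliest cycle the shared bus can start an access
    finish = 0
    for bank in bank_sequence:
        start = max(next_issue, bank_free_at[bank])  # wait for bus and for the bank
        bank_free_at[bank] = start + bank_latency
        finish = max(finish, bank_free_at[bank])
        next_issue = start + 1       # in-order issue, one access started per cycle
    return finish


# Eight accesses, four banks, each bank busy for 4 cycles per access.
print(total_cycles([0, 1, 2, 3, 0, 1, 2, 3], num_banks=4, bank_latency=4))  # 11
print(total_cycles([0] * 8, num_banks=4, bank_latency=4))                   # 32
```

When the accesses spread over all banks, the bank latency is overlapped; when they all conflict on one bank, they serialize.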



#### Generalized Memory Structure



#### Generalized Memory Structure



Kim+, "A Case for Exploiting Subarray-Level Parallelism in DRAM," ISCA 2012. Lee+, "Decoupled Direct Memory Access," PACT 2015.

#### Cutting Edge: 3D-Stacking of Memory & Logic



# Hybrid Memory Cube




## The DRAM Subsystem A Top-Down View

## DRAM Subsystem Organization

- Channel
- DIMM
- Rank
- Chip
- Bank
- Row/Column
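To see how a physical address might select one location in this hierarchy, here is a sketch that slices an address into fields. The field order and bit widths are purely hypothetical; real systems choose them per platform. DIMM and chip do not get their own fields here, on the assumption that the chips of a rank are accessed together.

```python
# Hypothetical field widths, least-significant field first.
FIELDS = [
    ("byte_in_bus", 3),   # byte within one 8-byte bus transfer
    ("column", 10),
    ("bank", 3),
    ("rank", 1),
    ("channel", 1),
    ("row", 14),
]


def split_address(paddr):
    """Slice a physical address into its (assumed) DRAM coordinate fields."""
    fields = {}
    for name, bits in FIELDS:
        fields[name] = paddr & ((1 << bits) - 1)
        paddr >>= bits
    return fields


# Build an example address from known coordinates, then take it apart again.
example = (5 << 18) | (1 << 17) | (0 << 16) | (3 << 13) | (7 << 3) | 2
print(split_address(example))
```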



#### The DRAM Subsystem



### Breaking down a DIMM (module)



### Breaking down a DIMM (module)



#### Rank



#### Breaking down a Rank



#### Breaking down a Chip



#### Breaking down a Bank



## Digging Deeper: DRAM Bank Operation



#### A DRAM Bank Internally Has Sub-Banks



Figure 1. DRAM bank organization

### Another View of a DRAM Bank



Seshadri+, "In-DRAM Bulk Bitwise Execution Engine," ADCOM 2020.

### More on DRAM Basics & Organization

 Vivek Seshadri and Onur Mutlu,
 <u>"In-DRAM Bulk Bitwise Execution Engine"</u> Invited Book Chapter in Advances in Computers, 2020.
 [Preliminary arXiv version]

See Section 2 for comprehensive DRAM Background

### In-DRAM Bulk Bitwise Execution Engine

Vivek Seshadri Microsoft Research India visesha@microsoft.com Onur Mutlu ETH Zürich onur.mutlu@inf.ethz.ch

https://arxiv.org/pdf/1905.09822.pdf

# DRAM Subsystem Organization

- Channel
- DIMM
- Rank
- Chip
- Bank
- Row/Column















**Physical memory space** 



A 64B cache block takes 8 I/O cycles to transfer.

During the process, 8 columns are read sequentially.
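The 8-cycle figure follows from simple arithmetic if we assume a 64-bit data bus (for example, 8 chips that each contribute 8 bits per I/O cycle). The bus width is an assumption here, and the helper name is made up.

```python
def io_cycles_per_block(block_bytes, data_bus_bits):
    """Number of I/O cycles needed to move one cache block over the data bus."""
    bytes_per_cycle = data_bus_bits // 8
    if block_bytes % bytes_per_cycle != 0:
        raise ValueError("block size is not a multiple of the bus width")
    return block_bytes // bytes_per_cycle


CACHE_BLOCK_BYTES = 64
DATA_BUS_BITS = 64   # assumed: 8 chips x 8 bits each

cycles = io_cycles_per_block(CACHE_BLOCK_BYTES, DATA_BUS_BITS)
print(cycles)        # 8: one column is read per I/O cycle, so 8 columns in total
```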

# Memory Technology: DRAM and SRAM

### Memory Technology: DRAM

- Dynamic random access memory
- Capacitor charge state indicates stored value
  - Whether the capacitor is charged or discharged indicates storage of 1 or 0
  - 1 capacitor
  - 1 access transistor
- Capacitor leaks through the RC path
  - DRAM cell loses charge over time
  - DRAM cell needs to be refreshed
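As a back-of-the-envelope illustration of why refresh is needed, the leakage can be approximated as a first-order RC discharge. This is a simplifying assumption, not a device model, and the voltage, threshold, and time-constant values below are placeholders rather than measurements.

```python
import math


def cell_voltage(v0, t, tau):
    """Voltage left on the capacitor after time t, assuming V(t) = V0 * exp(-t / RC)."""
    return v0 * math.exp(-t / tau)


def retention_time(v0, v_threshold, tau):
    """Time until the voltage decays to the level below which the bit is misread."""
    if not 0 < v_threshold < v0:
        raise ValueError("threshold must lie between 0 and the initial voltage")
    return tau * math.log(v0 / v_threshold)


# Placeholder values: full charge 1.0 V, sensing threshold 0.5 V, RC = 0.1 s.
t_ret = retention_time(v0=1.0, v_threshold=0.5, tau=0.1)
print(t_ret)                            # the cell must be refreshed before this time
print(cell_voltage(1.0, t_ret, 0.1))
```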



### Memory Technology: SRAM

- Static random access memory
- Two cross coupled inverters store a single bit
  - Feedback path enables the stored value to persist in the "cell"
  - 4 transistors for storage
  - 2 transistors for access



### Memory Bank Organization and Operation



#### Read access sequence:

1. Decode row address & drive word-lines
2. Selected bits drive bit-lines
   - Entire row read
3. Amplify row data
4. Decode column address & select subset of row
   - Send to output
5. Precharge bit-lines
   - For next access
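The read sequence can be sketched as a small behavioral model. The `Bank` class, its method names, and the `row_buffer` attribute (standing in for the amplified row data) are hypothetical; timing and electrical behavior are ignored.

```python
class Bank:
    """Behavioral model of one memory bank: a cell array plus a buffer for one row."""

    def __init__(self, num_rows, num_cols):
        self.cells = [[0] * num_cols for _ in range(num_rows)]
        self.row_buffer = None   # amplified copy of the currently selected row
        self.open_row = None

    def activate(self, row):
        """Steps 1-3: decode the row address, read the entire row, amplify it."""
        self.row_buffer = list(self.cells[row])
        self.open_row = row

    def read_column(self, col):
        """Step 4: decode the column address and select a subset of the row."""
        if self.row_buffer is None:
            raise RuntimeError("no row has been activated")
        return self.row_buffer[col]

    def precharge(self):
        """Step 5: prepare the bit-lines for the next access."""
        self.row_buffer = None
        self.open_row = None

    def read(self, row, col):
        self.activate(row)
        value = self.read_column(col)
        self.precharge()
        return value


bank = Bank(num_rows=4, num_cols=8)
bank.cells[2][5] = 1
print(bank.read(row=2, col=5))
```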

### SRAM (Static Random Access Memory)



#### **Read Sequence**

1. address decode
2. drive row select
3. selected bit-cells drive bitlines (entire row is read together)
4. differential sensing and column select (data is ready)
5. precharge all bitlines (for next read or write)

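- Access latency is dominated by steps 2 and 3
- Cycle time is dominated by steps 2, 3, and 5
  - Step 2 is proportional to 2<sup>m</sup> (wordline length, for an array with 2<sup>m</sup> columns)
  - Steps 3 and 5 are proportional to 2<sup>n</sup> (bitline length, for an array with 2<sup>n</sup> rows)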
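These proportionalities suggest a tradeoff in how a fixed number of address bits is split between rows and columns. The sketch below assumes n row-address bits and m column-address bits, and (as a simplification that is not from the slides) weights the 2<sup>m</sup> and 2<sup>n</sup> terms equally. The `best_split` helper is hypothetical.

```python
def best_split(total_address_bits):
    """Try every (n, m) with n + m fixed and return the (cost, n, m) with lowest cost.

    cost = 2**m + 2**n stands in for wordline delay plus bitline delay,
    with both terms given equal weight (an assumption).
    """
    candidates = []
    for n in range(total_address_bits + 1):
        m = total_address_bits - n
        candidates.append((2 ** m + 2 ** n, n, m))
    return min(candidates)


print(best_split(10))   # (64, 5, 5): a square array minimizes the sum
```

Under this simple model a roughly square array (n close to m) gives the lowest combined delay; a very tall or very wide array makes one of the two terms dominate.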

### DRAM (Dynamic Random Access Memory)



### DRAM vs. SRAM

### DRAM

- Slower access (capacitor)
- Higher density (1T 1C cell)
- Lower cost
- Requires refresh (power, performance, circuitry)
- Manufacturing requires putting capacitor and logic together

### SRAM

- Faster access (no capacitor)
- Lower density (6T cell)
- Higher cost
- No need for refresh
- Manufacturing compatible with logic process (no capacitor)

### An Aside: Phase Change Memory

- Phase change material (chalcogenide glass) exists in two states:
  - Amorphous: Low optical reflectivity and high electrical resistivity
  - Crystalline: High optical reflectivity and low electrical resistivity



PCM is resistive memory: High resistance (0), Low resistance (1)

Lee, Ipek, Mutlu, Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," ISCA 2009.

### PCM-based Main Memory

How should PCM-based (main) memory be organized?



- Pure PCM main memory [Lee et al., ISCA'09, Top Picks'10]
  - How to redesign the system to tolerate PCM shortcomings
- Hybrid PCM+DRAM [Qureshi+ ISCA'09, Dhiman+ DAC'09]
  - How to partition/migrate data between PCM and DRAM

### Reading: PCM as Main Memory: Idea in 2009

Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger,
"Architecting Phase Change Memory as a Scalable DRAM Alternative"
Proceedings of the <u>36th International Symposium on Computer</u>
<u>Architecture</u> (ISCA), pages 2-13, Austin, TX, June 2009. <u>Slides (pdf)</u>
One of the 13 computer architecture papers of 2009 selected as Top
Picks by IEEE Micro. Selected as a CACM Research Highlight.
2022 Persistent Impact Prize.

### Architecting Phase Change Memory as a Scalable DRAM Alternative

Benjamin C. Lee† Engin Ipek† Onur Mutlu‡ Doug Burger†

†Computer Architecture Group Microsoft Research Redmond, WA {blee, ipek, dburger}@microsoft.com


‡Computer Architecture Laboratory Carnegie Mellon University Pittsburgh, PA onur@cmu.edu

# Reading: More on PCM As Main Memory

 Benjamin C. Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, and Doug Burger,
 "Phase Change Technology and the Future of Main Memory" <u>IEEE Micro</u>, Special Issue: Micro's Top Picks from 2009 Computer Architecture Conferences (MICRO TOP PICKS), Vol. 30, No. 1, pages 60-70, January/February 2010.

# Phase-Change Technology and the Future of Main Memory

### Intel Optane Persistent Memory (2019)

- Non-volatile main memory
- Based on 3D-XPoint Technology



https://www.storagereview.com/intel\_optane\_dc\_persistent\_memory\_module\_pmm

# DRAM vs. PCM

#### DRAM

- Faster access (capacitor)
- Lower density (capacitor less scalable)  $\rightarrow$  higher cost
- Requires refresh (power, performance, circuitry)
- Manufacturing requires putting capacitor and logic together
- Volatile (loses data at loss of power)
- No endurance problems
- Lower access energy

### PCM

- Slower access (heating and cooling based "phase change" operation)
- Higher density (phase change material more scalable)  $\rightarrow$  lower cost
- No need for refresh
- Manufacturing requires less conventional (and less mature) processes
- Non-volatile (does **not** lose data at loss of power)
- Endurance problems (a cell cannot be used after N writes to it)
- Higher access energy
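To get a feel for the endurance limit, a common first-order estimate assumes ideal wear leveling, so that writes are spread evenly over all cells. The formula is a simplification and every number below is a placeholder chosen for illustration, not a measured device parameter.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def lifetime_seconds(capacity_bytes, endurance_writes, write_bandwidth_bytes_per_s):
    """Time until every cell has absorbed `endurance_writes` writes.

    Assumes ideal wear leveling and a constant write rate.
    """
    return capacity_bytes * endurance_writes / write_bandwidth_bytes_per_s


# Placeholder parameters: 16 GiB of PCM, 1e8 writes per cell, 1 GB/s sustained writes.
seconds = lifetime_seconds(16 * 2 ** 30, 1e8, 1e9)
print(seconds / SECONDS_PER_YEAR, "years")
```

Without wear leveling, a few heavily written cells would fail far sooner than this estimate suggests.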


### Charge vs. Resistive Memories

### Charge Memory (e.g., DRAM, Flash)

- Write data by capturing charge Q
- Read data by detecting voltage V

### Resistive Memory (e.g., PCM, STT-MRAM, memristors)

- Write data by pulsing current dQ/dt
- Read data by detecting resistance R

# Promising Resistive Memory Technologies

- PCM
  - Inject current to change material phase
  - Resistance determined by phase
- STT-MRAM
  - Inject current to change magnet polarity
  - Resistance determined by polarity
- Memristors/RRAM/ReRAM
  - Inject current to change atomic structure
  - Resistance determined by atom distance

# More on Emerging Memory Technologies



# More on Emerging Memory Technologies




https://www.youtube.com/watch?v=pmLszWGmMGQ&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=29


### More on Memory Technologies






### A Bit on Flash Memory & SSDs

Flash memory was a very "doubtful" emerging technology for at least two decades



By YU CAI, SAUGATA GHOSE, ERICH F. HARATSCH, YIXIN LUO, AND ONUR MUTLU

ABSTRACT | NAND flash memory is ubiquitous in everyday life today because its capacity has continuously increased and

**KEYWORDS** | Data storage systems; error recovery; fault tolerance; flash memory; reliability; solid-state drives


https://arxiv.org/pdf/1711.11427.pdf

### A Flash Memory SSD Controller



Fig. 1. (a) SSD system architecture, showing controller (Ctrl) and chips. (b) Detailed view of connections between controller components and chips.

Cai+, "Error Characterization, Mitigation, and Recovery in Flash Memory Based Solid State Drives," Proc. IEEE 2017.

https://arxiv.org/pdf/1711.11427.pdf

### Lecture on Flash Memory & SSDs

Computer Architecture - Lecture 26: Flash Memory and Solid-State Drives (ETH Zürich, Fall 2020)



### Special Course on Flash Memory & SSDs




https://www.youtube.com/watch?v=dSsZA6JGcLE&list=PL5Q2soXY2Zi\_4tPsgX9m\_D7AI\_c1T86cO

### Lectures on Memory Technologies

- Computer Architecture, Fall 2020, Lecture 15
  - Emerging Memory Technologies (ETH, Fall 2020)
  - https://www.youtube.com/watch?v=AlE1rD9G\_YU&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=28
- Computer Architecture, Fall 2020, Lecture 16a
  - Opportunities & Challenges of Emerging Memory Tech (ETH, Fall 2020)
  - https://www.youtube.com/watch?v=pmLszWGmMGQ&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=29
- Computer Architecture, Fall 2020, Lecture 3b
  - Memory Systems: Challenges & Opportunities (ETH, Fall 2020)
  - https://www.youtube.com/watch?v=Q2FbUxD7GHs&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=6

https://www.youtube.com/onurmutlulectures

### A Tutorial on Memory-Centric Systems

Onur Mutlu,
<u>"Memory-Centric Computing Systems"</u>
Invited Tutorial at 66th International Electron Devices Meeting (IEDM), Virtual, 12 December 2020.
[Slides (pptx) (pdf)] [Executive Summary Slides (pptx) (pdf)] [Tutorial Video (1 hour 51 minutes)] [Executive Summary Video (2 minutes)] [Abstract and Bio] [Related Keynote Paper from VLSI-DAT 2020] [Related Review Paper on Processing in Memory]

https://www.youtube.com/watch?v=H3sEaINPBOE


### Tutorial on Processing in Memory

Onur Mutlu,
"Memory-Centric Computing"
*Education Class at Embedded Systems Week (ESWEEK)*, Virtual, 9 October 2021.
[Slides (pptx) (pdf)] [Abstract (pdf)] [Talk Video (2 hours, including Q&A)] [Invited Paper at DATE 2021] ["A Modern Primer on Processing in Memory" paper]

https://www.youtube.com/watch?v=N1Ac1ov1JOM



#### Goal: Processing Inside Memory



# Backup Slides: Inside A DRAM Chip

#### **DRAM Module and Chip**





## **Goals in DRAM Design**

- Cost
- Latency
- Bandwidth
- Parallelism
- Power
- Energy
- Reliability
- Security

### **DRAM Chip**



#### **Sense Amplifier**



#### Sense Amplifier – Two Stable States



#### **Sense Amplifier Operation**



#### **DRAM Cell – Capacitor**





Small – Cannot drive circuits

Reading destroys the state

#### **Capacitor to Sense Amplifier**



#### **DRAM Cell Operation**



#### DRAM Subarray – Building Block for DRAM Chip



### **DRAM Bank**



## **DRAM Chip**

#### Shared internal bus



Memory channel - 8bits ←

### **DRAM Operation**



#### More on DRAM Operation: Section 2

 Vivek Seshadri and Onur Mutlu, <u>"In-DRAM Bulk Bitwise Execution Engine"</u> *Invited Book Chapter in Advances in Computers*, to appear in 2020. [Preliminary arXiv version]

See Section 2 for comprehensive DRAM Background

#### In-DRAM Bulk Bitwise Execution Engine

Vivek Seshadri Microsoft Research India visesha@microsoft.com

Onur Mutlu ETH Zürich onur.mutlu@inf.ethz.ch


https://arxiv.org/pdf/1905.09822.pdf