Flash Memory Summit 2023

We attended the Flash Memory Summit, held August 8–10, 2023, in Santa Clara, California, and gave several presentations. You can find all of our talks, along with their abstracts and related papers, below.

Check back here soon for the talk recordings!


| Title | Presentation Date/Time (Local time) | Session | Presented by |
| --- | --- | --- | --- |
| Benchmarking a New Paradigm: Analysis of a Real Processing-in-Memory System | Tuesday, August 8, 2023, 3:20 PM – 4:25 PM | SARC-103-2: Memory and Storage | Juan Gómez Luna |
| Fundamentally Understanding and Solving RowHammer | Thursday, August 10, 2023, 8:30 AM – 9:35 AM | DRAM-301-1: DRAM System Factors | Onur Mutlu |
| pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables | Thursday, August 10, 2023, 11:00 AM – 12:05 PM | ACAD-303-1: DRAM Applications | Juan Gómez Luna |
| GenPIP: In-Memory Acceleration of Genome Analysis | Thursday, August 10, 2023, 11:00 AM – 12:05 PM | ACAD-303-1: DRAM Applications | Onur Mutlu |
| Open Source and Easy-to-Use DRAM Testing Infrastructure | Thursday, August 10, 2023, 11:00 AM – 12:05 PM | ACAD-303-1: DRAM Applications | Onur Mutlu |
| Sibyl: Data Placement in Hybrid Storage Systems Using Reinforcement Learning | Thursday, August 10, 2023, 12:10 PM – 1:15 PM | ACAD-304-1: Flash Applications | Onur Mutlu |
| Flash-Cosmos: High-Performance and Reliable In-Flash Bulk Bitwise Operations | Thursday, August 10, 2023, 12:10 PM – 1:15 PM | ACAD-304-1: Flash Applications | Juan Gómez Luna |


Abstracts and related links (papers, code, talk recordings)

[1] Benchmarking a New Paradigm: Analysis of a Real Processing-in-Memory System

This talk provides the first comprehensive analysis of the first publicly available real-world PIM architecture. We make two key contributions. First, we conduct an experimental characterization of the UPMEM-based PIM system using microbenchmarks to assess various architecture limits such as compute throughput and memory bandwidth, yielding new insights. Second, we present PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domains (e.g., dense/sparse linear algebra, databases, data analytics, graph processing, neural networks, bioinformatics, image processing), which we identify as memory-bound. We evaluate the performance and scaling characteristics of PrIM benchmarks on the UPMEM PIM architecture, and compare their performance and energy consumption to their state-of-the-art CPU and GPU counterparts. Our extensive evaluation, conducted on two real UPMEM-based PIM systems with 640 and 2,556 DPUs, provides new insights about the suitability of different workloads to the PIM system, programming recommendations for software designers, and suggestions and hints for hardware and architecture designers of future PIM systems.
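
To give a flavor of what such a characterization involves, the sketch below is a minimal, host-side analogue of a bandwidth microbenchmark: it measures sustained copy bandwidth with NumPy on the CPU. The real PrIM microbenchmarks run on the UPMEM DPUs and are written against the UPMEM SDK; this Python version only illustrates the style of measurement.

```python
# Minimal sketch (not the PrIM code): a STREAM-style copy microbenchmark that
# estimates sustained memory bandwidth on the host CPU, illustrating the kind
# of measurement used to characterize an architectural limit.
import time
import numpy as np

def copy_bandwidth_gbs(n_elems: int = 64 * 1024 * 1024, repeats: int = 5) -> float:
    """Return the best observed copy bandwidth (GB/s) for an n-element float64 array."""
    src = np.random.rand(n_elems)
    dst = np.empty_like(src)
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        np.copyto(dst, src)                  # one read + one write per element
        elapsed = time.perf_counter() - start
        bytes_moved = 2 * src.nbytes         # bytes read from src + bytes written to dst
        best = max(best, bytes_moved / elapsed / 1e9)
    return best

if __name__ == "__main__":
    print(f"Sustained copy bandwidth: {copy_bandwidth_gbs():.1f} GB/s")
```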

[Talk Slides (pdf) (pptx)]

Related paper and links:

Juan Gomez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu, “Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System”, IEEE Access, 10 May 2022.
[arXiv version] [IEEE version] [PrIM Benchmarks Source Code] [Slides (pptx) (pdf)] [Long Talk Slides (pptx) (pdf)] [Short Talk Slides (pptx) (pdf)] [SAFARI Live Seminar Slides (pptx) (pdf)] [SAFARI Live Seminar Video (2 hrs 57 mins)] [Lightning Talk Video (3 minutes)] [Short Talk Video (21 minutes)] [1-hour Talk Video (58 minutes)] [ETH News for Industry Article]

[2] Fundamentally Understanding and Solving RowHammer

RowHammer is the phenomenon in which repeatedly accessing a row in a real DRAM chip causes bitflips (i.e., data corruption) in physically nearby rows. This phenomenon leads to a serious and widespread system security vulnerability, as many works since the original RowHammer paper in 2014 have shown. Recent analysis of the RowHammer phenomenon reveals that the problem is getting much worse as DRAM technology scaling continues: newer DRAM chips are fundamentally more vulnerable to RowHammer at the device and circuit levels. After reviewing various recent developments in exploiting, understanding, and mitigating RowHammer, we discuss future directions that we believe are critical for solving the RowHammer problem. We argue for amplifying research and development efforts in two major directions: 1) building a much deeper understanding of the problem and its many dimensions, in both cutting-edge DRAM chips and computing systems deployed in the field, and 2) designing and developing extremely efficient and fully-secure solutions via system-memory cooperation.
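
The core phenomenon can be pictured with a toy model: activating an aggressor row many times can flip bits in its physical neighbors once an activation-count threshold is exceeded. The sketch below is a deliberately simplified simulation with an assumed threshold and random victim-bit selection; it does not model real DRAM circuit behavior.

```python
# Toy RowHammer model (assumed threshold, random victim bit); for illustration only.
import random

HAMMER_THRESHOLD = 50_000    # assumed number of activations needed to induce a bitflip
NUM_ROWS = 8
ROW_BITS = 64

def hammer(memory, aggressor_row, activations):
    """'Activate' aggressor_row many times; corrupt adjacent rows past the threshold."""
    for victim in (aggressor_row - 1, aggressor_row + 1):
        if 0 <= victim < NUM_ROWS and activations >= HAMMER_THRESHOLD:
            flipped_bit = random.randrange(ROW_BITS)
            memory[victim] ^= 1 << flipped_bit     # data corruption in a nearby row

memory = [0] * NUM_ROWS                  # all rows initialized to zero
hammer(memory, aggressor_row=3, activations=100_000)
print([hex(row) for row in memory])      # rows 2 and 4 now contain flipped bits
```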

[Talk slides (pdf) (pptx)]

Related paper and links:

Onur Mutlu, Ataberk Olgun, and A. Giray Yaglikci, “Fundamentally Understanding and Solving RowHammer”, Invited Special Session Paper at the 28th Asia and South Pacific Design Automation Conference (ASP-DAC), Tokyo, Japan, January 2023.
[arXiv version] [Slides (pptx) (pdf)] [Talk Video (26 minutes)]

[3] pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables

Data movement between the main memory and the processor is a key contributor to execution time and energy consumption in memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM). One category of PiM is Processing-using-Memory (PuM), in which computation takes place inside the memory array by exploiting intrinsic analog properties of the memory device. PuM yields high performance and energy efficiency, but existing PuM techniques support a limited range of operations. As a result, current PuM architectures cannot efficiently perform some complex operations without large increases in chip area and design complexity. We introduce pLUTo, a DRAM-based PuM architecture that leverages the high storage density of DRAM to enable the massively parallel storing and querying of lookup tables. The key idea of pLUTo is to replace complex operations with low-cost, bulk memory reads. Our experimental results show that pLUTo outperforms optimized CPU and GPU baselines by an average of 713x and 1.2x, respectively, while simultaneously reducing energy consumption by an average of 1855x and 39.5x. pLUTo is open source at https://github.com/CMU-SAFARI/pLUTo.
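
The lookup-table idea has a simple software analogue: precompute the results of an expensive per-element operation once, then answer every element with a bulk read of the table. The sketch below uses an arbitrary 8-bit function and NumPy indexing purely to illustrate the idea; pLUTo itself performs the table queries inside the DRAM array.

```python
# Software analogue of the LUT-query idea (not the pLUTo hardware mechanism).
import numpy as np

# Precompute the LUT once: here, an arbitrary nonlinear function on 8-bit values.
lut = np.array([(x * x + 7) % 256 for x in range(256)], dtype=np.uint8)

data = np.random.randint(0, 256, size=1_000_000, dtype=np.uint8)

# "Querying" the LUT for every element is a bulk read, not a per-element computation.
result = lut[data]

# Sanity check against computing the function directly.
expected = ((data.astype(np.uint16) ** 2 + 7) % 256).astype(np.uint8)
assert np.array_equal(result, expected)
```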

[Talk Slides (pdf) (pptx)]

Related paper and links:

João Dinis Ferreira, Gabriel Falcao, Juan Gómez-Luna, Mohammed Alser, Lois Orosa, Mohammad Sadrosadati, Jeremie S. Kim, Geraldo F. Oliveira, Taha Shahroodi, Anant Nori, and Onur Mutlu, “pLUTo: Enabling Massively Parallel Computation in DRAM via Lookup Tables”, Proceedings of the 55th International Symposium on Microarchitecture (MICRO), Chicago, IL, USA, October 2022.
[Slides (pptx) (pdf)] [Longer Lecture Slides (pptx) (pdf)] [Lecture Video (26 minutes)] [arXiv version] [Source Code (Officially Artifact Evaluated with All Badges)]
Officially artifact evaluated as available, reusable and reproducible.

[4] GenPIP: In-Memory Acceleration of Genome Analysis

Nanopore sequencing is a widely-used high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally-costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The second step, read mapping, finds the correct location of a read in a reference genome. In existing genome analysis pipelines, basecalling and read mapping are executed separately. We observe in this work that such separate execution of the two most time-consuming steps inherently leads to (1) significant data movement and (2) redundant computations on the data, slowing down the genome analysis pipeline.

This paper proposes GenPIP, an in-memory genome analysis accelerator that tightly integrates basecalling and read mapping. GenPIP improves the performance of the genome analysis pipeline with two key mechanisms: (1) in-memory fine-grained collaborative execution of the major genome analysis steps in parallel; (2) a new technique for early rejection of low-quality and unmapped reads, which stops the execution of genome analysis for such reads in a timely manner and reduces wasted computation. Our experiments show that, for the execution of the genome analysis pipeline, GenPIP provides 41.6x (8.4x) speedup and 32.8x (20.8x) energy savings with negligible accuracy loss compared to the state-of-the-art software genome analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design that combines state-of-the-art in-memory basecalling and read mapping accelerators, GenPIP provides 1.39x speedup and 1.37x energy savings.
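
The early-rejection mechanism can be summarized in a few lines of Python: basecall only the first chunk of a read, estimate its quality, and skip all remaining work for reads that fall below a threshold. The basecaller, quality estimate, and threshold below are toy stand-ins, not GenPIP's actual in-memory logic.

```python
# Minimal sketch of early rejection (toy basecaller, assumed threshold).
import random

QUALITY_THRESHOLD = 0.85      # assumed minimum chunk quality needed to keep processing

def basecall(chunk):
    # Stand-in for a basecaller: returns (bases, per-chunk quality estimate).
    return "".join(random.choice("ACGT") for _ in chunk), random.uniform(0.5, 1.0)

def analyze(read_chunks):
    bases, quality = basecall(read_chunks[0])
    if quality < QUALITY_THRESHOLD:
        return None                            # early rejection: no further chunks, no mapping
    for chunk in read_chunks[1:]:
        more_bases, _ = basecall(chunk)
        bases += more_bases
    return bases                               # would be handed to read mapping next

reads = [[[0.0] * 32 for _ in range(8)] for _ in range(100)]   # 100 reads, 8 signal chunks each
kept = [r for r in reads if analyze(r) is not None]
print(f"kept {len(kept)} of {len(reads)} reads after early rejection")
```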

[Talk Slides (pdf) (pptx)]

Related paper and links:

Haiyu Mao, Mohammed Alser, Mohammad Sadrosadati, Can Firtina, Akanksha Baranwal, Damla Senol Cali, Aditya Manglik, Nour Almadhoun Alserr, and Onur Mutlu, “GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping”, Proceedings of the 55th International Symposium on Microarchitecture (MICRO), Chicago, IL, USA, October 2022.
[Slides (pptx) (pdf)] [Longer Lecture Slides (pptx) (pdf)] [Lecture Video (25 minutes)] [arXiv version] [Poster presented at RECOMB 2023 (PDF)] [Talk Recording BIO-Arch Workshop]

[5] Open Source and Easy-to-Use DRAM Testing Infrastructure

To improve DRAM in all aspects and overcome DRAM scaling challenges, it is critical to experimentally understand the operation and characteristics of real DRAM chips. Doing so requires experimental testing infrastructures that can efficiently and easily test real state-of-the-art DRAM chips.

We present an overview of past open source DRAM testing infrastructures. We introduce DRAM Bender, a new FPGA-based infrastructure that enables experimental studies on DDR4 DRAM chips. DRAM Bender enables directly interfacing with a DRAM chip through its low-level interface and exposes easy-to-use C++ and Python programming interfaces. The modular design of DRAM Bender allows extending it to (i) support existing and emerging DRAM interfaces, and (ii) run on new commercial or custom FPGA boards with little effort. To demonstrate that DRAM Bender is a versatile infrastructure, we conduct three case studies, two of which lead to new observations about the DRAM RowHammer vulnerability.
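
As a rough picture of what directly interfacing with a DRAM chip through its low-level interface means, the sketch below issues an ACT–RD–PRE command sequence against a toy board object. The class and method names (Board, activate, read, precharge) are illustrative assumptions, not the actual DRAM Bender API; see the DRAM Bender source code linked below for the real interfaces.

```python
# Hypothetical sketch only: the names below are NOT the real DRAM Bender API.
class Board:
    """Toy stand-in for an FPGA board driver that issues DDR4 commands."""
    def activate(self, bank, row):
        print(f"ACT  bank={bank} row={hex(row)}")
    def read(self, bank, col):
        print(f"RD   bank={bank} col={col}")
        return 0                                  # placeholder data
    def precharge(self, bank):
        print(f"PRE  bank={bank}")

def read_row(board, bank, row, num_cols=8):
    """Issue the ACT -> RD x num_cols -> PRE sequence for one DRAM row."""
    board.activate(bank, row)
    data = [board.read(bank, col) for col in range(num_cols)]
    board.precharge(bank)
    return data

read_row(Board(), bank=0, row=0x1234)
```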

[Talk Slides (pdf) (pptx)]

Related paper and links:

Ataberk Olgun, Hasan Hassan, A. Giray Yağlıkçı, Yahya Can Tuğrul, Lois Orosa, Haocong Luo, Minesh Patel, Oğuz Ergin, and Onur Mutlu, “DRAM Bender: An Extensible and Versatile FPGA-based Infrastructure to Easily Test State-of-the-art DRAM Chips”, to appear in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2023.
[arXiv version] [DRAM-Bender Source Code] [DRAM Bender Tutorial Video (43 minutes)]

[6] Sibyl: Data Placement in Hybrid Storage Systems Using Reinforcement Learning

Hybrid storage systems (HSS) use multiple storage devices to provide high storage capacity and performance. Data placement across different devices is critical to maximize the benefits of HSSs. Prior data placement techniques are rigid, which (1) limits their adaptivity to perform well for a wide range of workloads and storage device configurations, and (2) makes it difficult to extend them to different HSS configurations. Our goal is to design a new adaptive and extensible data placement technique that overcomes these issues. We introduce Sibyl, the first technique that uses reinforcement learning for data placement in HSSs. Sibyl observes workload and device features to make system-aware data placement decisions. Sibyl evaluates the long-term performance impact of its decisions and continuously optimizes its policy. We implement Sibyl on real HSS configurations and compare its performance against four heuristic- and machine-learning-based data placement techniques over many workloads. Sibyl outperforms the best previous policy by 21.6%/19.9% on a performance-/cost-oriented HSS configuration. On an HSS with three devices, Sibyl outperforms the best previous policy by up to 48.2%.
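
To make the reinforcement-learning framing concrete, the sketch below trains a tiny tabular Q-learning agent that picks between a fast and a slow device given a coarse workload state. The state, reward, and latencies are toy assumptions; Sibyl's actual agent, features, and reward are described in the paper linked below.

```python
# Toy RL data-placement sketch (tabular Q-learning, assumed latencies and reward).
import random

DEVICES = ["fast", "slow"]
LATENCY = {"fast": 1.0, "slow": 5.0}       # assumed per-request service times
EPSILON, ALPHA, GAMMA = 0.1, 0.5, 0.9

q = {}                                      # Q[(state, device)] -> estimated long-term reward

def choose(state):
    if random.random() < EPSILON:
        return random.choice(DEVICES)       # explore
    return max(DEVICES, key=lambda d: q.get((state, d), 0.0))   # exploit

def update(state, device, reward, next_state):
    best_next = max(q.get((next_state, d), 0.0) for d in DEVICES)
    old = q.get((state, device), 0.0)
    q[(state, device)] = old + ALPHA * (reward + GAMMA * best_next - old)

state = "cold"                              # coarse workload feature, e.g., recent reuse
for _ in range(10_000):
    device = choose(state)
    capacity_penalty = 0.5 if device == "fast" else 0.0   # toy cost of scarce fast capacity
    reward = -(LATENCY[device] + capacity_penalty)        # lower latency -> higher reward
    next_state = random.choice(["cold", "hot"])           # toy workload dynamics
    update(state, device, reward, next_state)
    state = next_state

print({k: round(v, 2) for k, v in q.items()})
```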

[Talk Slides (pdf) (pptx)]

Related paper and links:

Gagandeep Singh, Rakesh Nadig, Jisung Park, Rahul Bera, Nastaran Hajinazar, David Novo, Juan Gomez-Luna, Sander Stuijk, Henk Corporaal, and Onur Mutlu, “Sibyl: Adaptive and Extensible Data Placement in Hybrid Storage Systems Using Online Reinforcement Learning”, Proceedings of the 49th International Symposium on Computer Architecture (ISCA), New York, June 2022.
[arXiv version] [ACM version] [Talk Video (16 minutes)] [Slides (pptx) (pdf)] [Sibyl Source Code]

[7] Flash-Cosmos: High-Performance and Reliable In-Flash Bulk Bitwise Operations

Bulk bitwise operations are prevalent in many application domains (e.g., databases, graph processing, genome analysis, cryptography). In conventional systems, the performance and energy efficiency of bulk bitwise operations are bottlenecked by data movement between compute units and memory hierarchy. In-flash processing (i.e., processing data inside flash memory) accelerates bulk bitwise operations by reducing data movement across the memory hierarchy. We propose Flash-Cosmos, a new in-flash processing technique that significantly increases the performance, energy efficiency and reliability of bulk bitwise operations. Flash-Cosmos introduces two key mechanisms that can be easily supported in modern NAND flash chips: (i) Multi-Wordline Sensing to enable bulk bitwise operations on multiple operands with a single sensing operation, and (ii) Enhanced SLC-mode Programming to enable reliable in-flash computation. We test Flash-Cosmos’s feasibility on 160 real 3D NAND flash chips. Flash-Cosmos improves performance and energy efficiency by 3.5x/32x and 3.3x/95x, respectively, over the state-of-the-art in-flash/outside-storage processing techniques across three real-world applications.
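
Functionally, a single Multi-Wordline Sensing operation produces a bitwise combination (e.g., the AND) of many operand pages at once. The sketch below reproduces only that functional result in software with NumPy; it says nothing about how the sensing itself is performed inside the flash chip.

```python
# Software check of what one multi-operand in-flash bitwise operation computes.
import numpy as np

num_operands, page_bytes = 8, 4096
operands = np.random.randint(0, 256, size=(num_operands, page_bytes), dtype=np.uint8)

# One "sensing operation" over all selected pages would yield the multi-operand AND.
result_and = np.bitwise_and.reduce(operands, axis=0)

# Equivalent element-wise loop, for clarity (much slower than the bulk operation).
reference = operands[0].copy()
for op in operands[1:]:
    reference &= op
assert np.array_equal(result_and, reference)

print(result_and[:8])
```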

[Talk Slides (pdf) (pptx)]

Related Paper and links:

Jisung Park, Roknoddin Azizi, Geraldo F. Oliveira, Mohammad Sadrosadati, Rakesh Nadig, David Novo, Juan Gómez-Luna, Myungsuk Kim, and Onur Mutlu, “Flash-Cosmos: In-Flash Bulk Bitwise Operations Using Inherent Computation Capability of NAND Flash Memory”, Proceedings of the 55th International Symposium on Microarchitecture (MICRO), Chicago, IL, USA, October 2022.
[Slides (pptx) (pdf)] [Longer Lecture Slides (pptx) (pdf)] [Lecture Video (44 minutes)] [arXiv version]

 

Posted in Conference, Papers, Seminar, Talks.