ISCA 2023 Tutorial: Real-world Processing-in-Memory Systems for Modern Workloads
Sunday, June 18 (held during ISCA 2023, June 17 – 21, 2023 in Orlando, Florida).
You can join our tutorial in person or online (Livestream Link).
Tutorial Website: https://events.safari.ethz.ch/isca-pim-tutorial/
ISCA 2023 Workshops and Tutorials: https://www.iscaconf.org/isca2023/program/workshops.php
Processing-in-Memory (PIM) is a computing paradigm that aims at overcoming the data movement bottleneck (i.e., the waste of execution cycles and energy resulting from the back-and-forth data movement between memory units and compute units) by making memory compute-capable.
Explored over several decades since the 1960s, PIM systems are becoming a reality with the advent of the first commercial products and prototypes.
A number of startups (e.g., UPMEM, Neuroblade) are already commercializing real PIM hardware, each with its own design approach and target applications. Several major vendors (e.g., Samsung, SK Hynix, Alibaba) have presented real PIM chip prototypes in the last two years.
Most of these architectures have in common that they place compute units near the memory arrays. But there is more to come: academia and industry are actively exploring other types of PIM, e.g., by exploiting the analog operation of DRAM, SRAM, flash memory, and emerging non-volatile memories.
PIM can provide large improvements in both performance and energy consumption, thereby enabling a commercially viable way of dealing with the huge amounts of data that are bottlenecking our computing systems. Yet, it is critical to examine and research adoption issues of PIM, drawing especially on lessons learned from the real PIM systems that are available today.
This tutorial focuses on the latest advances in PIM technology. We will (1) provide an introduction to PIM and a taxonomy of PIM systems, (2) give an overview and a rigorous analysis of existing real-world PIM hardware, (3) conduct hands-on labs using real PIM systems, and (4) shed light on how to enable the adoption of PIM in future computing systems.
Tutorial Website, full program: https://events.safari.ethz.ch/isca-pim-tutorial/
Livestream on YouTube (Link)
Sukhan Lee, Samsung Electronics (in person)
Title: Introducing Real-world HBM-PIM Powered System for Memory-bound Applications
Since the introduction of Samsung’s groundbreaking high bandwidth memory with in-memory processing (HBM-PIM) in 2021, a number of HBM-PIM enabled systems, including a GPU cluster and FPGA, have been developed. These advancements have been presented and showcased at various conferences and journals, spanning from ISSCC 2021 to Memcon 2023. In this tutorial, we provide a comprehensive overview of HBM-PIM, covering its architectural aspects, the associated software ecosystem, and the structure of PIM-powered systems. We also present energy-efficient performance results for memory-bound applications achieved by these systems.
Sukhan Lee received a Ph.D. degree in intelligent convergence systems from Seoul National University. In 2018, he joined the Memory Division, Samsung Electronics, Hwaseong, Korea, where he has been involved in DRAM circuit design. His research interests include memory microarchitecture and neural network system hardware architecture and design.
Izzat El Hajj, American University of Beirut (online)
Title: High-throughput Sequence Alignment using Real Processing-in-Memory Systems
Sequence alignment is a memory-bound computation whose performance in modern systems is limited by the memory bandwidth bottleneck. Processing-in-memory architectures alleviate this bottleneck by providing the memory with computing capabilities. We present Alignment-in-Memory (AIM), a framework for high-throughput sequence alignment using processing-in-memory, which we have implemented and evaluated on UPMEM, the first publicly-available general-purpose programmable processing-in-memory system. Our evaluation shows that a real processing-in-memory system can substantially outperform server-grade multi-threaded CPU systems running at full scale when performing sequence alignment for a variety of algorithms, read lengths, and edit distance thresholds. We hope that our findings inspire more work on creating and accelerating bioinformatics algorithms for such real processing-in-memory systems. Our code is available at: https://github.com/safaad/aim.
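To make the notion of an "edit distance threshold" concrete, here is a minimal CPU-side sketch in Python of a banded Levenshtein check that gives up as soon as the distance must exceed a threshold k. It is an illustration only: the function name `edit_distance_within` is hypothetical, and the code is not taken from AIM and does not reflect the alignment algorithms or the UPMEM kernels the talk covers.

```python
from typing import Optional


def edit_distance_within(a: str, b: str, k: int) -> Optional[int]:
    """Return the Levenshtein distance of a and b if it is <= k, else None.

    Only cells with |i - j| <= k are filled in, because any alignment
    with at most k edits stays inside that diagonal band.
    """
    if abs(len(a) - len(b)) > k:
        return None  # the length difference alone already needs more than k edits

    inf = k + 1  # any value above the threshold is treated as "too far"
    # Row 0: turning the empty prefix of a into b[:j] costs j insertions.
    prev = [j if j <= k else inf for j in range(len(b) + 1)]

    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        if i <= k:
            cur[0] = i  # deleting the first i characters of a
        lo = max(1, i - k)
        hi = min(len(b), i + k)
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # deletion from a
                cur[j - 1] + 1,      # insertion into a
                inf,                 # cap so values never grow past k + 1
            )
        if min(cur) >= inf:
            return None  # every cell in the band exceeds k: stop early
        prev = cur

    return prev[len(b)] if prev[len(b)] <= k else None
```

A small threshold keeps the band narrow, so the work per read pair grows with k times the read length rather than with the product of the two lengths.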
Izzat El Hajj is an Assistant Professor in the Department of Computer Science at the American University of Beirut. His research interests are in application acceleration and programming support for emerging parallel processors and memory technologies, with a particular interest in GPUs and processing-in-memory. Izzat received his M.S. and Ph.D. in Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign and his B.E. in Electrical and Computer Engineering at the American University of Beirut.
Christina Giannoula, University of Toronto (in person)
Title: SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems
Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements in parallel applications by alleviating data access costs. Real PIM systems can provide high levels of parallelism, large aggregate memory bandwidth and low memory access latency, thereby being a good fit to accelerate the widely-used, memory-bound Sparse Matrix Vector Multiplication (SpMV) kernel.
This talk provides the first comprehensive analysis of SpMV on a real-world PIM architecture, and presents SparseP, the first SpMV library for real PIM architectures. We will discuss three key contributions that we make in our Sigmetrics 2022 paper. First, we implement a wide variety of software strategies for SpMV on a multithreaded PIM core and characterize the computational limits of a single multithreaded PIM core. Second, we design various load balancing schemes across multiple PIM cores, and two types of data partitioning techniques to execute SpMV on thousands of PIM cores: (1) 1D-partitioned kernels to perform the complete SpMV computation only using PIM cores, and (2) 2D-partitioned kernels to strike a balance between computation and data transfer costs to PIM-enabled memory. Third, we compare SpMV execution on a real-world PIM system with 2528 PIM cores to state-of-the-art CPU and GPU systems to study the performance and energy efficiency of various devices. The SparseP software package provides 25 SpMV kernels for real PIM systems, supporting the four most widely used compressed matrix formats and a wide range of data types. Our extensive evaluation provides new insights and recommendations for software designers and hardware architects to efficiently accelerate SpMV on real PIM systems.
Christina Giannoula is a Postdoctoral Researcher at the University of Toronto working with Prof. Gennady Pekhimenko and the EcoSystem research group. She is also working with the SAFARI research group, which is led by Prof. Onur Mutlu. She received her Ph.D. in October 2022 from the School of Electrical and Computer Engineering (ECE) at the National Technical University of Athens (NTUA), advised by Prof. Georgios Goumas, Prof. Nectarios Koziris and Prof. Onur Mutlu. Her research interests lie at the intersection of computer architecture, computer systems and high-performance computing. Specifically, her research focuses on the hardware/software co-design of emerging applications, including graph processing, pointer-chasing data structures, machine learning workloads, and sparse linear algebra, with modern computing paradigms, such as large-scale multicore systems, disaggregated memory systems and near-data processing architectures. She has several publications and awards for her research on these topics.
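As a rough illustration of the 1D-partitioning idea, the following Python sketch splits a CSR matrix into contiguous row blocks with roughly equal numbers of non-zeros, lets each simulated "core" compute its slice of the output vector, and concatenates the slices. Everything here is an assumption made for illustration: the function names are hypothetical, the cores are simulated sequentially on the host, and the nnz-based split is only one possible load balancing scheme. It is not the SparseP API or its implementation.

```python
from typing import List, Sequence, Tuple


def partition_rows_by_nnz(row_ptr: Sequence[int], num_cores: int) -> List[Tuple[int, int]]:
    """Split rows into contiguous [start, end) blocks with similar nnz counts."""
    num_rows = len(row_ptr) - 1
    total_nnz = row_ptr[-1]
    bounds = []
    start = 0
    for core in range(num_cores):
        # Cumulative number of non-zeros this core's block should end at.
        target = total_nnz * (core + 1) / num_cores
        end = start
        while end < num_rows and row_ptr[end + 1] <= target:
            end += 1
        if core == num_cores - 1:
            end = num_rows  # the last core takes whatever rows remain
        bounds.append((start, end))
        start = end
    return bounds


def spmv_rows(row_ptr, col_idx, vals, x, start: int, end: int) -> List[float]:
    """Compute y[start:end] for a CSR matrix; stands in for one core's work."""
    y_part = []
    for row in range(start, end):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += vals[k] * x[col_idx[k]]
        y_part.append(acc)
    return y_part


def spmv_1d(row_ptr, col_idx, vals, x, num_cores: int) -> List[float]:
    """1D-partitioned SpMV: each core owns whole rows and needs the full x."""
    y: List[float] = []
    for start, end in partition_rows_by_nnz(row_ptr, num_cores):
        y.extend(spmv_rows(row_ptr, col_idx, vals, x, start, end))
    return y


# 4x4 example matrix in CSR form:
# [[1, 0, 2, 0],
#  [0, 3, 0, 0],
#  [0, 0, 0, 0],
#  [4, 0, 5, 6]]
row_ptr = [0, 2, 3, 3, 6]
col_idx = [0, 2, 1, 0, 2, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

print(partition_rows_by_nnz(row_ptr, 2))
print(spmv_1d(row_ptr, col_idx, vals, x, 2))
```

Because every block holds complete rows, each core can produce its part of the output without exchanging partial sums, at the cost of needing the whole input vector. A 2D scheme would instead give each core a tile of the matrix and only a slice of the input vector, which then requires merging partial results.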