Our latest paper in Genome Biology is out

Our latest paper in Genome Biology reviews developments in read alignment algorithms from 1988 to the present. We investigate how the development of read alignment algorithms has been shaped by changes in sequencing technologies, such as read length, throughput, and sequencing error rates.

Mohammed Alser, Jeremy Rotman, Kodi Taraszka, Huwenbo Shi, Pelin Icer Baykal, Harry Taegyun Yang, Victor Xue, Sergey Knyazev, Benjamin D. Singer, Brunilda Balliu, David Koslicki, Pavel Skums, Alex Zelikovsky, Can Alkan, Onur Mutlu, Serghei Mangul, “Technology dictates algorithms: Recent developments in read alignment”, Genome Biology, August 2021.
[arXiv preprint]
[Source Code and Data]

Abstract:
Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today’s diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on the speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
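For readers new to the area, the dynamic-programming formulation that these aligners ultimately build on can be sketched in a few lines (a textbook illustration, not code from the paper):

```python
def edit_distance(read, ref):
    """Classic dynamic-programming alignment: O(len(read) * len(ref))
    time, the baseline that modern aligners accelerate with indexing,
    banding, and bit-parallel techniques."""
    m, n = len(read), len(ref)
    # dp[i][j] = minimum edits to align read[:i] against ref[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if read[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match/mismatch
    return dp[m][n]

print(edit_distance("ACGT", "AGGT"))  # 1 (one substitution)
```

The quadratic cost of this table is exactly why read length and error rate drive aligner design: longer, noisier reads make the naive formulation impractical.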

Recent publicity: 

SAFARI Live Seminar: Nastaran Hajinazar 27 Oct 2021

Join us for our SAFARI Live Seminar with Nastaran Hajinazar.

SAFARI Live Seminar: Jawad Haj-Yahya 4 October 2021

Monday, October 4 at 5:30 pm Zurich time (CEST)

Security Implications of Power Management Mechanisms in Modern Processors: Current Studies and Future Trends
Jawad Haj-Yahya, Huawei Research Center Zurich

Livestream at 5:30 pm Zurich time (CEST) on YouTube: Link

Abstract:
Despite the failure of Dennard scaling, the slow-down in Moore’s Law, and the high power density of modern processors, power management mechanisms have enabled significant advances in modern microprocessor performance and energy efficiency. Yet, current power management architectures also pose serious security implications. This is mainly because functionality rather than security has been the main consideration in the design of power management mechanisms in commodity microprocessors.

In this seminar, we provide a detailed overview of state-of-the-art power management mechanisms used in modern microprocessors. Based on this background, we present our recently-revealed set of new vulnerabilities, called IChannels. IChannels is a set of covert channels that exploits multi-level throttling mechanisms used by the current management mechanisms in modern processors. These covert channels can be established between two execution contexts 1) on the same hardware thread, 2) across simultaneous multithreading (SMT) threads, and 3) across different physical cores.  Finally, we discuss a set of practical mitigation mechanisms to protect a system against known covert channels resulting from current management mechanisms.
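To build intuition for how such a channel operates, here is a heavily simplified, purely software-simulated model (a toy illustration only, not the actual IChannels attack, which exploits real processor current-throttling behavior):

```python
# Simulated throttling-based covert channel: the sender modulates shared
# throttling state with its power consumption, and the receiver decodes
# bits by timing its own operations under that state.
class SimulatedCore:
    def __init__(self):
        self.throttled = False       # models multi-level current throttling

    def heavy_op(self):              # sender: power-hungry work triggers throttling
        self.throttled = True

    def light_op(self):              # sender: light work lets throttling relax
        self.throttled = False

    def timed_op(self):              # receiver: latency reveals throttle state
        return 2.0 if self.throttled else 1.0   # simulated latency units

core = SimulatedCore()
message = [1, 0, 1, 1, 0]
received = []
for bit in message:
    core.heavy_op() if bit else core.light_op()         # sender encodes a bit
    received.append(1 if core.timed_op() > 1.5 else 0)  # receiver decodes it
print(received)  # [1, 0, 1, 1, 0]
```

The real attack is far subtler: the "shared state" is the processor's throttling level, observable across threads and cores without any explicit communication.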

We conclude by discussing future follow-up works on vulnerabilities due to power management mechanisms and possible mitigations to explore in these critical and exciting areas.

This talk is based on the following paper: 

Jawad Haj-Yahya, Jeremie S. Kim, A. Giray Yaglikci, Ivan Puddu, Lois Orosa, Juan Gomez Luna, Mohammed Alser, and Onur Mutlu, IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors, Proceedings of the 48th International Symposium on Computer Architecture (ISCA), Virtual, June 2021.
[Slides (pptx) (pdf)]
[Short Talk Slides (pptx) (pdf)]
[Talk Video (21 minutes)]

Speaker Bio:
Jawad Haj-Yahya received his Ph.D. degree in Computer Science from Haifa University, Israel.  Jawad was a processor architect for many years at Intel. His awards and honors include the Intel Achievement Award (the highest award at Intel), for his significant contribution to Intel processors. 
Jawad worked at Nanyang Technological University (NTU), Singapore as a cybersecurity Research Scientist where he led the architecture and design of a secure-processor project based on RISC-V architecture.  He then moved to the Institute of Microelectronics (IME) at A*STAR Singapore where he was a Scientist III and worked on hardware security and an AI accelerator. Jawad next worked as a Senior Researcher in the SAFARI Research Group at ETH Zurich, where he led multiple projects on Energy-Efficient Computing and Hardware Security, before moving to his current position as a principal researcher at Huawei Research Center in Zurich. 

SAFARI Live Seminar: Christina Giannoula 27 September 2021

Monday, September 27 at 5:30 pm Zurich time (CEST)

Efficient Synchronization Support for Near-Data-Processing Architectures
Christina Giannoula, National Technical University of Athens

Livestream at 5:30 pm Zurich time (CEST) on YouTube: Link

Abstract:

Recent advances in 3D-stacked memories have renewed interest in Near-Data Processing (NDP). NDP architectures perform computation close to where the application data resides, and constitute a promising way to alleviate data movement costs. These architectures can provide significant performance and energy benefits to parallel applications. Typical NDP architectures support several NDP units, each including multiple simple cores placed close to memory. To fully leverage the benefits of NDP and achieve high performance for parallel workloads, efficient synchronization among the NDP cores of a system is necessary. However, supporting synchronization in many NDP systems is challenging due to three architectural characteristics: (i) most NDP architectures lack shared caches that can enable low-cost communication and synchronization among NDP cores of the system, (ii) hardware cache coherence protocols are typically not supported in NDP systems due to high area and traffic overheads, and (iii) NDP systems are non-uniform, distributed architectures, in which inter-unit communication is more expensive (both in performance and energy) than intra-unit communication.

In this seminar, we comprehensively examine the synchronization problem in NDP systems, and propose SynCron, an end-to-end synchronization solution for NDP systems. SynCron is designed to achieve the goals of performance, cost, programming ease, and generality to cover a wide range of synchronization primitives through four key techniques. First, SynCron adds low-cost hardware support near memory for synchronization acceleration. Second, SynCron includes a specialized cache memory structure to avoid memory accesses for synchronization and minimize latency overheads. Third, it implements a hierarchical message-passing communication protocol to minimize expensive communication across NDP units of the system. Fourth, SynCron integrates a hardware-only overflow management scheme to avoid performance degradation when hardware resources for synchronization tracking are exceeded.
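As a rough software analogy of the hierarchical idea (SynCron itself is a hardware mechanism; the class below is only an illustration), a hierarchical lock lets cores contend locally first, so that only one representative per "NDP unit" touches the expensive global lock:

```python
import threading

class HierarchicalLock:
    """Software analogy (not SynCron's hardware design): threads first
    synchronize within their 'NDP unit', and only one winner per unit
    contends on the expensive 'inter-unit' global lock, reducing
    costly cross-unit communication."""
    def __init__(self, num_units):
        self.global_lock = threading.Lock()   # expensive: inter-unit
        self.local_locks = [threading.Lock() for _ in range(num_units)]  # cheap: intra-unit

    def acquire(self, unit_id):
        self.local_locks[unit_id].acquire()   # contend only with local cores
        self.global_lock.acquire()            # one representative goes global

    def release(self, unit_id):
        self.global_lock.release()
        self.local_locks[unit_id].release()

# Usage: 4 units x 4 cores each, incrementing a shared counter.
lock = HierarchicalLock(num_units=4)
counter = 0

def worker(unit_id):
    global counter
    for _ in range(1000):
        lock.acquire(unit_id)
        counter += 1
        lock.release(unit_id)

threads = [threading.Thread(target=worker, args=(u,))
           for u in range(4) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 16000
```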

Our work is the first one to analyze synchronization primitives in NDP systems using a variety of parallel workloads, covering various contention scenarios, and evaluating various NDP configurations. We demonstrate that SynCron achieves significant performance and energy improvements both under high-contention and low-contention scenarios, while it also has low hardware area and power overheads. We conclude that SynCron is an efficient synchronization mechanism for NDP systems, and hope that this work encourages further research on the synchronization problem in heterogeneous systems, including NDP systems.

Bio:

Christina Giannoula is a Ph.D. student in the School of Electrical and Computer Engineering at the National Technical University of Athens (NTUA). She is working in the Computing Systems Laboratory, and is an affiliated Ph.D. researcher in the SAFARI research group at ETH Zürich, which is led by Prof. Onur Mutlu. She received a 5-year Diploma degree (Master's equivalent) in Electrical and Computer Engineering from NTUA in 2016, receiving several distinctions, including the ‘Paris Kanellakis’ NTUA award, and graduating in the top 2% of her class. Since 2017, she has been working toward a Ph.D. degree at NTUA, and in 2019 she was a visiting Ph.D. researcher in the SAFARI research group at ETH Zürich, advised by Prof. Onur Mutlu and mentored by Prof. Nandita Vijaykumar. Her research interests lie in the intersection of computer architecture and high-performance computing. Specifically, her research focuses on the hardware/software co-design of emerging applications, including graph processing, pointer-chasing data structures, machine learning workloads, and sparse linear algebra, with modern computing paradigms, such as large-scale multicore systems and near-data processing architectures. She has several publications and awards for her research on these topics.


Christina Giannoula, Nandita Vijaykumar, Nikela Papadopoulou, Vasileios Karakostas, Ivan Fernandez, Juan Gómez-Luna, Lois Orosa, Nectarios Koziris, Georgios Goumas, and Onur Mutlu, “SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures”, Proceedings of the 27th International Symposium on High-Performance Computer Architecture (HPCA), Virtual, February-March 2021.
[Slides (pptx) (pdf)]
[Short Talk Slides (pptx) (pdf)]
[Talk Video (21 minutes)]
[Short Talk Video (7 minutes)]

SAFARI Live Seminar: Minesh Patel 21 September 2021

Join us for our SAFARI Live Seminar with Minesh Patel.
Tuesday, September 21 at 5:00 pm Zurich time (CEST)

Enabling Effective Error Mitigation in Memory Chips That Use On-Die Error-Correcting Codes
Minesh Patel, SAFARI Research Group, ETH Zurich

Livestream at 5:00 pm Zurich time (CEST) on YouTube: Link

Abstract: 

Improvements in main memory storage density are primarily driven by process technology shrinkage (i.e., technology scaling), which negatively impacts reliability by exacerbating various circuit-level error mechanisms. To offset growing error rates, both memory manufacturers and consumers develop and incorporate error-mitigation mechanisms that improve manufacturing yield and allow system designers to meet desired reliability targets. Developing effective error mitigation techniques requires understanding the errors’ characteristics (e.g., worst-case behavior, statistical properties). Unfortunately, we observe that the proprietary on-die Error-Correcting Codes (ECC) used in modern memory chips introduce new challenges to efficient error mitigation by obfuscating CPU-visible error characteristics in an unpredictable, ECC-dependent manner.

In this seminar, we experimentally study memory errors, examine how on-die ECC obfuscates their statistical characteristics, and develop new testing techniques to overcome the obfuscation through four key steps. First, we experimentally study DRAM data-retention error characteristics to understand the challenges inherent in understanding and mitigating technology-scaling-related errors. Second, we study how on-die ECC affects these characteristics to develop Error Inference (EIN), a statistical inference methodology for inferring details of the on-die ECC mechanism and the pre-correction errors. Third, we examine the on-die ECC mechanism in detail to understand exactly how on-die ECC obfuscates raw bit error patterns. Using this knowledge, we introduce Bit Exact ECC Recovery (BEER), a new testing methodology that exploits uncorrectable error patterns to (1) reverse-engineer the exact on-die ECC implementation used in a given chip and (2) identify the bit-exact locations of pre-correction errors that correspond to a given set of observed post-correction errors. Fourth, we study how on-die ECC impacts error profiling and show that on-die ECC introduces three key challenges that impact profiling practicality and effectiveness. To overcome these challenges, we introduce Hybrid Active-Reactive Profiling (HARP), a new profiling strategy that uses simple modifications to the on-die ECC mechanism to quickly and effectively identify bits at risk of error. Finally, we conclude by discussing the need for transparency in DRAM reliability characteristics in order to enable DRAM consumers to better understand and adapt commodity DRAM chips to their system-specific needs. In general, we hope and believe that these new testing techniques will enable scientists and engineers to make informed decisions towards building smarter systems.
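To see how an ECC decoder can obfuscate raw error patterns, consider a textbook Hamming(7,4) single-error-correcting code (an illustration only; real on-die ECC implementations are proprietary and chip-specific):

```python
def hamming7_decode(received):
    """Toy Hamming(7,4) single-error-correcting decoder. A valid
    codeword is any 7-bit vector whose 1-bit positions XOR to 0, so
    the syndrome is the XOR of all positions (1-indexed) holding a 1;
    the decoder flips the position the syndrome indicates."""
    syndrome = 0
    for pos in range(1, 8):
        if received[pos - 1]:
            syndrome ^= pos
    corrected = list(received)
    if syndrome:                      # nonzero syndrome: flip indicated bit
        corrected[syndrome - 1] ^= 1
    return corrected

codeword = [0] * 7                    # the all-zero codeword, for simplicity

# One raw error (position 3): fully corrected, invisible post-correction.
one_err = codeword[:]; one_err[2] ^= 1
print(hamming7_decode(one_err))       # [0, 0, 0, 0, 0, 0, 0]

# Two raw errors (positions 3 and 5): the syndrome is 3 ^ 5 = 6, so the
# decoder *miscorrects* position 6 -- the post-correction error pattern
# (bits 3, 5, 6) no longer resembles the raw pattern (bits 3, 5).
two_err = codeword[:]; two_err[2] ^= 1; two_err[4] ^= 1
print(hamming7_decode(two_err))       # [0, 0, 1, 0, 1, 1, 0]
```

The miscorrection case is the crux: the CPU sees an error pattern that depends on the (unknown) ECC function, which is exactly what EIN, BEER, and HARP are designed to reason about.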

Bio:

Minesh Patel is a Ph.D. candidate at ETH Zurich working with Prof. Onur Mutlu. He received B.S. degrees in ECE and Physics from the University of Texas in 2015. Since then, he has been working toward his Ph.D. degree with a focus on memory systems reliability. His current research interests broadly span computer systems and architecture topics, including support for speculative and/or unreliable systems, performance modeling and analysis, and application characterization and optimization.


This talk is based on four papers we published at ISCA 2017, DSN 2019, MICRO 2020, and MICRO 2021 (to appear), respectively. Links to the individual papers and slides, where available, are below.

Minesh Patel, Jeremie S. Kim, and Onur Mutlu, “The Reach Profiler (REAPER): Enabling the Mitigation of DRAM Retention Failures via Profiling at Aggressive Conditions”, Proceedings of the 44th International Symposium on Computer Architecture (ISCA), Toronto, Canada, June 2017.
[Slides (pptx) (pdf)]
[Lightning Session Slides (pptx) (pdf)]

Minesh Patel, Jeremie S. Kim, Hasan Hassan, and Onur Mutlu, “Understanding and Modeling On-Die Error Correction in Modern DRAM: An Experimental Study Using Real Devices”, Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Portland, OR, USA, June 2019.
[Slides (pptx) (pdf)]
[Talk Video (26 minutes)]
[Full Talk Lecture (29 minutes)]
[Source Code for EINSim, the Error Inference Simulator]
Best paper award.

Minesh Patel, Jeremie S. Kim, Taha Shahroodi, Hasan Hassan, and Onur Mutlu, “Bit-Exact ECC Recovery (BEER): Determining DRAM On-Die ECC Functions by Exploiting DRAM Data Retention Characteristics”, Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual, October 2020.
[Slides (pptx) (pdf)]
[Short Talk Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[Lecture Slides (pptx) (pdf)]
[Talk Video (15 minutes)]
[Short Talk Video (5.5 minutes)]
[Lightning Talk Video (1.5 minutes)]
[Lecture Video (52.5 minutes)]
[BEER Source Code]
Best paper award.

Minesh Patel, Geraldo Francisco de Oliveira Jr., and Onur Mutlu, “HARP: Practically and Effectively Identifying Uncorrectable Errors in Main Memory Chips That Use On-Die ECC”, Proceedings of the 54th International Symposium on Microarchitecture (MICRO), Virtual, October 2021.

SAFARI Live Seminar: Ataberk Olgun 15 September 2021

Join us for our next SAFARI Live Seminar with Ataberk Olgun.

Wednesday, September 15 at 5:00 pm Zurich time (CEST)

QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips
Ataberk Olgun, TOBB University of Economics and Technology & SAFARI Research Group, ETH Zurich

Livestream at 5:00 pm Zurich time (CEST) on YouTube: Link

Abstract:

True random number generators (TRNGs) sample random physical processes to create large amounts of random numbers for various use cases, including security-critical cryptographic primitives, scientific simulations, machine learning applications, and even recreational entertainment. Unfortunately, not every computing system is equipped with dedicated TRNG hardware, limiting the application space and security guarantees for such systems. To open the application space and enable security guarantees for the overwhelming majority of computing systems that do not necessarily have dedicated TRNG hardware, we develop QUAC-TRNG.

QUAC-TRNG exploits the new observation that a carefully-engineered sequence of DRAM commands activates four consecutive DRAM rows in rapid succession. This QUadruple ACtivation (QUAC) causes the bitline sense amplifiers to non-deterministically converge to random values when we activate four rows that store conflicting data because the net deviation in bitline voltage fails to meet reliable sensing margins.

We experimentally demonstrate that QUAC reliably generates random values across 136 commodity DDR4 DRAM chips from one major DRAM manufacturer. We describe how to develop an effective TRNG (QUAC-TRNG) based on QUAC. We evaluate the quality of our TRNG using NIST STS and find that QUAC-TRNG successfully passes each test. Our experimental evaluations show that QUAC-TRNG generates true random numbers with a throughput of 3.44 Gb/s (per DRAM channel), outperforming the state-of-the-art DRAM-based TRNG by 15.08x and 1.41x for basic and throughput-optimized versions, respectively. We show that QUAC-TRNG utilizes DRAM bandwidth better than the state-of-the-art, achieving up to 2.03x the throughput of a throughput-optimized baseline when scaling bus frequencies to 12 GT/s.
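For a flavor of what NIST STS checks, here is the simplest test in the suite, the Frequency (Monobit) test (a standard formula from the STS specification, not the QUAC-TRNG evaluation code):

```python
import math

def monobit_pvalue(bits):
    """NIST STS 'Frequency (Monobit)' test: a sequence passes if the
    p-value is >= 0.01. It checks only the overall 0/1 balance; the
    full STS battery applies many more stringent tests."""
    n = len(bits)
    s = sum(2 * b - 1 for b in bits)        # map {0,1} -> {-1,+1} and sum
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

balanced = [0, 1] * 5_000                   # perfectly balanced 10,000 bits
biased = [1] * 6_000 + [0] * 4_000          # 60/40 bias: clearly non-random

print(monobit_pvalue(balanced))             # 1.0 -- passes
print(monobit_pvalue(biased) >= 0.01)       # False -- fails
```

Note that the balanced alternating sequence passes this particular test despite being obviously structured; that is precisely why STS is a battery of tests rather than a single check.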

Bio:

Ataberk Olgun received his B.S. degree in Computer Engineering from TOBB University of Economics and Technology, where he is currently pursuing a master's degree. He joined the SAFARI Research Group as an undergraduate intern in 2019. Since then, he has worked on many projects on DRAM, security, and processing-in-memory.

===================

Ataberk Olgun, Minesh Patel, A. Giray Yaglikci, Haocong Luo, Jeremie S. Kim, F. Nisa Bostanci, Nandita Vijaykumar, Oguz Ergin, and Onur Mutlu, “QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips”, Proceedings of the 48th International Symposium on Computer Architecture (ISCA), Virtual, June 2021.
[Long Talk Video (25 minutes)]
[Long Talk Slides (pptx) (pdf)]
[Short Talk Video (7 minutes)]
[Short Talk Slides (pptx) (pdf)]
[Conference Talk and Q&A (15 minutes)]

===================

Related talks & lectures: 

===================

D-RaNGe: True Random Number Generation with Commodity DRAM https://www.youtube.com/watch?v=Y3hPv1I5f8Y&list=PL5Q2soXY2Zi-DyoI3HbqcdtUm9YWRR_z-&index=16 

DRAM Latency PUFs (Physical Unclonable Functions) https://www.youtube.com/watch?v=7gqnrTZpjxE&list=PL5Q2soXY2Zi-DyoI3HbqcdtUm9YWRR_z-&index=15 

CODIC: A Low-Cost Substrate for Enabling Custom In-DRAM Functionalities and Optimizations https://www.youtube.com/watch?v=ofBJnFQA6ic&list=PL5Q2soXY2Zi8_VVChACnON4sfh2bJ5IrD&index=133

Computer Architecture – Lecture 10: Low-Latency Memory (ETH Zürich, Fall 2020) https://www.youtube.com/watch?v=vQd1YgOH1Mw&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=19

Computer Architecture – Lecture 11a: Memory Controllers (ETH Zürich, Fall 2020) https://www.youtube.com/watch?v=TeG773OgiMQ&list=PL5Q2soXY2Zi9xidyIgBxUz7xRPS-wisBN&index=20

Public Lecture, Croucher ASI Workshop August 25

Join us next week at the Croucher Advanced Study Institute workshop on “Frontiers of AI Accelerators: Technologies, Circuits and Applications”, August 24 – 27, 2021, for Onur’s public lecture on “Intelligent Architectures for Intelligent Systems”.

Talk time:  August 25, 11:15 Zurich time (CEST)

Registration & Program: accessasi.hkust.edu.hk

Abstract: https://accessasi.hkust.edu.hk/lecture-4

Computing is bottlenecked by data. Large amounts of data overwhelm storage capability, communication capability, and computation capability of the modern machines we design today. As a result, many key applications’ performance, efficiency and scalability are bottlenecked by data movement. We describe three major shortcomings of modern architectures in terms of 1) dealing with data, 2) taking advantage of the vast amounts of data, and 3) exploiting different semantic properties of application data. We argue that an intelligent architecture should be designed to handle data well. We show that handling data well requires designing system architectures based on three key principles: 1) data-centric, 2) data-driven, 3) data-aware. We give several examples for how to exploit each of these principles to design a much more efficient and high-performance computing system. We will especially discuss recent research that aims to fundamentally reduce memory latency and energy, and practically enable computation close to data, with at least two promising novel directions: 1) performing computation in memory by exploiting the analog operational properties of memory, with low-cost changes, 2) exploiting the logic layer in 3D-stacked memory technology in various ways to accelerate important data-intensive applications. We discuss how to enable adoption of such fundamentally more intelligent architectures, which we believe are key to efficiency, performance, and sustainability. We conclude with some guiding principles for future computing architecture and system designs.

Reference papers:

  1. “Intelligent Architectures for Intelligent Computing Systems”
  2. “A Modern Primer on Processing in Memory”

Reference video:

  1. IEDM 2020 Tutorial: Memory-Centric Computing Systems, Onur Mutlu, 12 December 2020 

SAFARI Live Seminar: Jawad Haj-Yahya 16 August 2021

Join us for our next SAFARI Live Seminar with Jawad Haj-Yahya.

Monday, August 16 at 5:30 pm Zurich time (CEST)

Power Management Mechanisms in Modern Microprocessors and Their Security Implications
Jawad Haj-Yahya, Principal Researcher, Huawei Research Center Zurich

Livestream at 5:30 pm Zurich time (CEST) on YouTube:
https://www.youtube.com/watch?v=uSuRWYa3k2g

Abstract:
Billions of new devices (e.g., sensors, wearables, smartphones, tablets, laptops, servers) are being deployed each year with new services and features that are driving a higher demand for high performance microprocessors, which often have high power consumption. Despite the failure of Dennard scaling, the slow-down in Moore’s Law, and the high power-density of modern processors, power management mechanisms have enabled significant advances in modern microprocessor performance and energy efficiency. Yet, current power management architectures also pose serious security implications. This is mainly because functionality rather than security has been the main consideration in the design of power management mechanisms in commodity microprocessors.

In this seminar, we provide a detailed overview of the state-of-the-art in power management mechanisms, power delivery networks (PDNs), and security vulnerabilities of current management mechanisms in modern microprocessors. We first present, analyze and enhance the advanced power management mechanisms of modern microprocessors to improve energy and performance in active and idle power states. Second, we present the design and tradeoffs of modern power delivery networks, evaluate their implications on performance and energy-efficiency, and describe new techniques to mitigate PDN inefficiencies. We will especially introduce the idea and benefits of hybrid power delivery networks. Third, we present some of the security vulnerabilities that exist in current management mechanisms of modern processors and propose mitigation techniques. We conclude that power management, power delivery and resulting security implications are critical and exciting areas to research to make modern systems both more energy-efficient and higher performance.

Bio:
Jawad Haj-Yahya received his Ph.D. degree in Computer Science from Haifa University, Israel. Jawad was a processor architect for many years at Intel. His awards and honors include the Intel Achievement Award (the highest award at Intel), for his significant contribution to Intel processors. Jawad worked at Nanyang Technological University (NTU), Singapore as a cybersecurity Research Scientist where he led the architecture and design of a secure-processor project based on RISC-V architecture. He then moved to the Institute of Microelectronics (IME) at A*STAR Singapore where he was a Scientist III and worked on hardware security and an AI accelerator. Jawad next worked as a Senior Researcher in the SAFARI Research Group at ETH Zurich, where he led multiple projects on Energy-Efficient Computing and Hardware Security, before moving to his current position as principal researcher at Huawei Research Center in Zurich.


This talk is based on four papers we published respectively at HPCA 2020, ISCA 2020, MICRO 2020 and ISCA 2021. The links to individual papers and slides are below.

Congratulations to Damla Senol Cali on successfully defending her PhD!

Thesis:  Accelerating Genome Sequence Analysis via Efficient Hardware/Algorithm Co-Design

Abstract:

Genome sequence analysis plays a pivotal role in enabling many medical and scientific advancements in personalized medicine, outbreak tracing, the understanding of evolution, and forensics. Modern genome sequencing machines can rapidly generate massive amounts of genomics data at low cost. However, the analysis of genome sequencing data is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data. Our goals in this dissertation are to (1) characterize the real-system behavior of the genome sequence analysis pipeline and its associated tools, (2) expose the bottlenecks and tradeoffs of the pipeline and tools, and (3) co-design fast and efficient algorithms along with scalable and energy-efficient customized hardware accelerators for the key pipeline bottlenecks to enable faster genome sequence analysis.

First, we comprehensively analyze the tools in the genome assembly pipeline for long reads in multiple dimensions (i.e., accuracy, performance, memory usage, and scalability), uncovering bottlenecks and tradeoffs that different combinations of tools and different underlying systems lead to. We show that we need high-performance, memory-efficient, low-power, and scalable designs for genome sequence analysis in order to exploit the advantages that genome sequencing provides. Second, we propose GenASM, an acceleration framework that builds upon bitvector-based approximate string matching (ASM) to accelerate multiple steps of the genome sequence analysis pipeline. We co-design our highly-parallel, scalable and memory-efficient algorithms with low-power and area-efficient hardware accelerators. We evaluate GenASM for three different use cases of ASM in genome sequence analysis and show that GenASM is significantly faster and more power- and area-efficient than state-of-the-art software and hardware tools for each of these use cases. Third, we implement an FPGA-based prototype for GenASM, where state-of-the-art 3D-stacked memory (HBM2) offers high memory bandwidth and FPGA resources offer high parallelism by instantiating multiple copies of the GenASM accelerators. Fourth, we propose GenGraph, the first hardware acceleration framework for sequence-to-graph mapping. Instead of representing the reference genome as a single linear DNA sequence, genome graphs provide a better representation of the diversity among populations by encoding variations across individuals in a graph data structure, avoiding a bias towards any one reference. GenGraph enables the efficient mapping of a sequenced genome to a graph-based reference, providing more comprehensive and accurate genome sequence analysis.
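The bitvector idea that GenASM builds on goes back to the Bitap (Shift-Or) family of algorithms; exact matching with Bitap can be sketched as follows (a textbook illustration, not GenASM's algorithm):

```python
def bitap_search(text, pattern):
    """Bit-vector (Shift-Or/Bitap) exact matching, the classic technique
    that bitvector-based approximate string matching extends: each text
    character updates a pattern-length bitvector with O(1) word
    operations instead of filling a dynamic-programming row."""
    m = len(pattern)
    # Per-character masks: bit i is 0 iff pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, ~0) & ~(1 << i)
    state = ~0                 # bit i set = prefix of length i+1 not yet matched
    hits = []
    for j, c in enumerate(text):
        state = (state << 1) | masks.get(c, ~0)
        if state & (1 << (m - 1)) == 0:   # full pattern matched, ending at j
            hits.append(j - m + 1)
    return hits

print(bitap_search("ACGTACGTGA", "CGT"))  # [1, 5]
```

Extending this state update to tolerate substitutions, insertions, and deletions yields the approximate-matching bitvector algorithms that GenASM co-designs with hardware.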

Overall, we demonstrate that genome sequence analysis can be accelerated by co-designing scalable and energy-efficient customized accelerators along with efficient algorithms for the key steps of genome sequence analysis.

Examining Committee

Onur Mutlu, Co-advisor, CMU-ECE, ETH Zurich
Saugata Ghose, Co-advisor, CMU-ECE, University of Illinois Urbana-Champaign
James C. Hoe, CMU-ECE
Can Alkan, Bilkent University

More on Damla’s publications, talks and research interests can be found on her website.

SAFARI Live Seminar: Geraldo F. Oliveira 22 July 2021

We are pleased to have Geraldo F. Oliveira give the third talk in our SAFARI Live Seminar series!

Thursday, July 22 at 5:00 pm Zurich time (CEST)

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks
Geraldo F. Oliveira, SAFARI Research Group, D-ITET, ETH Zurich

Livestream at 5:00 pm Zurich time (CEST) on YouTube:
https://www.youtube.com/watch?v=GWideVyo0nM

Paper: https://arxiv.org/pdf/2105.03725.pdf
Repository: https://github.com/CMU-SAFARI/DAMOV

Abstract:
Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ different techniques to reduce overheads caused by data movement, from traditional processor-centric mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging paradigms, such as near-data processing (NDP), where computation is moved closer to or inside memory. However, there is a lack of understanding about (1) the key metrics that identify different sources of data movement bottlenecks and (2) how different data movement bottlenecks can be alleviated by traditional and emerging data movement mitigation mechanisms.

In this work, we make two key contributions. First, we propose the first methodology to characterize data-intensive workloads based on the source of their data movement bottlenecks. This methodology is driven by insights obtained from a large-scale experimental characterization of 345 applications from 37 different benchmark suites and an evaluation of the performance of memory-bound functions from these applications with three data-movement mitigation mechanisms. Second, we release DAMOV, the first open-source benchmark suite for main memory data movement-related studies, based on our systematic characterization methodology. This suite consists of 144 functions representing different sources of data movement bottlenecks and can be used as a baseline benchmark set for future data-movement mitigation research. We show how DAMOV can aid the study of open research problems for NDP architectures via four case studies.
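To illustrate the spirit of such a metric-driven classification (the metrics below are standard, but the thresholds and class names are hypothetical illustrations, not DAMOV's actual methodology), a toy bottleneck classifier might look like:

```python
def classify_bottleneck(llc_mpki, arithmetic_intensity):
    """Illustrative-only classifier: combine a cache-pressure metric
    (last-level-cache misses per kilo-instruction) with arithmetic
    intensity (operations per byte) to suggest whether a memory-bound
    function is a candidate for near-data processing. The thresholds
    here are made up for the example."""
    if llc_mpki > 10 and arithmetic_intensity < 1.0:
        return "memory-bound: likely NDP candidate"
    if llc_mpki > 10:
        return "bandwidth-sensitive: profile further"
    return "compute/cache-friendly: keep on CPU"

print(classify_bottleneck(llc_mpki=25.0, arithmetic_intensity=0.2))
```

The actual DAMOV methodology derives its classes from a large-scale characterization of 345 applications rather than fixed thresholds like these.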

Our work provides new insights about the suitability of different classes of data movement bottlenecks to the different data movement mitigation mechanisms, including analyses on how the different data movement mitigation mechanisms impact performance and energy for memory bottlenecked applications. All our bottleneck analysis toolchains and DAMOV benchmarks are publicly and freely available (https://github.com/CMU-SAFARI/DAMOV). We believe and hope that our work can enable further studies and research on hardware and software solutions for data movement bottlenecks, including near-data processing.

Speaker Bio:
Geraldo F. Oliveira is a Ph.D. student in the SAFARI Research Group at ETH Zürich. He received a B.S. degree in computer science from the Federal University of Viçosa, Viçosa, Brazil, in 2015, and an M.S. degree in computer science from the Federal University of Rio Grande do Sul, Porto Alegre, Brazil, in 2017. Since 2018, he has been working toward a Ph.D. degree with Onur Mutlu at ETH Zürich, Zürich, Switzerland. His current research interests include system support for processing-in-memory and processing-using-memory architectures, data-centric accelerators for emerging applications, approximate computing, and emerging memory systems for consumer devices. He has several publications on these topics.