Our latest paper in Genome Biology is out

Our latest paper in Genome Biology reviews developments in read alignment algorithms from 1988 to the present. We investigate how the development of read alignment algorithms is shaped by changes in sequencing technologies, including changes in read length, throughput, and sequencing error rates.

Mohammed Alser, Jeremy Rotman, Kodi Taraszka, Huwenbo Shi, Pelin Icer Baykal, Harry Taegyun Yang, Victor Xue, Sergey Knyazev, Benjamin D. Singer, Brunilda Balliu, David Koslicki, Pavel Skums, Alex Zelikovsky, Can Alkan, Onur Mutlu, Serghei Mangul, “Technology dictates algorithms: Recent developments in read alignment”, Genome Biology, August 2021.
[arXiv preprint]
[Source Code and Data]

Abstract:
Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today’s diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
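For readers new to the topic, the core task named in the abstract (placing a read on a reference while tolerating sequencing errors) can be illustrated with a small dynamic-programming sketch. This is a textbook semi-global edit-distance computation, not one of the methods surveyed or evaluated in the paper; the function name `best_alignment` and the toy sequences are assumptions made for illustration only. Practical aligners avoid running this quadratic computation over a whole genome, typically by indexing the reference first.

```python
from typing import Tuple


def best_alignment(read: str, reference: str) -> Tuple[int, int]:
    """Find the best approximate occurrence of `read` inside `reference`.

    Returns (edit_distance, end), where `end` is the exclusive end position
    in `reference` of the leftmost-ending occurrence with the fewest edits.
    Illustrative only: hypothetical helper, not taken from the paper.
    """
    # Row for the empty read prefix: it matches anywhere in the
    # reference at zero cost, so the read may start at any position.
    prev = [0] * (len(reference) + 1)
    for i in range(1, len(read) + 1):
        # Aligning read[:i] against an empty reference costs i edits.
        curr = [i] + [0] * len(reference)
        for j in range(1, len(reference) + 1):
            mismatch = 0 if read[i - 1] == reference[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + mismatch,  # match or substitution
                prev[j] + 1,             # read base with no reference base
                curr[j - 1] + 1,         # reference base with no read base
            )
        prev = curr
    # The read may end anywhere in the reference: take the cheapest end.
    distance = min(prev)
    return distance, prev.index(distance)


if __name__ == "__main__":
    reference = "ACGTTGCAAGCT"
    print(best_alignment("TGCA", reference))  # exact occurrence
    print(best_alignment("TGGA", reference))  # one substituted base
```

The cost of this approach grows with the product of the read and reference lengths, which is why the choice of algorithm depends so strongly on read length and throughput.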

Recent publicity: 

SAFARI Live Seminar: Geraldo F. Oliveira 22 July 2021

We are pleased to have Geraldo F. Oliveira give a 3rd talk in our SAFARI Live Seminars!

Thursday, July 22 at 5:00 pm Zurich time (CEST)

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks
Geraldo F. Oliveira, SAFARI Research Group, D-ITET, ETH Zurich

Livestream at 5:00 pm Zurich time (CEST) on YouTube:
https://www.youtube.com/watch?v=GWideVyo0nM

Paper: https://arxiv.org/pdf/2105.03725.pdf
Repository: https://github.com/CMU-SAFARI/DAMOV

Abstract:
Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ different techniques to reduce overheads caused by data movement, from traditional processor-centric mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging paradigms, such as near-data processing (NDP), where computation is moved closer to or inside memory. However, there is a lack of understanding about (1) the key metrics that identify different sources of data movement bottlenecks and (2) how different data movement bottlenecks can be alleviated by traditional and emerging data movement mitigation mechanisms.

In this work, we make two key contributions. First, we propose the first methodology to characterize data-intensive workloads based on the source of their data movement bottlenecks. This methodology is driven by insights obtained from a large-scale experimental characterization of 345 applications from 37 different benchmark suites and an evaluation of the performance of memory-bound functions from these applications with three data-movement mitigation mechanisms. Second, we release DAMOV, the first open-source benchmark suite for main memory data movement-related studies, based on our systematic characterization methodology. This suite consists of 144 functions representing different sources of data movement bottlenecks and can be used as a baseline benchmark set for future data-movement mitigation research. We show how DAMOV can aid the study of open research problems for NDP architectures via four case studies.

Our work provides new insights about the suitability of different classes of data movement bottlenecks to the different data movement mitigation mechanisms, including analyses on how the different data movement mitigation mechanisms impact performance and energy for memory bottlenecked applications. All our bottleneck analysis toolchains and DAMOV benchmarks are publicly and freely available (https://github.com/CMU-SAFARI/DAMOV). We believe and hope that our work can enable further studies and research on hardware and software solutions for data movement bottlenecks, including near-data processing.

Speaker Bio:
Geraldo F. Oliveira is a Ph.D. student in the SAFARI Research Group @ETH Zurich. He received a B.S. degree in computer science from the Federal University of Viçosa, Viçosa, Brazil, in 2015, and an M.S. degree in computer science from the Federal University of Rio Grande do Sul, Porto Alegre, Brazil, in 2017. Since 2018, he has been working toward a Ph.D. degree with Onur Mutlu at ETH Zürich, Zürich, Switzerland. His current research interests include system support for processing-in-memory and processing-using-memory architectures, data-centric accelerators for emerging applications, approximate computing, and emerging memory systems for consumer devices. He has several publications on these topics.

SAFARI Live Seminar: Juan Gomez-Luna 12 July 2021

We are excited to kick off our summer SAFARI Live Seminars with our first talk next week!

Monday, July 12 at 5:00 pm Zurich time (CEST)

Understanding a Modern Processing-in-Memory Architecture: Benchmarking and Experimental Characterization
Dr. Juan Gomez-Luna, SAFARI Research Group, D-ITET, ETH Zurich

Livestream at 5:00 pm Zurich time (CEST) on YouTube:
https://www.youtube.com/watch?v=D8Hjy2iU9l4

Paper: https://arxiv.org/pdf/2105.03814.pdf
Repository: https://github.com/CMU-SAFARI/prim-benchmarks
Talk slides (pptx) (pdf)

Abstract:
Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads is insufficient to amortize the cost of main memory access. Fundamentally addressing this data movement bottleneck requires a paradigm where the memory system assumes an active role in computing by integrating processing capabilities. This paradigm is known as processing-in-memory (PIM). Recent research explores different forms of PIM architectures, motivated by the emergence of new 3D-stacked memory technologies that integrate memory with a logic layer where processing elements can be easily placed. Past works evaluate these architectures in simulation or, at best, with simplified hardware prototypes. In contrast, the UPMEM company has designed and manufactured the first publicly-available real-world PIM architecture. The UPMEM PIM architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated in the same chip.

This paper provides the first comprehensive analysis of the first publicly-available real-world PIM architecture. We make two key contributions. First, we conduct an experimental characterization of the UPMEM-based PIM system using microbenchmarks to assess various architecture limits such as compute throughput and memory bandwidth, yielding new insights. Second, we present PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domains (e.g., dense/sparse linear algebra, databases, data analytics, graph processing, neural networks, bioinformatics, image processing), which we identify as memory-bound. We evaluate the performance and scaling characteristics of PrIM benchmarks on the UPMEM PIM architecture, and compare their performance and energy consumption to their state-of-the-art CPU and GPU counterparts. Our extensive evaluation conducted on two real UPMEM-based PIM systems with 640 and 2,556 DPUs provides new insights about suitability of different workloads to the PIM system, programming recommendations for software designers, and suggestions and hints for hardware and architecture designers of future PIM systems.

Speaker Bio:
Juan Gomez-Luna is a senior researcher and lecturer in the SAFARI Research Group @ETH Zurich. He received the BS and MS degrees in Telecommunication Engineering from the University of Sevilla, Spain, in 2001, and the PhD degree in Computer Science from the University of Cordoba, Spain, in 2012. Between 2005 and 2017, he was a faculty member of the University of Cordoba. His research interests focus on processing-in-memory, memory systems, heterogeneous computing, and hardware and software acceleration of medical imaging and bioinformatics. He is the lead author of PrIM (https://github.com/CMU-SAFARI/prim-benchmarks), the first publicly-available benchmark suite for a real-world processing-in-memory architecture, and Chai (https://github.com/chai-benchmarks/chai), a benchmark suite for heterogeneous systems with CPU/GPU/FPGA.


Juan Gomez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu,
“Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture”
Preprint in arXiv, 9 May 2021.
[arXiv preprint]
[PrIM Benchmarks Source Code]
[Slides (pptx) (pdf)]
[Long Talk Slides (pptx) (pdf)]
[Short Talk Slides (pptx) (pdf)]
[SAFARI Live Seminar Slides (pptx) (pdf)]
[SAFARI Live Seminar Video (2 hrs 57 mins)]
[Lightning Talk Video (3 minutes)]

 

Join us at ASPLOS 2021 online

We are at ASPLOS 2021 this week and next.  Join us for our talks and learn more about our recent works:

Session 2: Memory Systems, Monday, April 19 4:00 PM Pacific Time:

Irina Calciu, M. Talha Imran, Ivan Puddu, Sanidhya Kashyap, Hasan Al Maruf, Onur Mutlu, and Aasheesh Kolli,
“Rethinking Software Runtimes for Disaggregated Memory”
Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, March-April 2021.
[2-page Extended Abstract]
[Source Code (Officially Artifact Evaluated)]


Session 8: Tools & Frameworks, Tuesday, April 20 4:00 PM Pacific Time: 

Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, Joao Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gomez-Luna, and Onur Mutlu,
“SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM”
Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, March-April 2021.
[2-page Extended Abstract]
[Short Talk Slides (pptx) (pdf)]
[Talk Slides (pptx) (pdf)]
[Short Talk Video (5 mins)]
[Full Talk Video (27 mins)]


Session 17: Solid State Drives, Thursday, April 22 7:00 AM Pacific Time:

Jisung Park, Myungsuk Kim, Myoungjun Chun, Lois Orosa, Jihong Kim, and Onur Mutlu,
“Reducing Solid-State Drive Read Latency by Optimizing Read-Retry”
Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, March-April 2021.
[2-page Extended Abstract]
[Short Talk Slides (pptx) (pdf)]
[Full Talk Slides (pptx) (pdf)]
[Short Talk Video (5 mins)]
[Full Talk Video (19 mins)]

 

ASPLOS Program:  https://asplos-conference.org/program/

TRRespass wins the Pwnie Award for Most Innovative Research

TRRespass won the Pwnie Award for “Most Innovative Research” at the annual BlackHat Europe conference this week.  Pwnies are the most prestigious industrial awards in the security community.   Congratulations to the authors: Pietro Frigo, Emanuele Vannacci, Hasan Hassan, Victor van der Veen, Onur Mutlu, Cristiano Giuffrida, Herbert Bos, and Kaveh Razavi on this prestigious prize!

We recently interviewed Hasan Hassan about his contribution to TRRespass.  Here’s what he had to say:

You were a co-author on TRRespass, which recently won a Best Paper Award at IEEE S&P. What is the significance of this paper?

Shortly after the discovery of the RowHammer vulnerability of DRAM, DRAM vendors announced RowHammer-free DRAM devices that implement in-DRAM solutions to protect against RowHammer. However, in TRRespass, we find that such solutions, commonly referred to as Target Row Refresh (TRR), do not effectively protect against RowHammer attacks when many rows are hammered at the same time. We show that the RowHammer vulnerability is not only still intact on the current DDR4 devices, but it has also become worse due to technology node scaling.

How was your experience in collaborating with the Systems and Network Security Group at VU Amsterdam on this work?

I am glad that our combined effort with the Systems and Network Security Group at VU Amsterdam won us the Best Paper Award at IEEE S&P. It has been a great experience for me to collaborate with experts in hardware security. I hope there will be more such collaborations that result in impactful research.

Which tools did you use in this work?

I think SoftMC, our FPGA-based DRAM testing infrastructure, was one of the key enablers of this research. We used SoftMC to interface with DDR4 DRAM chips in a much more flexible way than anyone can do using commodity desktop and mobile systems. Specifically, we used SoftMC to communicate with DRAM chips using low-level DDR4 commands as opposed to using load/store instructions provided by typical instruction set architectures. In a way, SoftMC lets us be the memory controller and provides the flexibility of issuing any DDR4 command at any time, which is not possible with commodity systems.

An earlier version of SoftMC that supports DDR3 devices is open-source and can be accessed here. In 2017, we published a paper that describes the design of SoftMC in detail.

I am also involved in maintaining Ramulator, a cycle-accurate DRAM simulator that we describe in this paper, and Scarab, which is a cycle-accurate simulator for state-of-the-art multicore CPUs.


Pietro Frigo, Emanuele Vannacci, Hasan Hassan, Victor van der Veen, Onur Mutlu, Cristiano Giuffrida, Herbert Bos, and Kaveh Razavi,
“TRRespass: Exploiting the Many Sides of Target Row Refresh”
Proceedings of the 41st IEEE Symposium on Security and Privacy (S&P), San Francisco, CA, USA, May 2020.
Slides (pptx) (pdf)
Lecture Slides (pptx) (pdf)
Talk Video (17 minutes)
Lecture Video (59 minutes)
Source Code
Web Article
Project Overview
Best paper award.
Pwnie Award 2020 for Most Innovative Research. Pwnie Awards 2020

Paper: SneakySnake🐍: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs

Our recent paper has been accepted to Bioinformatics!

Mohammed Alser, Taha Shahroodi, Juan Gomez-Luna, Can Alkan, and Onur Mutlu,
“SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs”
Bioinformatics, to appear in 2020.

Source Code:  SneakySnake🐍: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs

We are at MICRO 2020 this week! Join Lois Orosa for his talk on FIGARO, Monday, October 19 6:30 PM CEST

Our new paper: FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching will be presented by Lois Orosa at MICRO 2020 on Monday, October 19 at 6:30 PM CEST.  Join us at MICRO 2020 online!

Authors: Yaohua Wang, Lois Orosa, Xiangjun Peng, Yang Guo, Saugata Ghose, Minesh Patel, Jeremie S. Kim, Juan Gómez Luna, Mohammad Sadrosadati, Nika Mansouri Ghiasi, and Onur Mutlu

Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual, October 2020.
[Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[Talk Video (16 minutes)]
[Lightning Talk Video (1.5 minutes)]

Best Paper Award MICRO 2020: Congratulations Minesh Patel and co-authors!

Our new paper: Bit-Exact ECC Recovery (BEER): Determining DRAM On-Die ECC Functions by Exploiting DRAM Data Retention Characteristics  will be presented by Minesh Patel at MICRO 2020 on Monday, October 19 at 6:00 PM CEST.  Join us at MICRO 2020 online!

Update: This paper won the Best Paper Award! Congratulations to Minesh Patel and co-authors: Jeremie S. Kim, Taha Shahroodi, Hasan Hassan, and Onur Mutlu.

We asked Minesh a couple of questions about his paper. Here’s what he had to say:

You recently won the Best Paper Award at MICRO.  Can you tell us more about the
significance of this paper?
This paper addresses the larger problem that hidden proprietary features
implemented by DRAM manufacturers impede end-users from bringing out the best of
DRAM technology. We believe BEER takes an important step towards bridging the
gap between industry and end-users, starting by focusing on a key example of
such features: on-die ECC. Our work discusses how and why on-die ECC limits
third-party DRAM consumers and then introduces techniques that the consumers can
use to overcome these limitations.

What were the biggest challenges for you during the writing and review process?
I would say that the biggest challenge we faced when writing this paper was to
clearly articulate the problem of on-die ECC limiting third-party users. This
includes both (i) describing how and why this limitation arises and (ii)
providing concrete examples that the reader can relate to. We spent considerable
effort in crafting these arguments such that both we and the reader have a clear
understanding of the problem we tackle, our goal in this work, and the final
value of our contributions.

Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual, October 2020.
[Slides (pptx) (pdf)]
[Short Talk Slides (pptx) (pdf)]
[Lightning Talk Slides (pptx) (pdf)]
[Talk Video (15 minutes)]
[Short Talk Video (5.5 minutes)]
[Lightning Talk Video (1.5 minutes)]
[BEER Source Code]