Join us at ISCA 2021 for our talks

ISCA 2021 Program:  https://www.iscaconf.org/isca2021/program/

Tuesday, June 15 Session 6B: Memory II 12 pm EDT:

Lois Orosa, Yaohua Wang, Mohammad Sadrosadati, Jeremie S. Kim, Minesh Patel, Ivan Puddu, Haocong Luo, Kaveh Razavi, Juan Gomez-Luna, Hasan Hassan, Nika Mansouri-Ghiasi, Saugata Ghose, and Onur Mutlu, “CODIC: A Low-Cost Substrate for Enabling Custom In-DRAM Functionalities and Optimizations”, Proceedings of the 48th International Symposium on Computer Architecture (ISCA), Virtual, June 2021.
[Slides (pptx) (pdf)]
[Short Talk Slides (pptx) (pdf)]
[Talk Video (22 minutes)]


Wednesday, June 16 Session 11B 1:15 pm EDT:

Jawad Haj-Yahya, Jeremie S. Kim, A. Giray Yaglikci, Ivan Puddu, Lois Orosa, Juan Gomez Luna, Mohammed Alser, and Onur Mutlu, “IChannels: Exploiting Current Management Mechanisms to Create Covert Channels in Modern Processors”, Proceedings of the 48th International Symposium on Computer Architecture (ISCA), Virtual, June 2021.
[Slides (pptx) (pdf)]
[Short Talk Slides (pptx) (pdf)]
[Talk Video (21 minutes)]


Wednesday, June 16 Session 11A 1:15 pm EDT:

Ataberk Olgun, Minesh Patel, A. Giray Yaglikci, Haocong Luo, Jeremie S. Kim, F. Nisa Bostanci, Nandita Vijaykumar, Oguz Ergin, and Onur Mutlu, “QUAC-TRNG: High-Throughput True Random Number Generation Using Quadruple Row Activation in Commodity DRAM Chips”, Proceedings of the 48th International Symposium on Computer Architecture (ISCA), Virtual, June 2021.
[Slides (pptx) (pdf)]
[Short Talk Slides (pptx) (pdf)]
[Talk Video (25 minutes)]


 

Congratulations to Nastaran Hajinazar on her successful PhD Defence

Nastaran successfully defended her PhD thesis in June 2021.  Congratulations Nastaran!  We look forward to many more collaborations with you in the future.  

Thesis: Data-Centric and Data-Aware Frameworks for Fundamentally Efficient Data Handling in Modern Computing Systems

Abstract:

There is an explosive growth in the size of the input and/or intermediate data used and generated by modern and emerging applications. Unfortunately, modern computing systems are not capable of handling large amounts of data efficiently. Major concepts and components (e.g., the virtual memory system) and predominant execution models (e.g., the processor-centric execution model) used in almost all computing systems are designed without having modern applications’ overwhelming data demand in mind. As a result, accessing, moving, and processing large amounts of data faces important challenges in today’s systems, making data a first-class concern and a prime performance and energy bottleneck in such systems. This thesis studies the root cause of inefficiency in modern computing systems when handling modern applications’ data demand, and aims to fundamentally address such inefficiencies, with a focus on two directions.

First, we design a new framework that aids the widespread adoption of processing-using-DRAM, a data-centric computation paradigm that improves the overall performance and efficiency of the system when computing large amounts of data by minimizing the cost of data movement and enabling computation where the data resides. To this end, we introduce SIMDRAM, an end-to-end processing-using-DRAM framework that (1) efficiently computes complex operations required by modern data-intensive applications, and (2) provides the ability to implement new arbitrary operations as required, all in an in-DRAM massively-parallel SIMD substrate that requires minimal changes to the DRAM architecture.

Second, we design a new, more scalable virtual memory framework that (1) eliminates the inefficiencies of the conventional virtual memory frameworks when handling the high memory demand in modern applications, and (2) is built from the ground up to understand, convey, and exploit data properties, to create opportunities for performance and efficiency improvements. To this end, we introduce the Virtual Block Interface (VBI), a novel virtual memory framework that (1) efficiently handles modern applications’ high data demand, (2) conveys properties of different pieces of program data (e.g., data structures) to the hardware and exploits this knowledge for performance and efficiency optimizations, (3) better extracts performance from the wide variety of new system configurations that are designed to process large amounts of data (e.g., hybrid memory systems), and (4) provides all the key features of the conventional virtual memory frameworks, at low overhead.

Keywords: Efficient Data Handling, Data-Centric Architectures, Data-Aware Architectures, Virtual Memory, Processing-in-Memory
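
To make the bit-serial SIMD idea behind SIMDRAM more concrete, here is a minimal software sketch. It assumes that majority (MAJ) and NOT are the only available primitives and that operands are stored vertically, one bit-plane per row; both are assumptions of this sketch, since the abstract does not list the primitives. All names (`maj`, `to_vertical`, `bit_serial_add`) are hypothetical and the code only models the computation, not the DRAM operations themselves.

```python
LANES = 8                      # number of SIMD lanes (assumed for this sketch)
MASK = (1 << LANES) - 1        # keeps NOT results within the lane width


def maj(a: int, b: int, c: int) -> int:
    """Bitwise majority of three bit-planes, applied to all lanes at once."""
    return (a & b) | (a & c) | (b & c)


def not_(a: int) -> int:
    """Bitwise NOT restricted to the simulated lanes."""
    return ~a & MASK


def to_vertical(values: list[int], bits: int) -> list[int]:
    """Transpose values so that plane i holds bit i of every lane."""
    planes = []
    for i in range(bits):
        plane = 0
        for lane, value in enumerate(values):
            plane |= ((value >> i) & 1) << lane
        planes.append(plane)
    return planes


def from_vertical(planes: list[int], lanes: int) -> list[int]:
    """Inverse of to_vertical: rebuild one integer per lane."""
    values = []
    for lane in range(lanes):
        value = 0
        for i, plane in enumerate(planes):
            value |= ((plane >> lane) & 1) << i
        values.append(value)
    return values


def bit_serial_add(a_planes: list[int], b_planes: list[int]) -> list[int]:
    """Add all lanes in parallel, one bit position per step, using MAJ and NOT only."""
    carry = 0
    result = []
    for a, b in zip(a_planes, b_planes):
        carry_out = maj(a, b, carry)
        # Full-adder sum expressed with majority and NOT instead of XOR.
        total = maj(not_(carry_out), carry, maj(a, b, not_(carry)))
        result.append(total)
        carry = carry_out
    result.append(carry)       # final carry becomes the extra top bit
    return result


a = [1, 2, 3, 250, 255, 0, 17, 128]
b = [1, 5, 9, 10, 255, 0, 100, 128]
sums = from_vertical(bit_serial_add(to_vertical(a, 8), to_vertical(b, 8)), LANES)
print(sums)
```

The number of steps depends on the operand bit-width rather than on the number of lanes, which is why a vertical layout suits a massively parallel substrate.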

Examining Committee

Onur Mutlu, Co-Senior Supervisor
Arrvindh Shriraman, Co-Senior Supervisor
Saugata Ghose, Supervisor
Vivek Seshadri, Supervisor
Alaa Alameldeen, Internal Examiner
Myoungsoo Jung, External Examiner
Zhenman Fang, Chair

 

Congratulations to Gagandeep Singh on his successful PhD Defence

Gagan successfully defended his PhD thesis in March 2021.  We are excited that Gagan will stay on with SAFARI as a postdoc and we look forward to many successful collaborations with him.  Congratulations Gagan!

Gagandeep Singh, March 2021 (defended 29 March 2021)

Thesis title: “Designing, Modeling, and Optimizing Data-Intensive Computing Systems”
[Slides (pptx) (pdf)]

Abstract:  

The cost of moving data between the memory units and the compute units is a major contributor to the execution time and energy consumption of modern workloads in computing systems. At the same time, we are witnessing an enormous amount of data being generated across multiple application domains. Moreover, the end of Dennard scaling, the slowing of Moore’s law, and the emergence of dark silicon limit the attainable performance on current computing systems. These trends suggest a need for a paradigm shift towards a data-centric approach where computation is performed close to where the data resides. This approach allows us to overcome our current systems’ performance and energy limitations by minimizing the data movement overhead by ensuring that data does not overwhelm system components. Further, a data-centric approach can enable a data-driven view where we take advantage of vast amounts of available data to improve architectural decisions. Our current systems are designed to follow rigid and simple policies that lack adaptability. Therefore, current system policies fail to provide robust improvement across varying workloads and system conditions.

As a step towards modern architectures, this dissertation contributes to various aspects of the data-centric approach and further proposes several data-driven mechanisms.

First, we design NERO, a data-centric accelerator for a real-world weather prediction application. NERO overcomes the memory bottleneck of weather prediction stencil kernels by exploiting near-memory computation capability on specialized field-programmable gate array (FPGA) accelerators with high-bandwidth memory (HBM) that are attached to the host CPU.

Second, we explore the applicability of different number formats, including fixed-point, floating-point, and posit, for different stencil kernels. We search for the appropriate bit-width that reduces the memory footprint and improves the performance and energy efficiency with minimal loss in the accuracy.

Third, we propose NAPEL, an ML-based application performance and energy prediction framework for data-centric architectures. NAPEL uses ensemble learning to build a model that, once trained for a fraction of programs, can predict the performance and energy consumption of different applications.

Fourth, we present LEAPER, the first use of few-shot learning to transfer FPGA-based computing models across different hardware platforms and applications. LEAPER provides the ability to reuse a prediction model built on an inexpensive low-end local system on a new, unknown, high-end FPGA-based system.

Fifth, we propose QRator, a reinforcement learning (RL)-based data-placement technique for hybrid storage systems. QRator is a data-driven technique, which uses RL to develop a data-placement policy agent. The data-placement agent decides which data should be stored in what storage device to achieve the best performance while minimizing the migration overhead taking into account the device and the workload characteristics. Our evaluation results show that QRator significantly improves a hybrid storage subsystem’s performance compared to state-of-the-art data placement techniques.

Overall, this thesis provides two key conclusions: (1) hardware acceleration on an FPGA+HBM fabric is a promising solution to overcome the data movement bottleneck of our current computing systems; (2) data should drive system and design decisions by leveraging inherent data characteristics to make our computing systems more efficient. Thus, we conclude that the mechanisms proposed by this dissertation provide promising solutions to handle data well by following a data-centric approach and further demonstrate the importance of leveraging data to devise data-driven policies. We hope that the proposed architectural techniques and detailed experimental data presented in this dissertation will enable the development of energy-efficient data-intensive computing systems and drive the exploration of new mechanisms to improve the performance and energy efficiency of future computing systems.


Reducing Solid-State Drive Read Latency by Optimizing Read-Retry

Watch our recent talk at ASPLOS 2021:

Jisung Park, Myungsuk Kim, Myoungjun Chun, Lois Orosa, Jihong Kim, and Onur Mutlu,
“Reducing Solid-State Drive Read Latency by Optimizing Read-Retry”
Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, March-April 2021.
[2-page Extended Abstract]
[Short Talk Slides (pptx) (pdf)]
[Full Talk Slides (pptx) (pdf)]
[Short Talk Video (5 mins)]
[Full Talk Video (19 mins)]

SIMDRAM: A Framework for Bit-Serial SIMD Processing using DRAM

Watch our recent talks at ASPLOS 2021!

Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, Joao Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gomez-Luna, and Onur Mutlu,
“SIMDRAM: A Framework for Bit-Serial SIMD Processing using DRAM”
Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, March-April 2021.
[2-page Extended Abstract]
[Short Talk Slides (pptx) (pdf)]
[Talk Slides (pptx) (pdf)]
[Short Talk Video (5 mins)]
[Full Talk Video (27 mins)]

 

Join us at ASPLOS 2021 online

We are at ASPLOS 2021 this week and next.  Join us for our talks and learn more about our recent works:

Session 2: Memory Systems, Monday, April 19 4:00 PM Pacific Time:

Irina Calciu, M. Talha Imran, Ivan Puddu, Sanidhya Kashyap, Hasan Al Maruf, Onur Mutlu, and Aasheesh Kolli,
“Rethinking Software Runtimes for Disaggregated Memory”
Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, March-April 2021.
[2-page Extended Abstract]
[Source Code (Officially Artifact Evaluated)]


Session 8: Tools & Frameworks, Tuesday, April 20 4:00 PM Pacific Time: 

Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, Joao Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gomez-Luna, and Onur Mutlu,
“SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM”
Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, March-April 2021.
[2-page Extended Abstract]
[Short Talk Slides (pptx) (pdf)]
[Talk Slides (pptx) (pdf)]
[Short Talk Video (5 mins)]
[Full Talk Video (27 mins)]


Session 17: Solid State Drives, Thursday, April 22 7:00 AM Pacific Time:

Jisung Park, Myungsuk Kim, Myoungjun Chun, Lois Orosa, Jihong Kim, and Onur Mutlu,
“Reducing Solid-State Drive Read Latency by Optimizing Read-Retry”
Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, March-April 2021.
[2-page Extended Abstract]
[Short Talk Slides (pptx) (pdf)]
[Full Talk Slides (pptx) (pdf)]
[Short Talk Video (5 mins)]
[Full Talk Video (19 mins)]

 

ASPLOS Program:  https://asplos-conference.org/program/

Onur Mutlu and Co-authors Receive the 2021 HPCA Test of Time Award

Congratulations to Onur Mutlu and co-authors on receiving the HPCA Test of Time Award for their 2003 HPCA paper:

Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
Onur Mutlu, Jared Stark, Chris Wilkerson, Yale N. Patt

The IEEE International Symposium on High-Performance Computer Architecture (HPCA) Test of Time Award recognizes the most influential papers published in prior sessions of HPCA (held 18-22 years ago), and that have had a significant impact in the field.

The paper was Professor Onur Mutlu’s first publication during his PhD at the University of Texas with his PhD advisor Professor Yale Patt and colleagues from Intel, Dr. Jared Stark and Chris Wilkerson.  The significance of the paper was described by the award committee as: “Runahead Execution is a pioneering paper that opened up new avenues in dynamic prefetching. The basic idea of run ahead execution effectively increases the instruction window very significantly, without having to increase physical resource size (e.g. the issue queue). This seminal paper spawned off a new area of ILP-enhancing microarchitecture research. This work has had strong industry impact as evidenced by IBM’s POWER6 – Load Lookahead, NVIDIA Denver, and Sun ROCK’s hardware scouting.” The award was presented last week at HPCA 2021 on March 2, 2021.

Watch Onur’s Retrospective HPCA Test of Time Award Talk Video (14 minutes)


Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt,
“Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors”
Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA), pages 129-140, Anaheim, CA, February 2003.
[Talk Slides (pdf)]
[Lecture Slides (pptx) (pdf)]
[Lecture Video (1 hr 54 mins)]
[Retrospective HPCA Test of Time Award Talk Slides (pptx) (pdf)]
[Retrospective HPCA Test of Time Award Talk Video (14 minutes)]
One of the 15 computer architecture papers of 2003 selected as Top Picks by IEEE Micro.
HPCA Test of Time Award (awarded in 2021).

Interview with Mohammed Alser: on his recent papers and his future work


Mohammed Alser is a Senior Researcher and Lecturer with SAFARI.
He was previously a PhD student in SAFARI, co-advised with Can Alkan. Mohammed co-teaches two Projects and Seminars courses on Genome Sequencing Analysis and Mobile Genomics along with the Seminar on Computer Architecture.  We recently interviewed Mohammed for the January 2021 issue of the SAFARI Newsletter.  


You have been busy this past year, and have published quite a few papers. Your recent work, SneakySnake, was recently published in Bioinformatics. This is an important work in improving computations for genome analysis. Can you tell us more about the significance of this work, and what broader impacts you hope for it?

SneakySnake is one of the projects that I enjoyed the most working on. We try in this work to significantly reduce the time spent on finding the similarities and differences between two genomic sequences without sacrificing solution optimality. Finding the similarities and differences between two sequences is a well-known computer science problem, called approximate string matching (ASM), which is solved using computationally expensive algorithms.

SneakySnake quickly finds the sequence pairs that have a large (greater than a user-defined threshold) number of differences and prevents applying computationally expensive algorithms for these sequence pairs, as such sequence pairs are usually not useful for genomic studies. SneakySnake is inspired by the single net routing (SNR) problem in VLSI design that was introduced in 1976. SneakySnake is the first work that proposes to convert the ASM problem into an instance of the SNR problem, which provides several key benefits as we discussed in the paper, and proposes a new efficient algorithm for comparing genomic sequences at scale.

SneakySnake is very beneficial for analyzing both short (e.g., Illumina) and long (e.g., nanopore) sequences as it accelerates the analysis of genomic sequences by up to two orders of magnitude compared to the state-of-the-art algorithms. SneakySnake works efficiently and fast on modern CPU, FPGA, and GPU architectures, which can potentially enable new applications of genome sequencing such as rapid surveillance of disease outbreaks including Ebola and COVID-19, near-patient testing, and bringing precision medicine to remote locations, without the need for large infrastructure.

One of the Bioinformatics journal’s reviewers states that: “SneakySnake is a valuable contribution to bioinformatics and it was innovative to reduce the ASM problem to the SNR problem in VLSI CAD”.
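
The filter-then-align idea described above can be illustrated with a small sketch. This is not the SneakySnake implementation: it is a simplified greedy filter written for illustration, and the function names and example sequences are hypothetical. The filter computes a cheap lower bound on the number of edits; only pairs whose bound is within the threshold are passed to the expensive dynamic-programming alignment.

```python
def estimate_edit_lower_bound(reference: str, query: str, max_edits: int) -> int:
    """Greedy lower bound on the edits needed to align query to reference.

    Returns early with a value above max_edits once the threshold is exceeded.
    """
    edits = 0
    pos = 0  # next query position not yet covered by a matching run
    while pos < len(query):
        best_run = 0
        # Try every diagonal (relative shift) allowed by the edit threshold.
        for shift in range(-max_edits, max_edits + 1):
            run = 0
            while pos + run < len(query):
                ref_index = pos + run + shift
                if not 0 <= ref_index < len(reference):
                    break
                if reference[ref_index] != query[pos + run]:
                    break
                run += 1
            best_run = max(best_run, run)
        pos += best_run
        if pos < len(query):
            edits += 1   # no diagonal can extend further: count one edit
            pos += 1
            if edits > max_edits:
                return edits
    return edits


def edit_distance(a: str, b: str) -> int:
    """Exact edit distance via dynamic programming (the expensive step)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def filtered_alignment(reference: str, query: str, max_edits: int):
    """Run the exact alignment only for pairs that pass the filter."""
    if estimate_edit_lower_bound(reference, query, max_edits) > max_edits:
        return None  # rejected cheaply, the expensive step is skipped
    return edit_distance(reference, query)


reference = "ACGTACGTACGT"
similar = "ACGTACCTACGT"      # differs from the reference by one substitution
dissimilar = "TTTTGGGGCCCC"

print(filtered_alignment(reference, similar, max_edits=2))
print(filtered_alignment(reference, dissimilar, max_edits=2))
```

Because the greedy estimate never exceeds the true edit distance for pairs within the threshold, this sketch may let some dissimilar pairs through to the exact step but does not reject a pair that the exact step would accept.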


You also recently published Accelerating Genome Analysis, which reviews the improvements made in hardware accelerators for genome analysis. What are your take away messages from this paper, and what do you see as future priorities in hardware improvements for genome analysis?

Most speedup comes from parallelism enabled by novel architectures and algorithms. We need to develop acceleration solutions that exploit new efficient hardware-aware algorithms, hardware/software co-design, and hardware accelerators to achieve a high degree of parallelism.

Accelerating the entire genome analysis pipeline is important. Accelerating only a single step of genome analysis is not an effective acceleration approach as it limits the overall achieved speedup according to Amdahl’s Law.

Genome analysis is currently heavily bottlenecked by data movement. We need to reduce the high amount of data movement that takes place during genome analysis. Moving data (1) between compute units and main memory, (2) between multiple hardware accelerators, and (3) between the sequencing machine and the computer performing the analysis incurs high costs in terms of execution time and energy. These costs are a significant barrier to enabling efficient analysis that can keep up with sequencing technologies.

The need for flexible hardware architectures. We need to develop flexible hardware architectures that do not conservatively limit the range of supported parameter values at design time. Rapid changes in sequencing technologies (e.g., those that result in high sequencing error rates and longer read lengths) can quickly make specialized hardware with restricted parameter values obsolete.

The need for new genomic data formats. We need to adapt existing genomic data formats for hardware accelerators or develop more efficient file formats to maximize the benefits of hardware accelerators and reduce resource utilization.

Looking into the future, building a genome sequencing machine that provides the entire genome as a single string, rather than its short subsequences, might be possible. However, we believe that hardware acceleration of whole-genome analysis will remain necessary. We also believe performing genome analysis inside the sequencing machine itself can significantly improve efficiency by eliminating sequencer-to-computer data movement.
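
The Amdahl's Law point made in this answer can be checked with a few lines of arithmetic. The sketch below uses the standard formula, overall speedup = 1 / ((1 - p) + p / s), where p is the fraction of runtime that is accelerated and s is the speedup of that fraction. The 60% fraction and the 100x accelerator are hypothetical numbers chosen for illustration, not measurements from the paper.

```python
def amdahl_speedup(accelerated_fraction: float, step_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if step_speedup <= 0.0:
        raise ValueError("step_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / step_speedup)


# Hypothetical pipeline: one step accounts for 60% of the total runtime.
single_step = amdahl_speedup(accelerated_fraction=0.6, step_speedup=100.0)

# Upper bound for that step alone, even with an infinitely fast accelerator.
single_step_limit = 1.0 / (1.0 - 0.6)

# Accelerating every step by the same factor removes the cap.
whole_pipeline = amdahl_speedup(accelerated_fraction=1.0, step_speedup=100.0)

print(f"single step accelerated: {single_step:.2f}x (limit {single_step_limit:.2f}x)")
print(f"whole pipeline accelerated: {whole_pipeline:.2f}x")
```

Under these assumed numbers, the untouched 40% of the runtime bounds the overall gain no matter how fast the accelerated step becomes, which is the motivation for accelerating the pipeline end to end.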


Your work has many topical applications that are highly relevant to society, including COVID modeling. Can you talk a bit about this, and your future research directions?

As the entire world is largely negatively impacted by the recent COVID-19 outbreak, we believe that everyone can help to end this pandemic based on their skills, expertise, and available resources. At the SAFARI research group, we are helping in two main directions.

We are working on developing an accurate and configurable prediction model that evaluates the existing mitigation measures that the government applies in a region and provides suggestions on what strength the future mitigation measures should be. We are quantifying the spread of COVID-19 in Switzerland (as a use-case) by calculating the daily reproduction number of COVID-19, which quantifies how many people are infected on average by an infected person. The reproduction number is directly affected by the mitigation measures that the government applies to a region. We are also considering other important factors such as daylight temperature that significantly affect the spread of COVID-19 as we observed during the year 2020.

We are also working on developing new algorithms and hardware accelerators that perform fast and accurate metagenomic profiling for assessing microbial diversity, identifying potential new species, and investigating microbiomes associated with COVID-19 and other diseases. Performing genomic tests at scale during a pandemic highlights the dire need for building efficient specialized hardware that is both scalable and portable to enable genome analysis anywhere and anytime. We hope that the progress we make in this direction will also enable new applications that benefit human life and society.

Mohammed Alser, Taha Shahroodi, Juan Gomez-Luna, Can Alkan, and Onur Mutlu, SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs, Bioinformatics, December 2020.
Paper PDF | Paper link Bioinformatics | Source Code

Mohammed Alser, Zulal Bingol, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, and Onur Mutlu, Accelerating Genome Analysis: A Primer on an Ongoing Journey, IEEE MICRO, September/October 2020.
Paper | Slides (pptx) (pdf)

Interview with Damla Senol Cali: about her work, experience in SAFARI, and her future directions

Damla Senol Cali is a PhD student with SAFARI at CMU co-advised by Onur Mutlu and Saugata Ghose.
We recently interviewed Damla about her work, her experience as a PhD student in SAFARI, and her future directions.  Damla’s interview appears as the first video contribution to the SAFARI Meet our Members section in our January 2021 newsletter.

Watch Damla’s video interview here    |    Read the transcript here 

Damla Senol Cali, Gurpreet S. Kalsi, Zulal Bingol, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu, GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis, Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual, October 2020.

Full paper link | Talk Video (18 mins) | Talk Slides (pptx) (pdf) |
Lecture Video (37 mins) | Lecture Slides (pptx) (pdf) | GenASM Source Code |
More information here

Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, and Onur Mutlu, Nanopore Sequencing Technology and Tools for Genome Assembly: Computational Analysis of the Current State, Bottlenecks and Future Directions, Briefings in Bioinformatics (BIB), 2018.

Paper link | Paper PDF | AACBB’19 Talk Video | Slides (ppt) (pdf) |
More information here

Read the latest edition of our SAFARI Newsletter

Dear SAFARI friends,

Happy New Year!  We are excited to share our group highlights with you in this second edition of the SAFARI newsletter: https://safari.ethz.ch/safari-newsletter-january-2021/

In this second edition of the SAFARI newsletter, we share our research, teaching and outreach highlights from 2020, and look ahead to a new and inspiring future in 2021.

We wish you a wonderful 2021, in all aspects of your lives!

Onur Mutlu