Congratulations to Damla Senol Cali on successfully defending her PhD!

Thesis:  Accelerating Genome Sequence Analysis via Efficient Hardware/Algorithm Co-Design


Genome sequence analysis plays a pivotal role in enabling many medical and scientific advancements in personalized medicine, outbreak tracing, the understanding of evolution, and forensics. Modern genome sequencing machines can rapidly generate massive amounts of genomics data at low cost. However, the analysis of genome sequencing data is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data. Our goals in this dissertation are to (1) characterize the real-system behavior of the genome sequence analysis pipeline and its associated tools, (2) expose the bottlenecks and tradeoffs of the pipeline and tools, and (3) co-design fast and efficient algorithms along with scalable and energy-efficient customized hardware accelerators for the key pipeline bottlenecks to enable faster genome sequence analysis.

First, we comprehensively analyze the tools in the genome assembly pipeline for long reads in multiple dimensions (i.e., accuracy, performance, memory usage, and scalability), uncovering bottlenecks and tradeoffs that different combinations of tools and different underlying systems lead to. We show that we need high-performance, memory-efficient, low-power, and scalable designs for genome sequence analysis in order to exploit the advantages that genome sequencing provides. Second, we propose GenASM, an acceleration framework that builds upon bitvector-based approximate string matching (ASM) to accelerate multiple steps of the genome sequence analysis pipeline. We co-design our highly-parallel, scalable and memory-efficient algorithms with low-power and area-efficient hardware accelerators. We evaluate GenASM for three different use cases of ASM in genome sequence analysis and show that GenASM is significantly faster and more power- and area-efficient than state-of-the-art software and hardware tools for each of these use cases. Third, we implement an FPGA-based prototype for GenASM, where state-of-the-art 3D-stacked memory (HBM2) offers high memory bandwidth and FPGA resources offer high parallelism by instantiating multiple copies of the GenASM accelerators. Fourth, we propose GenGraph, the first hardware acceleration framework for sequence-to-graph mapping. Instead of representing the reference genome as a single linear DNA sequence, genome graphs provide a better representation of the diversity among populations by encoding variations across individuals in a graph data structure, avoiding a bias towards any one reference. GenGraph enables the efficient mapping of a sequenced genome to a graph-based reference, providing more comprehensive and accurate genome sequence analysis.

Overall, we demonstrate that genome sequence analysis can be accelerated by co- designing scalable and energy-efficient customized accelerators along with efficient algorithms for the key steps of genome sequence analysis.

Examining Committee

Onur Mutlu, Co-advisor, CMU-ECE, ETH Zurich
Saugata Ghose, Co-advisor, CMU-ECE, University of Illinois Urbana-Champaign
James C. Hoe, CMU-ECE
Can Alkan, Bilkent University

More on Damla’s publications, talks and research interests can be found on her website.








Congratulations to Nastaran Hajinazar on her successful PhD Defence

Nastaran successfully defended her PhD thesis in June 2021.  Congratulations Nastaran!  We look forward to many more collaborations with you in the future.  

Thesis: Data-Centric and Data-Aware Frameworks for Fundamentally Efficient Data Handling in Modern Computing Systems


There is an explosive growth in the size of the input and/or intermediate data used and generated by modern and emerging applications. Unfortunately, modern computing systems are not capable of handling large amounts of data efficiently. Major concepts and components (e.g., the virtual memory system) and predominant execution models (e.g., the processor-centric execution model) used in almost all computing systems are designed without having modern applications’ overwhelming data demand in mind. As a result, accessing, moving, and processing large amounts of data faces important challenges in today’s systems, making data a first-class concern and a prime performance and energy bottleneck in such systems. This thesis studies the root cause of inefficiency in modern computing systems when handling modern applications’ data demand, and aims to fundamentally address such inefficiencies, with a focus on two directions.

First, we design a new framework that aids the widespread adoption of processing-using-DRAM, a data-centric computation paradigm that improves the overall performance and efficiency of the system when computing large amounts of data by minimizing the cost of data movement and enabling computation where the data resides. To this end, we introduce SIMDRAM, an end-to-end processing-using-DRAM framework that (1) efficiently computes complex operations required by modern data intensive applications, and (2) provides the ability to implement new arbitrary operations as required, all inan in-DRAM massively-parallel SIMD substrate that requires minimal changes to the DRAM architecture.

Second, we design a new, more scalable virtual memory framework that (1) eliminates the inefficiencies of the conventional virtual memory frameworks when handling the high memory demand in modern applications, and (2) is built from the ground up to understand, convey, and exploit data properties, to create opportunities for performance and efficiency improvements. To this end, we introduce the Virtual Block Interface (VBI), a novel virtual memory framework that (1) efficiently handles modern applications’ high data demand, (2) conveys properties of different pieces of program data (e.g., data structures) to the hardware and exploits this knowledge for performance and efficiency optimizations, (3) better extracts performance from the wide variety of new system configurations that are designed to process large amounts of data (e.g., hybrid memory systems), and (4) provides all the key features of the conventional virtual memory frameworks, at low overhead.

Keywords: Efficient Data Handling, Data-Centric Architectures, Data-Aware Architectures, Virtual Memory, Processing-in-Memory

Examining Committee

Onur Mutlu, Co-Senior Supervisor
Arrvindh Shriraman, Co-Senior Supervisor
Saugata Ghose, Supervisor
Vivek Seshadri, Supervisor
Alaa Alameldeen, Internal Examiner
Myoungsoo Jung, External Examiner
Zhenman fang, Chair 


Congratulations to Gagandeep Singh on his successful PhD Defence

Gagan successfully defended his PhD thesis in March 2021.  We are excited that Gagan will stay on with SAFARI as a postdoc and we look forward many successful collaborations with him.  Congratulations Gagan!

Gagandeep Singh, March 2021 (defended 29 March 2021)

Thesis title: “Designing, Modeling, and Optimizing Data-Intensive Computing Systems”
[Slides (pptx) (pdf)]


The cost of moving data between the memory units and the compute units is a major contributor to the execution time and energy consumption of modern workloads in computing systems. At the same time, we are witnessing an enormous amount of data being generated across multiple application domains. Moreover, the end of Dennard scaling, the slowing of Moore’s law, and the emergence of dark silicon limit the attainable performance on current computing systems. These trends suggest a need for a paradigm shift towards a data-centric approach where computation is performed close to where the data resides. This approach allows us to overcome our current systems’ performance and energy limitations by minimizing the data movement overhead by ensuring that data does not overwhelm system components. Further, a data-centric approach can enable a data-driven view where we take advantage of vast amounts of available data to improve architectural decisions. Our current systems are designed to follow rigid and simple policies that lack adaptability. Therefore, current system policies fail to provide robust improvement across varying workloads and system conditions.

As a step towards modern architectures, this dissertation contributes to various aspects of the data-centric approach and further proposes several data-driven mechanisms.

First, we design NERO, a data-centric accelerator for a real-world weather prediction application. NERO overcomes the memory bottleneck of weather prediction stencil kernels by exploiting near-memory computation capability on specialized field-programmable gate array (FPGA) accelerators with high-bandwidth memory (HBM) that are attached to the host CPU.

Second, we explore the applicability of different number formats, including fixed-point, floating-point, and posit, for different stencil kernels. We search for the appropriate bit-width that reduces the memory footprint and improves the performance and energy efficiency with minimal loss in the accuracy.

Third, we propose NAPEL, an ML-based application performance and energy prediction framework for data-centric architectures. NAPEL uses ensemble learning to build a model that, once trained for a fraction of programs, can predict the performance and energy consumption of different applications.

Fourth, we present the first use of few-shot learning to transfer FPGA-based computing models across different hardware platforms and applications. LEAPER provides the ability to reuse a prediction model built on an inexpensive low-end local system to a new, unknown, high-end FPGA-based system.

Fifth, we propose QRator, a reinforcement learning (RL)-based data-placement technique for hybrid storage systems. QRator is a data-driven technique, which uses RL to develop a data-placement policy agent. The data-placement agent decides which data should be stored in what storage device to achieve the best performance while minimizing the migration overhead taking into account the device and the workload characteristics. Our evaluation results show that QRator significantly improves a hybrid storage subsystem’s performance compared to state-of-the-art data placement techniques.

Overall, this thesis provides two key conclusions: (1) hardware acceleration on an FPGA+HBM fabric is a promising solution to overcome the data movement bottleneck of our current computing systems; (2) data should drive system and design decisions by leveraging inherent data characteristics to make our computing systems more efficient. Thus, we conclude that the mechanisms proposed by this dissertation provide promising solutions to handle data well by following a data-centric approach and further demonstrates the importance of leveraging data to devise data-driven policies. We hope that the proposed architectural techniques and detailed experimental data presented in this dissertation will enable the development of energy-efficient data-intensive computing systems and drive the exploration of new mechanisms to improve the performance and energy efficiency of future computing systems.