Our group works on a broad range of research domains, including all aspects of computer architecture, hardware security, bioinformatics, and computer systems. More specific topic areas within these four domains include memory systems, discovery of new security vulnerabilities and defenses, emerging technologies, genome analysis, new computing and communication paradigms, acceleration of important workloads (e.g., AI, genomics, personalized medicine), computing platforms for health and medicine, fault-tolerant systems, storage systems, distributed analytics, hardware/software co-design, mobile systems, energy-efficient systems, etc. Thesis projects are available in all of these topics. Please contact us if you are interested!
Below are more specific potential thesis and semester project topics with the SAFARI Research Group. This list is incomplete, so if your interest is not covered by these specific projects but falls in any of the above areas, please contact us.
Memory is the major performance, energy, and reliability bottleneck of all data-intensive workloads, e.g., deep learning, graph processing, in-memory databases, and genome analysis. The landscape of main memory is quickly changing, with many technologies appearing and being proposed. This includes many new DRAM standards (e.g., DDR5, LPDDR5, HBM3), new types of DRAM architectures, novel memory designs that are capable of processing in/near memory, new non-volatile memory technologies that are poised to replace DRAM, etc. The impact of such new designs and technologies on systems and applications needs to be quickly evaluated and understood, with rigorous evaluation infrastructures.
Ramulator 2.0 is a modern, modular, and extensible DRAM simulator, developed as a successor to Ramulator (1.0). The public version of Ramulator 2.0 is available at https://github.com/CMU-SAFARI/ramulator2. Ramulator 2.0 models a wide range of DRAM standards, including DDR3, DDR4, DDR5, LPDDR5, HBM2/3, GDDR6, etc. The DRAM models of DDR3 and DDR4 are validated against publicly available verification models provided by DRAM manufacturers. Unfortunately, for newer DRAM standards, there exist no publicly available verification models. Read more
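To give a flavor of what a cycle-level DRAM simulator has to model, below is a minimal, self-contained C sketch of a toy single-bank timing model. This is not Ramulator 2.0 code or its API; the timing parameters and structure are purely illustrative of the kind of bookkeeping (open rows, earliest-ready cycles, timing constraints) such a simulator performs.

```c
/* Conceptual sketch (not the Ramulator 2.0 API): a toy single-bank DRAM
 * timing model illustrating what a cycle-level simulator tracks, i.e.,
 * when a bank may serve the next request given basic timing constraints. */
#include <stdio.h>

enum { tRCD = 14, tCAS = 14, tRP = 14 };  /* illustrative cycle counts */

typedef struct {
    long open_row;       /* -1 means the bank is precharged */
    long ready_cycle;    /* earliest cycle the bank accepts a new command */
} Bank;

/* Returns the cycle at which the read data becomes available. */
long serve_read(Bank *b, long row, long now) {
    long t = now > b->ready_cycle ? now : b->ready_cycle;
    if (b->open_row != row) {              /* row miss: precharge + activate */
        if (b->open_row != -1) t += tRP;
        t += tRCD;
        b->open_row = row;
    }
    t += tCAS;                             /* column access latency */
    b->ready_cycle = t;
    return t;
}

int main(void) {
    Bank bank = { -1, 0 };
    long rows[] = { 7, 7, 12 };            /* row hit on the 2nd access, miss on the 3rd */
    for (int i = 0; i < 3; i++)
        printf("read row %ld -> data at cycle %ld\n", rows[i], serve_read(&bank, rows[i], i));
    return 0;
}
```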
Near-Data Processing for Self-Driving Cars
Self-driving cars generate substantial data through cameras, lidar, and various sensors. This data must be processed by computing units such as CPUs and GPUs, leading to significant data movement. Such data movement can diminish both the performance and the lifespan of these devices, as well as increase power consumption. Given that self-driving cars must adhere to strict deadlines, have long lifetime requirements, face power constraints, and prioritize safety, it is critical to minimize this data movement overhead. Therefore, we aim to implement near-data processing strategies to reduce the overhead associated with data movement. Read more
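As a rough illustration of the near-data-processing idea, the C sketch below contrasts shipping a full sensor frame to the host with filtering it near where the data resides and moving only a region of interest. The point format, frame size, and filter are placeholders for illustration, not an actual automotive pipeline.

```c
/* Minimal sketch of the near-data-processing idea: instead of shipping a full
 * lidar frame to the host, a filter running near the data returns only the
 * points inside a region of interest. The struct and numbers are illustrative. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { float x, y, z; } Point;

/* Runs "near the data" (e.g., in/near the sensor's storage): keeps points
 * within max_range meters, writes them to out, and returns the count. */
size_t filter_near_data(const Point *frame, size_t n, float max_range, Point *out) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        float d2 = frame[i].x * frame[i].x + frame[i].y * frame[i].y + frame[i].z * frame[i].z;
        if (d2 <= max_range * max_range) out[kept++] = frame[i];
    }
    return kept;
}

int main(void) {
    size_t n = 1000000;                       /* ~1M points per frame (illustrative) */
    Point *frame = malloc(n * sizeof *frame);
    Point *roi = malloc(n * sizeof *roi);
    if (!frame || !roi) return 1;
    for (size_t i = 0; i < n; i++) frame[i] = (Point){ (float)(i % 200), 0.0f, 0.0f };
    size_t kept = filter_near_data(frame, n, 50.0f, roi);
    printf("moved to host: %zu of %zu points (%.0f%% less data)\n",
           kept, n, 100.0 * (1.0 - (double)kept / n));
    free(frame); free(roi);
    return 0;
}
```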
OS I/O Stack Design for Storage-Centric Computing on Mobile Devices
A first part of this project will explore the design of a new workflow for data stack specialization, with the goal of simplifying and tailoring the data stack specifically for target mobile applications, enabling efficient offloading of computation and workloads to storage-centric computing systems. We’ll explore guiding principles for redesigning the data stack, how to evaluate the benefits of this optimization, and which widely recognized benchmarks are available for reference.
A second part will look at the development of a communication interface and protocol. The aim is to facilitate computation offloading and synchronization between the host CPU and the storage-centric computing system. We’ll look at how to define the memory-semantic interface, whether the approach should focus on synchronous or asynchronous access, on a stream model or a message model, and what considerations are involved in designing a communication protocol and extended command set to support in-storage data processing, based on the memory-semantic interface. We are looking for enthusiastic students who want to work hands-on on different software, hardware, and architecture projects for heterogeneous systems. Read more
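As a purely hypothetical illustration of what an extended command set for in-storage processing might carry, here is a small C sketch of an offload command descriptor. All opcodes, field names, and sizes are invented for illustration and do not correspond to any existing protocol.

```c
/* Hypothetical sketch of an extended command for in-storage processing,
 * illustrating the kind of memory-semantic interface the project would define.
 * All names and fields are placeholders, not an existing protocol. */
#include <stdint.h>
#include <stdio.h>

enum isc_opcode { ISC_FILTER = 1, ISC_REDUCE = 2, ISC_TRANSFORM = 3 };

struct isc_command {
    uint16_t opcode;        /* which in-storage kernel to run */
    uint16_t flags;         /* e.g., synchronous vs. asynchronous completion */
    uint64_t src_lba;       /* source logical block address */
    uint32_t src_blocks;    /* number of blocks to process */
    uint64_t dst_addr;      /* host memory address for the (smaller) result */
    uint64_t completion_id; /* matched against a completion queue entry */
};

int main(void) {
    struct isc_command cmd = {
        .opcode = ISC_FILTER, .flags = 0 /* asynchronous */,
        .src_lba = 0x1000, .src_blocks = 4096,
        .dst_addr = 0, .completion_id = 42,
    };
    printf("command %llu: op %u over %u blocks starting at LBA 0x%llx\n",
           (unsigned long long)cmd.completion_id, (unsigned)cmd.opcode,
           cmd.src_blocks, (unsigned long long)cmd.src_lba);
    return 0;
}
```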
Hardware-Software-OS Cooperative Techniques for Efficient and Secure Computing
Co-designing software, hardware, and the operating system is a promising approach towards (i) accelerating a wide spectrum of modern applications such as graph analytics, generative AI, and recommender systems, (ii) designing intelligent and efficient OS policies for memory management, storage management, and container spawning, and (iii) hardening system and processor security. We are looking for students who are interested in the following hardware-software-OS co-design research topics:
- Hardware/OS co-design to enable efficient and secure memory and compute resources
- Designing software and hardware solutions to harden the security of emerging paradigms like processing-in-memory and in-storage processing Read more
Leveraging and Optimizing Heterogeneous Computing Systems
The end of Moore’s law created the need to turn computers into heterogeneous systems, i.e., systems composed of multiple types of processors, each better suited to different types of workloads or parts of them. More than a decade ago, Graphics Processing Units (GPUs) became general-purpose parallel processors, in order to make their outstanding processing capabilities available to many workloads beyond graphics. GPUs have been key to the recent development of Machine Learning and Artificial Intelligence, whose training times were impractically long before GPUs. Field-Programmable Gate Arrays (FPGAs) are another example of a computing device that can deliver impressive benefits in terms of performance and energy efficiency. We are looking for enthusiastic students who want to work hands-on on different software, hardware, and architecture projects for heterogeneous systems, for example:
- Heterogeneous implementations (GPU, FPGA) of modern applications from important fields such as bioinformatics, machine learning, graph processing, medical imaging, etc.
- Scheduling techniques for heterogeneous systems with different general-purpose processors and accelerators, e.g., kernel offloading and memory scheduling (see the sketch after this list).
- Workload characterization and programming tools that enable easier and more efficient use of heterogeneous systems. Read more
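For instance, a kernel-offloading decision often boils down to a simple cost model: offload to the accelerator only when its compute advantage outweighs data-transfer and launch costs. The C sketch below illustrates this idea with made-up throughput, bandwidth, and overhead numbers; it is not a model of any specific system.

```c
/* Minimal sketch of a kernel-offloading decision for a heterogeneous system:
 * offload only if the estimated accelerator time (including data transfer and
 * launch overhead) beats the CPU time. All constants are illustrative. */
#include <stdio.h>

typedef struct {
    double host_throughput;   /* elements per second on the CPU */
    double accel_throughput;  /* elements per second on the GPU/FPGA */
    double link_bandwidth;    /* bytes per second between host and accelerator */
    double launch_overhead;   /* seconds of fixed offload/launch cost */
} SystemModel;

/* Returns 1 if the kernel should be offloaded, 0 if it should run on the CPU. */
int should_offload(const SystemModel *m, double num_elems, double bytes_per_elem) {
    double t_cpu   = num_elems / m->host_throughput;
    double t_accel = m->launch_overhead
                   + num_elems / m->accel_throughput
                   + (num_elems * bytes_per_elem) / m->link_bandwidth;  /* compute + transfer */
    return t_accel < t_cpu;
}

int main(void) {
    SystemModel m = { 1e9, 20e9, 16e9, 1e-4 };   /* illustrative CPU, accelerator, link, overhead */
    printf("small kernel (1e5 elems): offload? %d\n", should_offload(&m, 1e5, 8));
    printf("large kernel (1e9 elems): offload? %d\n", should_offload(&m, 1e9, 8));
    return 0;
}
```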
Programming and Improving a Real-world Processing-in-Memory Architecture
Data movement between the memory units and the compute units of current computing systems is a major performance and energy bottleneck. Many modern and important workloads such as machine learning, computational biology, and graph processing suffer greatly from this data movement bottleneck. To alleviate it, Processing-in-Memory (PIM) represents a paradigm shift from the traditional processor-centric design, where all computation takes place in the compute units, to a more data-centric design, where processing elements are placed closer to or inside where the data resides. After many years of research proposals from industry and academia, a real-world processing-in-memory architecture is publicly available. The UPMEM PIM architecture integrates DRAM Processing Units (DPUs) inside DRAM chips. As a result, workloads can take advantage of unprecedented memory bandwidth. Projects in this line of research span software and hardware, as well as the software/hardware interface. We are looking for enthusiastic students who want to work hands-on on (1) programming and optimizing workloads on the UPMEM PIM architecture, and/or (2) proposing and implementing hardware and architecture improvements for future PIM architectures. Read more
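For a flavor of what host-side PIM code looks like, below is a minimal sketch written in the style of the UPMEM SDK's C host interface: allocate a set of DPUs, load a kernel binary, copy data into DPU memory, and launch. The kernel binary name and the "buffer" symbol are placeholders for this sketch, and the exact calls and signatures should be checked against the SDK documentation.

```c
/* Minimal host-side sketch in the style of the UPMEM SDK's C host API.
 * The DPU binary name and the "buffer" symbol are placeholders; verify
 * exact calls against the SDK documentation before use. */
#include <dpu.h>
#include <stdint.h>
#include <stdio.h>

#define DPU_BINARY "./vector_add_dpu"   /* hypothetical pre-built DPU kernel */

int main(void) {
    struct dpu_set_t set;
    uint32_t nr_dpus;
    uint32_t input[2048] = { 0 };

    DPU_ASSERT(dpu_alloc(64, NULL, &set));            /* reserve 64 DPUs */
    DPU_ASSERT(dpu_get_nr_dpus(set, &nr_dpus));
    DPU_ASSERT(dpu_load(set, DPU_BINARY, NULL));      /* load the DPU kernel */

    /* Broadcast the input to the DPUs' memory (symbol defined in the kernel). */
    DPU_ASSERT(dpu_copy_to(set, "buffer", 0, input, sizeof(input)));

    DPU_ASSERT(dpu_launch(set, DPU_SYNCHRONOUS));     /* run in memory, wait for completion */

    printf("ran kernel on %u DPUs\n", nr_dpus);
    DPU_ASSERT(dpu_free(set));
    return 0;
}
```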
Machine-Learning Assisted Intelligent Architectures
Modern processors employ numerous human-designed policies for prefetching, cache replacement, data management, and memory scheduling. These techniques rely on statically chosen design features that favor specific workload and/or device characteristics over others. However, designing a highly effective, high-performance, and efficient policy that can adapt to changes in workload behavior across a broad range of workloads is usually well beyond human capability. In this project, you will help develop, implement, and evaluate machine learning-based techniques for different aspects of computer architecture. Read more
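One concrete example of such a learning-based policy is a small perceptron that decides, from a few program features, whether an action (e.g., issuing a prefetch) is likely to be useful, and that is trained online from feedback. The C sketch below is a generic illustration of this style of predictor; the features, sizes, and threshold are illustrative, not a specific published design.

```c
/* Minimal sketch of an online-learning policy in hardware: a tiny perceptron
 * that predicts whether a prefetch will be useful from a few features
 * (in the spirit of perceptron-based predictors). Sizes are illustrative. */
#include <stdio.h>

#define N_FEATURES 4
#define THRESHOLD  3

typedef struct { int w[N_FEATURES]; int bias; } Perceptron;

/* Features are +1/-1 encodings of, e.g., PC bits, recent accuracy, page reuse. */
int predict(const Perceptron *p, const int f[N_FEATURES]) {
    int sum = p->bias;
    for (int i = 0; i < N_FEATURES; i++) sum += p->w[i] * f[i];
    return sum >= 0;   /* 1: issue the prefetch, 0: suppress it */
}

/* Train on mispredictions or low-confidence outputs (standard perceptron rule). */
void train(Perceptron *p, const int f[N_FEATURES], int useful) {
    int sum = p->bias;
    for (int i = 0; i < N_FEATURES; i++) sum += p->w[i] * f[i];
    int t = useful ? 1 : -1;
    if ((sum >= 0) != useful || (sum < THRESHOLD && sum > -THRESHOLD)) {
        p->bias += t;
        for (int i = 0; i < N_FEATURES; i++) p->w[i] += t * f[i];
    }
}

int main(void) {
    Perceptron p = { {0, 0, 0, 0}, 0 };
    int feat[N_FEATURES] = { 1, -1, 1, 1 };
    train(&p, feat, 1);                       /* feedback: this prefetch was useful */
    printf("predict: %d\n", predict(&p, feat));
    return 0;
}
```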
Rethinking Virtual Memory for Modern Computing Systems
Modern computing systems heavily depend on virtual memory to provide many features, all of which are integral to the overall performance and functionality of the system. However, virtual memory faces important challenges today that put the efficiency of this critical component, as it stands, at serious risk. These challenges fundamentally stem from the fact that virtual memory was originally designed decades ago, without the diversity and complexity of today’s computing systems in mind. Our goal is to fundamentally rethink and redesign virtual memory, in order to give the virtual memory abstraction the flexibility required to support today’s massively diverse system configurations while preserving widely used programmer abstractions. In this project, you will be involved in (1) designing and performing evaluations to study the behavior of modern workloads and system configurations, and (2) using the insights gained from these evaluations to guide our research towards solutions for the challenges that the conventional virtual memory framework faces today. Read more
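As a reminder of where part of the virtual memory overhead comes from, the sketch below walks through the indices of a conventional 4-level, x86-64-style page table for one virtual address: every TLB miss can cost several dependent memory accesses. The code is a simplified illustration (it only prints the indices, with no actual in-memory tables).

```c
/* Simplified sketch of a 4-level (x86-64-style) page table walk, illustrating
 * why address translation can cost several memory accesses per TLB miss.
 * The radix-tree index layout follows x86-64; everything else is a toy. */
#include <stdint.h>
#include <stdio.h>

#define LEVELS 4
#define IDX_BITS 9
#define PAGE_SHIFT 12

/* Extract the 9-bit index used at a given level (level 3 = root, 0 = leaf). */
static unsigned index_at(uint64_t vaddr, int level) {
    return (vaddr >> (PAGE_SHIFT + IDX_BITS * level)) & ((1u << IDX_BITS) - 1);
}

int main(void) {
    uint64_t vaddr = 0x00007f12345678abULL;
    for (int level = LEVELS - 1; level >= 0; level--) {
        /* Each level requires one memory access to read a page-table entry. */
        printf("level %d: table index %u (one memory access)\n",
               level, index_at(vaddr, level));
    }
    printf("page offset: 0x%llx\n", (unsigned long long)(vaddr & 0xfff));
    return 0;
}
```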
Designing and Evaluating Energy-Efficient Main Memory
DRAM-based main memory is used in nearly all computers today, but its energy consumption is becoming a growing concern. DRAM energy utilization now accounts for as much as 40% of the total energy used by a computer. Our goal is to design new DRAM-based memory architectures that reduce the energy consumption significantly. This requires a principled approach, where we must measure how existing DRAM devices consume energy. Our group has developed a sophisticated energy measurement infrastructure to collect detailed information on DRAM energy usage. You will be involved with designing and conducting experiments to measure energy consumption using our infrastructure. Based on the data, you will work with other researchers to identify memory operations that consume large amounts of energy, and will design new DRAM architectures that improve the efficiency of these operations. Read more
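As a back-of-the-envelope illustration of how DRAM energy is often reasoned about, the sketch below estimates the energy of an activate/precharge pair and a read burst from datasheet-style IDD currents. The formula is a simplified version of common current-based DRAM power models, and all numbers are placeholders, not measurements from our infrastructure.

```c
/* Back-of-the-envelope sketch of current-based DRAM energy estimation,
 * loosely following the common IDD-based power-model style. The IDD/VDD
 * numbers below are placeholders, not datasheet values. */
#include <stdio.h>

int main(void) {
    double vdd   = 1.2;       /* volts (placeholder) */
    double idd0  = 0.055;     /* amps: activate/precharge current (placeholder) */
    double idd3n = 0.040;     /* amps: active standby current (placeholder) */
    double idd4r = 0.150;     /* amps: burst read current (placeholder) */
    double t_rc  = 45e-9;     /* seconds: row cycle time */
    double t_bst = 3.33e-9;   /* seconds: one read burst */

    /* Energy above standby for one activate/precharge pair and one read burst. */
    double e_act  = (idd0  - idd3n) * vdd * t_rc;
    double e_read = (idd4r - idd3n) * vdd * t_bst;

    printf("activate+precharge: %.2f nJ, read burst: %.2f nJ\n",
           e_act * 1e9, e_read * 1e9);
    return 0;
}
```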
Evaluating and Enabling Processing inside Memory
Almost all data-intensive workloads are bottlenecked, in terms of performance and energy, by the extensive data movement between the processor and memory. We are looking for an enthusiastic student who is hungry for learning and enabling a paradigm shift that can eliminate this data movement bottleneck: computation inside memory (i.e., inside where the data resides). You will be involved in a project that aims to evaluate the benefits of executing data-intensive applications inside specialized logic in memory, and to develop both mechanisms and simulators for this purpose. Read more
Exploring New Algorithms and Hardware Architectures for Genomic Sequence Alignment
Our understanding of human genomes today depends on the ability of modern computing technology to quickly and accurately determine an individual’s entire genome. However, timely analysis of genomic data remains a challenge. One of the most fundamental computational steps in most bioinformatics analyses is genomic sequence alignment. The execution time of this step constitutes the main performance bottleneck in genomic data analysis. In our research group, we have developed several efficient hardware architectures and algorithmic solutions to tackle this problem. You will work with other researchers to design and analyze new algorithms and ideas. You will also implement and evaluate these new algorithms using real genomic data. Read more
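At the core of most aligners is a dynamic-programming kernel. The C sketch below computes plain edit distance between two short sequences to show the quadratic-time recurrence that algorithmic and hardware work tries to accelerate or avoid; real aligners add traceback, affine gap penalties, banding, and pre-alignment filtering.

```c
/* Minimal sketch of the dynamic-programming kernel at the heart of sequence
 * alignment: edit distance between a read and a reference segment. */
#include <stdio.h>
#include <string.h>

#define MAXLEN 128

int edit_distance(const char *a, const char *b) {
    int n = (int)strlen(a), m = (int)strlen(b);
    int dp[MAXLEN + 1][MAXLEN + 1];
    for (int i = 0; i <= n; i++) dp[i][0] = i;          /* deletions */
    for (int j = 0; j <= m; j++) dp[0][j] = j;          /* insertions */
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            int sub = dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]);
            int del = dp[i - 1][j] + 1;
            int ins = dp[i][j - 1] + 1;
            int best = sub < del ? sub : del;
            dp[i][j] = best < ins ? best : ins;
        }
    }
    return dp[n][m];
}

int main(void) {
    printf("edit distance: %d\n", edit_distance("GATTACA", "GATTTACA"));  /* expect 1 */
    return 0;
}
```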
Navigating the Main Memory Landscape with Fast and Novel Infrastructures
Memory is the major performance, energy, and reliability bottleneck of all data-intensive workloads, e.g., graph processing, machine learning using large data sets, data analytics, databases, and genome analysis. The landscape of main memory is quickly changing, with many technologies appearing and being proposed. This includes 3D-stacked memory designs that are capable of processing in memory, new non-volatile memory technologies that are poised to replace DRAM, and many new types of DRAM architectures. The impact of such new technologies on systems and applications needs to be quickly evaluated and understood, with rigorous evaluation infrastructures. Our group develops and openly makes available such infrastructures. A prominent example is Ramulator, a very flexible and fast open-source infrastructure for simulating DRAM architectures: https://github.com/CMU-SAFARI/ramulator. This infrastructure is widely used in both academia and industry (e.g., by Google, Apple, AMD, Samsung). Your task in this project is to first understand Ramulator and then improve and extend it. Some extensions include support for the new technologies mentioned above (processing in memory, non-volatile memory, hybrid memories, new DRAM architectures). You will also evaluate the impact of such technologies on real workloads. Read more
DRAM-based main memory is used in most computers today. Manufacturers have been optimizing DRAM capacity and bandwidth for years, but little effort has gone into designing secure memories. Our goal is to discover new security vulnerabilities in DRAM and propose new mechanisms that provide security support in DRAM. This requires characterizing DRAM under different operating conditions and testing different data and address patterns. Our group has developed a DRAM testing infrastructure for memory characterization. To design new in-memory security mechanisms, our group has developed a DRAM simulator that allows new hardware features in DRAM to be evaluated quickly. You will be involved in designing and conducting experiments with other researchers. The goals are to: 1) discover new security vulnerabilities and identify new attack vectors that might compromise the security of the system, and 2) design new security mechanisms that protect against these and other vulnerabilities, using our infrastructure. Read more
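As an illustration of the kind of access pattern used in RowHammer-style DRAM characterization, the C sketch below repeatedly activates two "aggressor" addresses while flushing them from the cache. Mapping the two pointers to physically adjacent DRAM rows requires knowledge of the platform's DRAM address mapping, which is omitted here; the buffer and offsets are placeholders.

```c
/* Sketch of the core access loop used in RowHammer-style DRAM characterization:
 * repeatedly activate two aggressor rows while bypassing the cache. Choosing
 * addresses that map to adjacent DRAM rows requires knowledge of the platform's
 * DRAM address mapping and is omitted here (the addresses below are placeholders). */
#include <emmintrin.h>   /* _mm_clflush (x86) */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

void hammer(volatile uint8_t *aggressor_a, volatile uint8_t *aggressor_b, long iterations) {
    for (long i = 0; i < iterations; i++) {
        (void)*aggressor_a;                      /* activate row A */
        (void)*aggressor_b;                      /* activate row B */
        _mm_clflush((const void *)aggressor_a);  /* force the next access to go to DRAM */
        _mm_clflush((const void *)aggressor_b);
    }
    /* Afterwards, the victim rows around A and B are checked for bit flips. */
}

int main(void) {
    /* Placeholder buffer: without reverse-engineering the DRAM address mapping,
     * these two addresses are unlikely to hit adjacent rows; this only shows the loop. */
    uint8_t *buf = aligned_alloc(4096, 1 << 22);
    if (!buf) return 1;
    hammer(buf, buf + (1 << 21), 1000000);
    puts("done hammering (check victim rows for bit flips in a real experiment)");
    free(buf);
    return 0;
}
```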