In celebration of the 50th anniversary of the ACM/IEEE International Symposium on Computer Architecture (ISCA) this year, a special retrospective of selected papers was created to mark the ISCA-50 birthday. The collection highlights the most significant and memorable papers from 1996 through 2020 (ISCA-23 through ISCA-47), and tells an exciting and meaningful story of how research at ISCA progressed over those twenty-five years. Each selected paper is accompanied by a retrospective from the authors. This volume continues the tradition of the first volume of author retrospectives from 1973 through 1995 (ISCA-1 through ISCA-22).
We are very honored that 5 of our papers were recognized in the celebratory retrospective for ISCA-50: of the 1077 papers published over those 25 years, only 98 were selected.
Congratulations to Onur Mutlu and team, and all co-authors on this recognition! Thank you to the team of referees, composed of program chairs from the last few ISCA conferences, who compiled this special collection.
Here are the 5 papers and our retrospectives selected for the ISCA@50 Retrospective:
[1] “A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing” (ISCA 2015) Retrospective
Our ISCA 2015 paper provided a new programmable processing-in-memory (PIM) architecture and system design that can accelerate key data-intensive applications, with a focus on graph processing workloads. Our major idea was to completely rethink the system, including the programming model, data partitioning mechanisms, system support, and instruction set architecture, along with near-memory execution units and their communication architecture, such that an important workload can be accelerated as much as possible using a distributed system of well-connected near-memory accelerators. We built our accelerator system, Tesseract, using 3D-stacked memories with logic layers, where each logic layer contains general-purpose processing cores and the cores communicate with each other using a message-passing programming model. Cores could be specialized for graph processing (or any other application to be accelerated). Read more in the Retrospective
[2] “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors” (ISCA 2014) Retrospective
Our ISCA 2014 paper provided the first scientific and detailed characterization, analysis, and real-system demonstration of what is now popularly known as the RowHammer phenomenon (or vulnerability) in modern commodity DRAM chips, which are used as main memory in almost all modern computing systems. It experimentally demonstrated that more than 80% of all DRAM modules we tested from the three major DRAM vendors were vulnerable to the RowHammer read disturbance phenomenon: one can predictably induce bitflips (i.e., data corruption) in real DRAM modules by repeatedly accessing a DRAM row and thus causing electrical disturbance to physically nearby rows. We showed that a simple unprivileged user-level program induced RowHammer bitflips in multiple real systems and suggested that a security attack can be built using this proof-of-concept to hijack control of the system or cause other harm. To solve the RowHammer problem, our paper examined seven different approaches (including a novel probabilistic approach that has very low cost), some of which influenced or were adopted in different industrial products. Read more in the Retrospective
[3] “An Experimental Study of Data Retention Behavior in Modern DRAM Devices: Implications for Retention Time Profiling Mechanisms” (ISCA 2013) Retrospective
Our ISCA 2013 paper provides a fundamental empirical understanding of two major factors that make it very difficult to determine the minimum data retention time of a DRAM cell, based on the first comprehensive experimental characterization of retention time behavior of a large number of modern commodity DRAM chips from 5 major vendors. We study the prevalence, effects, and technology scaling characteristics of two significant phenomena: 1) data pattern dependence (DPD), where the minimum retention time of a DRAM cell is affected by data stored in other DRAM cells, and 2) variable retention time (VRT), where the minimum retention time of a DRAM cell changes unpredictably over time. To this end, we built a flexible FPGA-based testing infrastructure to test DRAM chips, which has enabled a large amount of further experimental research in DRAM. Our ISCA 2013 paper’s results using this infrastructure clearly demonstrate that DPD and VRT phenomena are significant issues that must be addressed for correct operation in DRAM-based systems and their effects are getting worse as DRAM scales to smaller technology node sizes. Our work also provides ideas on how to accurately identify data retention times in the presence of DPD and VRT, e.g., online profiling with error correcting codes, which later works examined and enabled. Most modern DRAM chips now incorporate ECC, especially to account for VRT effects. Read more in the Retrospective
[4] “RAIDR: Retention-Aware Intelligent DRAM Refresh” (ISCA 2012) Retrospective
Our ISCA 2012 paper, RAIDR, examines the DRAM refresh problem from a modern computing systems perspective, demonstrating its projected impact on systems with higher-capacity DRAM chips expected to be manufactured in the future. It proposes and evaluates a simple and low-cost solution that greatly reduces the performance & energy overheads of refresh by exploiting variation in data retention times across DRAM rows. The key idea is to group the DRAM rows into bins in terms of their minimum data retention times, store the bins in low-cost Bloom filters, and refresh rows in different bins at different rates. Evaluations in our paper (and later works) show that the idea greatly improves performance & energy efficiency and its benefits increase with DRAM chip capacity. The paper embodies an approach we have termed system-DRAM co-design. Read more in the Retrospective
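To make the binning idea concrete, here is a minimal Python sketch of retention-aware refresh with Bloom filters. It is an illustration under stated assumptions, not the implementation from the paper: the class names (`BloomFilter`, `RefreshBinner`), the 64/128/256 ms intervals, the filter sizes, and the SHA-256-based hashing are all choices made for this example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class RefreshBinner:
    """Groups rows into refresh-interval bins (intervals are assumptions)."""

    def __init__(self, intervals_ms=(64, 128), default_ms=256,
                 num_bits=1024, num_hashes=3):
        self.intervals_ms = sorted(intervals_ms)
        self.default_ms = default_ms
        # One filter per non-default bin; most rows fall in the default bin
        # and therefore need no storage at all.
        self.filters = {ms: BloomFilter(num_bits, num_hashes)
                        for ms in self.intervals_ms}

    def profile_row(self, row, retention_ms):
        """Record a row given its measured minimum retention time."""
        if retention_ms < self.intervals_ms[0]:
            raise ValueError("row cannot be kept alive by the fastest bin")
        # Pick the longest interval that does not exceed the retention time.
        candidates = self.intervals_ms + [self.default_ms]
        interval = max(ms for ms in candidates if ms <= retention_ms)
        if interval != self.default_ms:
            self.filters[interval].add(row)

    def refresh_interval_ms(self, row):
        """Return how often this row should be refreshed."""
        for ms in self.intervals_ms:  # check the most frequent bin first
            if row in self.filters[ms]:
                return ms
        return self.default_ms


binner = RefreshBinner()
binner.profile_row(7, retention_ms=70)     # weak row: fastest bin
binner.profile_row(42, retention_ms=150)   # middle bin
binner.profile_row(100, retention_ms=1000) # strong row: default bin
for row in (7, 42, 100):
    print(row, binner.refresh_interval_ms(row))
```

The property that makes Bloom filters a reasonable fit in this sketch is that they have no false negatives: a false positive can only place a row in a faster bin than it needs, which costs a few extra refreshes but never lets a row go unrefreshed for longer than its retention time.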
[5] “Self-Optimizing Memory Controllers: A Reinforcement Learning Approach” (ISCA 2008) Retrospective
Shortly after we published the paper, RL pioneer Andrew Barto (UMass-Amherst), co-author with Richard Sutton (U. of Alberta) of the seminal book on RL [O-42], reached out to us, as he had come across the work and thought it was fascinating. The first edition of Sutton & Barto was a frequent source of inspiration and knowledge during our work, so we were incredibly flattered by this! Eventually, as Andrew and Richard were preparing the second edition of their book, Andrew reached out again to tell us that they had decided to include our work as one of eight case studies of successful uses of RL alongside projects like Google’s AlphaGo and Watson’s Jeopardy!
The design inspired follow-up work by us and many others: at the time of this writing, this paper has been cited over six hundred times. We believe this is in part because the problem was important and the ML-based approach inspiring, and in part because ML over the last ten years has become a major source of interest in computer architecture and beyond. As for the particular implementation itself, it is probably fair to say that it did not age well, as DRAM standards continued to crank up the clock while CPU frequency mostly stagnated, which essentially squeezed the clock margin that the original design enjoyed [3,10]. However, we believe the fundamentals of the RL-based design can be applied and extended to the faster memory controllers of today and the future, and to other computer architecture problems as well [1,2,6,7,9,11]. Read more in the Retrospective
You can find the full ISCA@50 Retrospective here: https://sites.coecis.cornell.edu/isca50retrospective/papers/
And more on the selection process here: https://sites.coecis.cornell.edu/isca50retrospective/
Our group enjoyed several more highlights at ISCA 2023, which you can learn about here.