Best Paper Award at MICRO’22

Congratulations to our PhD student Rahul Bera and co-authors for their Best Paper Award at MICRO’22 for their work “Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction”!

The goal of this work is to accelerate off-chip load requests by removing the on-chip cache access latency from their critical path. To this end, we propose a new technique called Hermes, whose key idea is to: (1) accurately predict which load requests might go off-chip, and (2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads.
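To make the idea above concrete, here is a minimal sketch of a perceptron-based off-chip load predictor in the spirit of Hermes. The feature choices, table sizes, and thresholds below are illustrative assumptions, not the paper's exact configuration (the actual Hermes predictor, POPET, uses a carefully selected set of program features and hysteresis-based training):

```python
class PerceptronOffChipPredictor:
    """Sketch: sum per-feature weights; if the sum crosses a threshold,
    predict the load goes off-chip and fetch from DRAM speculatively."""

    def __init__(self, table_size=1024, threshold=2,
                 max_weight=15, min_weight=-16):
        self.table_size = table_size
        self.threshold = threshold            # activation threshold for "off-chip"
        self.max_weight = max_weight          # saturate weights to small counters
        self.min_weight = min_weight
        # One weight table per (hypothetical) program feature.
        self.features = ["pc", "cl_offset", "pc_xor_offset"]
        self.weights = {f: [0] * table_size for f in self.features}

    def _indices(self, pc, addr):
        offset = (addr >> 6) & 0x3F           # cacheline offset bits (assumed)
        return {
            "pc": pc % self.table_size,
            "cl_offset": offset % self.table_size,
            "pc_xor_offset": (pc ^ offset) % self.table_size,
        }

    def predict(self, pc, addr):
        """True => predicted off-chip: start the DRAM access now,
        in parallel with the normal cache-hierarchy lookup."""
        s = sum(self.weights[f][i] for f, i in self._indices(pc, addr).items())
        return s >= self.threshold

    def train(self, pc, addr, went_off_chip):
        """Perceptron update once the load's true outcome is known."""
        idx = self._indices(pc, addr)
        s = sum(self.weights[f][i] for f, i in idx.items())
        predicted = s >= self.threshold
        # Update on misprediction (a common perceptron training rule).
        if predicted != went_off_chip:
            delta = 1 if went_off_chip else -1
            for f, i in idx.items():
                w = self.weights[f][i] + delta
                self.weights[f][i] = max(self.min_weight,
                                         min(self.max_weight, w))
```

Because both the cache lookup and the speculative DRAM access proceed concurrently, a misprediction costs only wasted DRAM bandwidth, never added latency; correctness is preserved since the cache hierarchy is always consulted.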

We recently interviewed Rahul about his award and work on Hermes:

Q: Congratulations on the Best Paper Award at MICRO’22 for your work on Hermes. Can you tell us about the significance of this paper?

Rahul: We observed a peculiar phenomenon while looking for ways to reduce memory access latency: the cache hierarchy present in today’s processors contributes a significant latency overhead just to check whether or not a memory request needs to go off-chip. To put it into perspective, the load-to-use latency of a last-level cache (LLC) access in recent Intel Alder Lake processors is reported to be 14ns, which is nearly equivalent to the row-buffer hit latency (i.e., tCAS) of a DDR4-3200 DRAM module. This begs for a solution that enables direct access to the off-chip main memory without paying the on-chip cache hierarchy access latency. That’s exactly what Hermes enables. Hermes is the first work that employs a perceptron-based predictor to accurately identify which requests will go off-chip and directly accesses the main memory for those predicted requests.

I believe Hermes has strong significance for industry. As the on-chip cache hierarchy continues to grow both in size and latency to cater to the ever-increasing data footprint of modern workloads, Hermes offers a practical solution to alleviate the drawbacks of long on-chip cache access latency, while simultaneously enjoying the benefits of a larger cache hierarchy. I also believe that Hermes’s key observations and mechanism can prove beneficial for computing platforms beyond general-purpose processors, e.g., graphics processing units (GPUs), which show the same trend of increasing cache hierarchy size over generations.

Q: Is it possible for a user to quickly test Hermes and build on top of what you have done?

Rahul: For sure, yes. We have open-sourced Hermes. You can pull it from our GitHub repository. In this repo you will find all traces and scripts required to replicate our results, as well as build on top of this idea. We have provided a well-documented list of 13 types of data prefetchers and 9 types of off-chip predictors out of the box. If you want to implement your own off-chip predictor, that’s also pretty easy. The code is written in such a way that one can easily extend the base predictor class and implement their own train/predict functions to quickly get essential statistics like coverage and accuracy from simulation results. We also actively monitor feature requests and bug reports. So, please feel free to use the infrastructure and build on top of Hermes. I really believe that Hermes’s off-chip prediction accuracy/coverage can be increased even further. Can’t wait to see new solutions along this direction.
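The extend-the-base-class workflow Rahul describes might look roughly like the following. This is a hypothetical sketch: the class and method names are illustrative and do not reflect the actual Hermes repository's API (the real infrastructure is built on a C++ simulator), but it shows how a base class can collect accuracy and coverage statistics while subclasses supply only train/predict:

```python
class OffChipPredictorBase:
    """Hypothetical base class: subclasses override predict/train;
    the base class tracks accuracy and coverage statistics."""

    def __init__(self):
        self.stats = {"predicted_off": 0, "correct_off": 0, "actual_off": 0}

    def predict(self, pc, addr):
        raise NotImplementedError

    def train(self, pc, addr, went_off_chip):
        raise NotImplementedError

    def record(self, predicted, went_off_chip):
        # Called by the simulation loop after each load resolves.
        self.stats["predicted_off"] += int(predicted)
        self.stats["correct_off"] += int(predicted and went_off_chip)
        self.stats["actual_off"] += int(went_off_chip)

    def accuracy(self):
        # Of all loads predicted off-chip, how many truly went off-chip?
        return self.stats["correct_off"] / max(1, self.stats["predicted_off"])

    def coverage(self):
        # Of all loads that went off-chip, how many did we predict?
        return self.stats["correct_off"] / max(1, self.stats["actual_off"])


class NeverOffChip(OffChipPredictorBase):
    """Trivial example subclass: a baseline that never predicts off-chip."""

    def predict(self, pc, addr):
        return False

    def train(self, pc, addr, went_off_chip):
        pass  # nothing to learn
```

A custom predictor would subclass `OffChipPredictorBase` the same way, replacing `predict`/`train` with its own logic, and get the coverage/accuracy bookkeeping for free.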

Q: What were the biggest challenges for you during the writing and review process? 

Rahul: I think the biggest challenge in writing was to articulate the idea cleanly and demonstrate Hermes’s novelty over prior works. As with many ideas in microarchitecture design, the key observation behind Hermes was also noted by some prior works (as recent as HPCA’22). So it was our responsibility to cite every relevant prior work and compare against them (either qualitatively or quantitatively) as best as possible. Thankfully, the reviewers were very appreciative of the draft and provided much constructive feedback that ultimately helped get the paper into better shape.

Q: What writing advice can you give to students who might be drafting their first paper?

Rahul: I am not sure whether I am experienced enough to provide advice yet since I am still learning! (laughs) As a novice researcher, I often underestimated the importance of writing. But gradually I came to realize that clearly articulating an idea in writing is as important as, if not more important than, conceiving and evaluating the idea. My advice would be to treat writing as a first-class citizen of any paper. If you spend three or four months evaluating the idea, then spend at least a month writing it up. In my experience, the writing takes shape iteratively. The first draft may seem fine at first glance, but when I read the same draft after 2-3 days, I myself often find it not engaging enough. That’s why leaving enough time for writing helps shape the best possible version through multiple iterations.

Posted in Awards, Code, Conference, Papers.