I’m a senior researcher and lecturer in the SAFARI Research Group at ETH Zurich. I received BS and MS degrees in Telecommunication Engineering from the University of Sevilla, in 2001, and a PhD degree in Computer Science from the University of Córdoba, in 2012. Between 2005 and 2017, I was a faculty member of the University of Córdoba. My research interests focus on GPU and heterogeneous computing, processing-in-memory, memory systems, and hardware and software acceleration of medical imaging and bioinformatics. I am the lead author of PrIM, the first publicly-available benchmark suite for a real-world processing-in-memory architecture, and Chai, a benchmark suite for heterogeneous systems with CPU/GPU/FPGA.
I’m currently a leading collaborator in the EU project BioPIM.
Teaching
I teach several courses at ETH, including our Seminar in Computer Architecture, and hands-on Projects & Seminars courses. I also actively recruit, mentor, and supervise young researchers. If you’re interested in working in our group, you can apply directly on our application page, or if you’re an ETH student, you can have a look at some of our thesis topics here.
Courses 2023:
Seminar in Computer Architecture (Co-Lecturer)
Projects & Seminars: Data-Centric Architectures: Fundamentally Improving Performance and Energy (Main Lecturer)
Projects & Seminars: Programming Heterogeneous Computing Systems with GPUs (Main Lecturer)
PUMPS+AI summer school at Barcelona Supercomputing Center:
- Lecture 2: Input regularization
We covered two techniques that can be used to regularize input data for further computation. [Slides (pptx)]
These two techniques are also covered in our P&S HetSys course in the following two lectures:
Lecture 13: Parallel patterns: Merge | Lecture 14: Dynamic parallelism
- Lecture 4: GPU implementation of neural networks
This lecture introduces CNNs, lowering convolutional layers to matrix multiplication, and advanced tiling (in shared memory and registers) for matrix multiplication. [Slides (pptx)]
This lecture is also covered in our P&S HetSys course:
Lecture 9: Advanced tiling for matrix multiplication
- Lecture 6: Advanced features: Tensor cores, warp programming, unified memory
This lecture covers advanced programming features such as tensor cores for ML/AI acceleration, warp programming for efficient inter-thread communication and synchronization, and unified memory for efficient and more programmable collaboration between CPU and GPU. [Slides (pptx)]
This content is covered in several P&S HetSys lectures:
Lecture 7: Parallel patterns: Histogram | Lecture 8: Parallel patterns: Convolution | Lecture 15: Collaborative computing
Courses 2022:
Seminar in Computer Architecture (Co-Lecturer)
Projects & Seminars: Data-Centric Architectures: Fundamentally Improving Performance and Energy (Main Lecturer)
Projects & Seminars: Programming Heterogeneous Computing Systems with GPUs (Main Lecturer)
Previous Teaching
(see our SAFARI courses page)
Upcoming Tutorials
“Real-world Processing-in-Memory Architectures” at HPCA’23
[HPCA Real-world PIM Tutorial website]
“Real-world Processing-in-Memory Systems for Modern Workloads” at ASPLOS’23
[ASPLOS Real-world PIM Tutorial website]
Publications
Gagandeep Singh, Rakesh Nadig, Jisung Park, Rahul Bera, Nastaran Hajinazar, David Novo, Juan Gomez-Luna, Sander Stuijk, Henk Corporaal, and Onur Mutlu, “Sibyl: Adaptive and Extensible Data Placement in Hybrid Storage Systems Using Online Reinforcement Learning”, Proceedings of the 49th International Symposium on Computer Architecture (ISCA), New York, June 2022. [Slides (pptx) (pdf)] [arXiv version] [Sibyl Source Code] [Talk Video (16 minutes)]
Damla Senol Cali, Konstantinos Kanellopoulos, Joel Lindegger, Zulal Bingol, Gurpreet S. Kalsi, Ziyi Zuo, Can Firtina, Meryem Banu Cavlak, Jeremie Kim, Nika MansouriGhiasi, Gagandeep Singh, Juan Gomez-Luna, Nour Almadhoun Alserr, Mohammed Alser, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu, “SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping”, Proceedings of the 49th International Symposium on Computer Architecture (ISCA), New York, June 2022. [Slides (pptx) (pdf)] [arXiv version]
Christina Giannoula, Ivan Fernandez, Juan Gomez-Luna, Nectarios Koziris, Georgios Goumas, and Onur Mutlu, “SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures”, Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Mumbai, India, June 2022. [Extended arXiv Version] [Slides (pptx) (pdf)] [Long Talk Slides (pptx) (pdf)] [SparseP Source Code] [Talk Video (16 minutes)] [Long Talk Video (55 minutes)]
Juan Gomez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu, “Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System”, IEEE Access, 10 May 2022. [arXiv version] [PrIM Benchmarks Source Code] [Slides (pptx) (pdf)] [Long Talk Slides (pptx) (pdf)] [Short Talk Slides (pptx) (pdf)] [SAFARI Live Seminar Slides (pptx) (pdf)] [SAFARI Live Seminar Video (2 hrs 57 mins)] [Lightning Talk Video (3 minutes)] [Short Talk Video (21 minutes)] [1-hour Talk Video (58 minutes)]
Open Source Code
PrIM, the first publicly-available benchmark suite for a real-world processing-in-memory architecture
Chai, a benchmark suite for heterogeneous systems with CPU/GPU/FPGA