SAFARI Project & Seminars Courses (Fall 2022)

Accelerating Genome Analysis with FPGAs, GPUs, and New Execution Paradigms: 227-0085-33L

Course Description

A genome encodes a set of instructions for performing functions within our cells. Analyzing our genomes helps, for example, to determine differences in these instructions (known as genetic variations) from human to human that may cause diseases or different traits. Knowing these genetic variations enables better understanding and diagnosis of diseases and the development of more effective drugs.

Computers are widely used to perform genome analysis using dedicated algorithms and data structures. However, timely analysis of genomic data remains a daunting challenge, due to the complex algorithms and large datasets used for the analysis. Increasing the number of processing cores used for genome analysis decreases the overall analysis time, but significantly escalates the cost of building, maintaining, and cooling such a computing cluster, as well as the power/energy consumed by the cluster. This is a critical shortcoming with respect to both energy production and environmental friendliness. Cloud computing platforms can be used as an alternative to distribute the workload, but transferring the data between the clinic and the cloud poses new privacy and legal concerns.

In this course, we will cover the basics of genome analysis to understand the computational steps of the entire pipeline and find the computational bottlenecks. Students will learn about the existing efforts for accelerating one or more of these steps and will have the chance to carry out a hands-on project to improve these efforts.

Prerequisites of the course:

- No prior knowledge of bioinformatics or genome analysis is required.
- Digital Design and Computer Architecture (or an equivalent course) is required.
- Good knowledge of the C programming language is required.
- Experience in at least one of the following is highly desirable: FPGA implementation or GPU programming.
- Interest in making things efficient and solving problems.

The course is conducted in English.

Course description page
Moodle

Mentors

Role	Name	E-mail	Office
Lecturer	Mohammed Alser	alserm@ethz.ch	ETZ H 61.1
Supervisor	Can Firtina	can.firtina@safari.ethz.ch	ETZ H 61.1
Supervisor	Juan Gómez Luna	juan.gomez@safari.ethz.ch	ETZ H 61.1
Supervisor	Joël Lindegger	joel.lindegger@safari.ethz.ch	ETZ H 64
Supervisor	Nika Mansourighiasi	nika.mansourighiasi@safari.ethz.ch	ETZ H 61.1
Supervisor	Maximilian-David Rumpf	rumpfm@student.ethz.ch
Supervisor	Arvid Gollwitzer	arvidg@student.ethz.ch
Supervisor	Julien Eudine	jeudine@student.ethz.ch
Supervisor	Younjoo Lee	younjoo0614@gmail.com
Supervisor	Luca Blum	lblum@student.ethz.ch

Lecture Video Playlist on YouTube

Lecture Playlist

Lecture Playlist from Spring 2022

Fall 2022 Meetings/Schedule

Week	Date	Livestream	Meeting	Learning Materials
W1	13.10 Thu.	Live	L1: Intelligent Genomic Analyses (PDF) (PPT) Video	Required Materials Recommended Materials
W2	27.10 Thu.	Live	L2: P&S Course Introduction & Logistics (PDF) (PPT)	Required Materials Recommended Materials
W3	03.11 Thu.	Premiere	L3: Introduction to Sequencing (PDF) (PPT)	Required Materials Recommended Materials
W4	10.11 Thu.	Premiere	L4: Read Mapping (PDF) (PPT)	Required Materials Recommended Materials
W5	17.11 Thu.	Premiere	L5: GateKeeper (PDF) (PPT)	Required Materials Recommended Materials
W6	24.11 Thu.	Premiere	L6: MAGNET & Shouji (PDF) (PPT)	Required Materials Recommended Materials
W7	01.12 Thu.	Premiere	L7: SneakySnake (PDF) (PPT)	Required Materials Recommended Materials
W8	08.12 Thu.	Premiere	L8: GenStore (PDF) (PPT)	Required Materials Recommended Materials
W9	15.12 Thu.	Premiere	L9: GRIM-Filter (PDF) (PPT)	Required Materials Recommended Materials
W10	22.12 Thu.	Premiere	L10: Genome Assembly (PDF) (PPT)	Required Materials Recommended Materials
W11	12.01 Thu.	Premiere	L11: Genomic Data Sharing Under Differential Privacy (PDF) (PPT)	Required Materials Recommended Materials
W12	19.01 Thu.	Premiere	L12: GenASM (PDF) (PPT)	Required Materials Recommended Materials

Learning Materials

Meeting 1: Required Materials

Accelerating Genome Analysis: A Primer on an Ongoing Journey, IEEE Micro, Vol. 40, No. 5, pages 65-75, September/October 2020.
- Slides: PDF PPTX
- Talk Video (1 hour 2 minutes)

Meeting 1: Recommended Materials

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis, Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual, October 2020.
- ARM Research Summit Talk Video (21 minutes)
- MICRO'20 Full Talk Video (18 minutes)
- MICRO'20 Lightning Talk Video (1.5 minutes)
- ARM Research Summit Talk Slides PPTX PDF
- MICRO'20 Full Talk Slides PPTX PDF
- MICRO'20 Lightning Talk Slides PPTX PDF

SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs, Bioinformatics, 2020.
- Source Code
- Online link at Bioinformatics Journal
- Talk Video
- Poster presentation at the Swiss Genomics Forum 2019, Geneva, Switzerland, September 27, 2019.
- Poster (pptx) (pdf)
- Poster presentation at ISMB/ECCB 2019, Basel, Switzerland, July 21-25, 2019.
- Poster (pptx) (pdf)

GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies, BMC Genomics, 2018.
- Proceedings of the 16th Asia Pacific Bioinformatics Conference (APBC), Yokohama, Japan, January 2018.
- Slides (pptx) (pdf)
- Source Code
- arxiv.org Version (pdf)
- Talk Video at AACBB 2019

More Learning Materials

A survey on accelerating genome analysis: https://arxiv.org/pdf/2008.00961
A detailed survey on the state-of-the-art algorithms for sequencing data: https://arxiv.org/pdf/2003.00110
An example of how to accelerate genomic sequence matching by two orders of magnitude with the help of FPGAs or GPUs: https://arxiv.org/abs/1910.09020
An example of how to accelerate the read mapping step by an order of magnitude without using hardware acceleration: https://arxiv.org/pdf/1912.08735
An example of using a different computing paradigm to accelerate the read mapping step and reduce its energy consumption: https://arxiv.org/pdf/1708.04329
Two examples of using software/hardware co-design to accelerate genomic sequence matching by two orders of magnitude: https://arxiv.org/abs/1604.01789 https://arxiv.org/abs/1809.07858
