Genome Sequencing on Mobile Devices
Genome analysis is the foundation of many scientific and medical discoveries, and serves as a key enabler of personalized medicine. This analysis is currently limited by the inability of existing technologies to read an organism’s complete genome in one piece. Instead, a dedicated machine (called a sequencer) extracts a large number of shorter random fragments of an organism’s DNA sequence, known as reads. Small, handheld sequencers such as ONT MinION and Flongle make it possible to sequence bacterial and viral genomes in the field, thus facilitating the analysis of disease outbreaks such as COVID-19, Ebola, and Zika. However, large, capable computers are still needed to perform genome assembly, which tries to reassemble the read fragments back into an entire genome sequence. This limits the benefits of mobile sequencing and may pose problems in rapid diagnosis of infectious diseases, tracking outbreaks, and near-patient testing. The problem is exacerbated in developing countries and during crises, where access to the internet, cloud services, or data centers is even more limited.
In this course, we will cover the basics of genome analysis to understand the speed-accuracy tradeoff between computationally-lightweight heuristics and accurate but computationally-expensive algorithms. Such heuristic algorithms typically operate on a smaller dataset that can fit in the memory of today’s mobile devices. Students will experimentally evaluate different heuristic algorithms and observe their effect on the end results. This evaluation will give the students the chance to carry out a hands-on project to implement one or more of these heuristic algorithms on their smartphones and help society by enabling on-site analysis of genomic data.
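To make the speed-accuracy tradeoff concrete, here is a minimal C sketch that pairs an exact but quadratic-time edit-distance computation with a cheap linear-time filter. The filter shown (a base-count lower bound) is chosen purely for illustration and is not necessarily one of the heuristics covered in the course. The function names, the `MAX_LEN` bound, and the threshold parameter are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 256 /* assumed upper bound on sequence length for this sketch */

static int min3(int a, int b, int c) {
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Accurate but expensive: exact edit distance (Levenshtein) computed with
 * dynamic programming over two rows. Time is O(n*m). */
int edit_distance(const char *a, const char *b) {
    size_t n = strlen(a), m = strlen(b);
    assert(n <= MAX_LEN && m <= MAX_LEN);
    int prev[MAX_LEN + 1], curr[MAX_LEN + 1];
    for (size_t j = 0; j <= m; j++) prev[j] = (int)j;
    for (size_t i = 1; i <= n; i++) {
        curr[0] = (int)i;
        for (size_t j = 1; j <= m; j++) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = min3(prev[j] + 1,         /* deletion */
                           curr[j - 1] + 1,     /* insertion */
                           prev[j - 1] + cost); /* match or substitution */
        }
        memcpy(prev, curr, (m + 1) * sizeof(int));
    }
    return prev[m];
}

/* Cheap heuristic: one edit changes the per-character count vector by at
 * most 2 (in L1 norm), so ceil(L1 / 2) never exceeds the true edit
 * distance. Time is O(n + m). */
int base_count_lower_bound(const char *a, const char *b) {
    int counts[256] = {0};
    for (const char *p = a; *p; p++) counts[(unsigned char)*p]++;
    for (const char *p = b; *p; p++) counts[(unsigned char)*p]--;
    int l1 = 0;
    for (int c = 0; c < 256; c++) l1 += abs(counts[c]);
    return (l1 + 1) / 2;
}

/* Returns 0 only when the pair certainly differs by more than `threshold`
 * edits; returns 1 when the pair might be similar (false accepts possible). */
int passes_filter(const char *a, const char *b, int threshold) {
    return base_count_lower_bound(a, b) <= threshold;
}

/* Filter first, then run the expensive computation only on survivors. */
int within_threshold(const char *a, const char *b, int threshold) {
    if (!passes_filter(a, b, threshold)) return 0;
    return edit_distance(a, b) <= threshold;
}
```

The filter never rejects a pair that is truly within the threshold, but it can accept dissimilar pairs: `"ACGT"` and `"TGCA"` have identical base counts and pass the filter, yet the exact computation still rejects them for a threshold of 1. How many such false accepts a filter lets through, compared with how much time it saves, is one way to measure the tradeoff described above.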
Prerequisites of the course:
- No prior knowledge in bioinformatics or genome analysis is required.
- Good knowledge of the C programming language and of programming in general is required.
- Interest in making things efficient and solving problems.
The course is conducted in English.
Meeting 1: Required Materials
- IEEE Micro (IEEE MICRO), Vol. 40, No. 5, pages 65-75, September/October 2020.
Meeting 1: Recommended Materials
- GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis, Proceedings of the 53rd International Symposium on Microarchitecture (MICRO), Virtual, October 2020.
- SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs, Bioinformatics, 2020.
- GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies, BMC Genomics, 2018. Proceedings of the 16th Asia Pacific Bioinformatics Conference (APBC), Yokohama, Japan, January 2018.
More Learning Materials
- A survey on accelerating genome analysis: https://arxiv.org/pdf/2008.00961
- A detailed survey on the state-of-the-art algorithms for sequencing data: https://arxiv.org/pdf/2003.00110
- An example of how to accelerate genomic sequence matching by two orders of magnitude with the help of FPGAs or GPUs: https://arxiv.org/abs/1910.09020
- An example of how to accelerate the read mapping step by an order of magnitude without using hardware acceleration: https://arxiv.org/pdf/1912.08735
- An example of using a different computing paradigm to accelerate the read mapping step and reduce its energy consumption: https://arxiv.org/pdf/1708.04329
- An early example of a purely software method for fast genome sequence analysis: http://www.biomedcentral.com/content/pdf/1471-2164-14-S1-S13.pdf