Join us for our upcoming SAFARI Live Seminar:
Speaker: Hasindu Gamaarachchi, UNSW Sydney
Date: Wednesday, September 27, 2023, 9:00 Zurich time (CEST)
Where: Livestream on YouTube
Title: An Ecosystem for Scalable & Computationally Efficient Nanopore Data Processing
Abstract: Long-read sequencing, recently named Method of the Year 2022 by Nature Methods, has become an important tool for understanding genetics and genomics. Nanopore sequencing is one of the major commercially available long-read technologies, offering ultra-long reads at low capital cost. However, the computational steps of nanopore data analysis (e.g., basecalling and methylation calling) impose a heavy burden that impedes the scalability of population-scale experiments. In this talk, I will present a complete computational ecosystem that enables scalable and computationally efficient nanopore data analysis. It is built on SLOW5/BLOW5, the file format we recently introduced (Nature Biotechnology, 2022). Compared with the existing FAST5 format, SLOW5/BLOW5 reduces computational time by an order of magnitude and reduces the storage footprint by ~20-80%. The fully open-source SLOW5/BLOW5 ecosystem now includes: (i) the SLOW5/BLOW5 file format and its accompanying specifications; (ii) the slow5lib (C/C++) and pyslow5 (Python) software libraries for reading and writing SLOW5/BLOW5 files; (iii) the slow5tools toolkit for creating, converting, handling and interacting with SLOW5/BLOW5 files; and (iv) a suite of open-source bioinformatics software packages, including basecalling and methylation calling tools, into which SLOW5 is now integrated. The research community has already started building on SLOW5/BLOW5. One example is slow5-rs, which provides SLOW5/BLOW5 access from the Rust programming language. SLOW5/BLOW5 will continue to prioritise performance, compatibility, usability and transparency. SLOW5/BLOW5 plays the same role in the nanopore signal space that the seminal SAM/BAM formats play in the base space. Because bioinformaticians are already familiar with SAM/BAM, adopting SLOW5/BLOW5 should be seamless.
Speaker Bio:
Hasindu Gamaarachchi is a lecturer in bioinformatics at the School of Computer Science and Engineering, UNSW Sydney, Australia, where he completed his PhD in 2020. He is also a visiting scientist in the Genomic Technologies Group at the Garvan Institute of Medical Research, Australia. Previously, he worked as a Genomics Computing Research Scientist at the Garvan Institute of Medical Research from 2020 to 2022. Hasindu's research interests include the design, development and optimisation of bioinformatics software for real-time analysis of third-generation sequencing data, and the prototyping of novel domain-specific computer systems for efficient genomics data analysis. He has published in top journals in genomics and bioinformatics, including Nature Biotechnology, Genome Biology and Bioinformatics (Oxford University Press).