From 21/01/2026 To 21/01/2026
Starts at 16:00 until 17:00
GHGA Lecture Series: Efficient and accurate search in petabase-scale sequence repositories (Gunnar Rätsch and André Kahles)
- Address: Virtual
- Language: English
- Registration necessary: Yes
Publicly available sequencing data—spanning DNA, RNA, and proteins across all domains of life—has reached petabase scale, yet much of its scientific value remains locked behind metadata-only search. In this talk, we will present MetaGraph, our framework for enabling full-text search across essentially all public sequence archives. We will discuss how MetaGraph leverages annotated de Bruijn graphs and advanced compression to index 18.8 million sequencing datasets and over 200 billion amino-acid residues while reducing ~67 petabases of raw sequence to a footprint small enough to fit on a handful of consumer drives. We will talk about how this global index supports efficient and sensitive sequence queries—from exact k-mer search to sequence-to-graph alignment—and how it enables practical retrieval of transcript expression, genetic variation, antimicrobial-resistance signatures, or circular RNA junctions at extremely low cost. Importantly, we will also discuss how human genome sequencing data and associated phenotypic characterisations can be represented within this framework, and how such unified representations enable scalable queries across population-scale human datasets while preserving structure and context. By making large-scale sequence search rapid, affordable, and comprehensive, we will show how MetaGraph opens new opportunities for discovery across genomics, metagenomics, transcriptomics, and human genetic studies.
Register via Zoom
