Accurate genotyping across variant classes and lengths using variant graphs

Jonas Andreas Sibbesen, Lasse Maretty, Jacob Malte Jensen, Bent Petersen, Siyang Liu, Palle Villesen, Laurits Skov, Kirstine Belling, Christian Theil Have, Jose Maria Gonzalez-Izarzugaza, Marie Grosjean, Jette Bork-Jensen, Jakob Grove, Thomas Dam-Als, Shujia Huang, Yuqi Chang, Ruiqi Xu, Weijian Ye, Junhua Rao, Xiaosen Guo, Jihua Sun, Hongzhi Cao, Chen Ye, Johan van Beusekom, Thomas Espeseth, Esben Flindt, Rune M. Friborg, Anders Egerup Halager, Stephanie Le Hellard, Christina M. Hultman, Francesco Lescai, Shengting Li, Ole Lund, Peter Løngren, Thomas Mailund, María Luisa Matey-Hernandez, Ole Mors, Christian N. S. Pedersen, Thomas Sicheritz-Pontén, Patrick Sullivan, Syed Ali, David Westergaard, Rachita Yadav, Ning Li, Xun Xu, Torben Hansen, Anders Krogh, Lars Bolund, Thorkild I. A. Sørensen, Oluf Pedersen, Ramneek Gupta, Simon Rasmussen, Søren Besenbacher, Anders D. Børglum, Jun Wang, Hans Eiberg, Karsten Kristiansen, Søren Brunak, Mikkel Heide Schierup, Anders Krogh*

*Corresponding author for this work

    Research output: Contribution to journal; Journal article; Research; peer-review

    Abstract

    Genotype estimates from short-read sequencing data are typically based on the alignment of reads to a linear reference, but reads originating from more complex variants (for example, structural variants) often align poorly, resulting in biased genotype estimates. This bias can be mitigated by first collecting a set of candidate variants across discovery methods, individuals and databases, and then realigning the reads to the variants and reference simultaneously. However, this realignment problem has proved computationally difficult. Here, we present a new method (BayesTyper) that uses exact alignment of read k-mers to a graph representation of the reference and variants to efficiently perform unbiased, probabilistic genotyping across the variation spectrum. We demonstrate that BayesTyper generally provides superior variant sensitivity and genotyping accuracy relative to existing methods when used to integrate variants across discovery approaches and individuals. Finally, we demonstrate that including a ‘variation-prior’ database containing already known variants significantly improves sensitivity.
    Original language: English
    Journal: Nature Genetics
    Number of pages: 11
    ISSN: 1061-4036
    DOI: https://doi.org/10.1038/s41588-018-0145-5
    Publication status: Published - 2018

    Cite this

    Sibbesen, J. A., Maretty, L., Jensen, J. M., Petersen, B., Liu, S., Villesen, P., ... Krogh, A. (2018). Accurate genotyping across variant classes and lengths using variant graphs. Nature Genetics. https://doi.org/10.1038/s41588-018-0145-5