Accurate genotyping across variant classes and lengths using variant graphs

Jonas Andreas Sibbesen, Lasse Maretty, Jacob Malte Jensen, Bent Petersen, Siyang Liu, Palle Villesen, Laurits Skov, Kirstine Belling, Christian Theil Have, Jose Maria Gonzalez-Izarzugaza, Marie Grosjean, Jette Bork-Jensen, Jakob Grove, Thomas Dam-Als, Shujia Huang, Yuqi Chang, Ruiqi Xu, Weijian Ye, Junhua Rao, Xiaosen Guo, Jihua Sun, Hongzhi Cao, Chen Ye, Johan van Beusekom, Thomas Espeseth, Esben Flindt, Rune M. Friborg, Anders Egerup Halager, Stephanie Le Hellard, Christina M. Hultman, Francesco Lescai, Shengting Li, Ole Lund, Peter Løngren, Thomas Mailund, María Luisa Matey-Hernandez, Ole Mors, Christian N. S. Pedersen, Thomas Sicheritz-Pontén, Patrick Sullivan, Syed Ali, David Westergaard, Rachita Yadav, Ning Li, Xun Xu, Torben Hansen, Anders Krogh, Lars Bolund, Thorkild I. A. Sørensen, Oluf Pedersen, Ramneek Gupta, Simon Rasmussen, Søren Besenbacher, Anders D. Børglum, Jun Wang, Hans Eiberg, Karsten Kristiansen, Søren Brunak, Mikkel Heide Schierup, Anders Krogh*

*Corresponding author for this work

    Research output: Contribution to journal; Journal article; Research; peer-review

    Abstract

    Genotype estimates from short-read sequencing data are typically based on the alignment of reads to a linear reference, but reads originating from more complex variants (for example, structural variants) often align poorly, resulting in biased genotype estimates. This bias can be mitigated by first collecting a set of candidate variants across discovery methods, individuals and databases, and then realigning the reads to the variants and reference simultaneously. However, this realignment problem has proved computationally difficult. Here, we present a new method (BayesTyper) that uses exact alignment of read k-mers to a graph representation of the reference and variants to efficiently perform unbiased, probabilistic genotyping across the variation spectrum. We demonstrate that BayesTyper generally provides superior variant sensitivity and genotyping accuracy relative to existing methods when used to integrate variants across discovery approaches and individuals. Finally, we demonstrate that including a ‘variation-prior’ database containing already known variants significantly improves sensitivity.
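The core idea in the abstract, exact matching of read k-mers against the sequences spelled by reference and variant alleles, can be illustrated with a small Python sketch. This is not BayesTyper's implementation or its probabilistic model: it handles a single variant site with fixed flanks rather than a full graph, and the function names (`kmers`, `allele_kmers`, `unique_kmers`, `allele_support`) and the toy sequences are hypothetical, chosen only for illustration. It reports, for each allele, the fraction of allele-specific k-mers that occur exactly in the reads.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def allele_kmers(left_flank, alleles, right_flank, k):
    """Collect, per allele, the k-mers on its path through a one-site graph (k >= 2)."""
    per_allele = {}
    for name, allele_seq in alleles.items():
        # Keep k-1 flanking bases on each side so every k-mer overlaps
        # the allele or the junction it creates.
        path = left_flank[-(k - 1):] + allele_seq + right_flank[:k - 1]
        per_allele[name] = kmers(path, k)
    return per_allele


def unique_kmers(per_allele):
    """Keep only the k-mers that distinguish one allele from all others."""
    unique = {}
    for name, ks in per_allele.items():
        others = set()
        for other_name, other_ks in per_allele.items():
            if other_name != name:
                others |= other_ks
        unique[name] = ks - others
    return unique


def allele_support(reads, left_flank, alleles, right_flank, k):
    """Fraction of each allele's distinguishing k-mers found exactly in the reads."""
    read_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            read_counts[read[i:i + k]] += 1
    unique = unique_kmers(allele_kmers(left_flank, alleles, right_flank, k))
    support = {}
    for name, ks in unique.items():
        hits = sum(1 for km in ks if read_counts[km] > 0)
        support[name] = hits / len(ks) if ks else 0.0
    return support


# Toy example (invented sequences): a single read carrying the insertion allele.
left, right = "ACGTACGGTC", "TTGACCATGA"
alleles = {"ref": "A", "alt": "AGGCT"}
alt_read = left + alleles["alt"] + right
print(allele_support([alt_read], left, alleles, right, k=5))
```

Because matching is by exact k-mer lookup rather than read alignment, a read carrying a longer insertion supports its allele as directly as a read carrying the reference base, which is the kind of reference bias the abstract describes graph-based genotyping as avoiding. A real genotyper would additionally need to model sequencing errors, coverage and k-mers repeated elsewhere in the genome.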
    Original language: English
    Journal: Nature Genetics
    Number of pages: 11
    ISSN: 1061-4036
    DOI: https://doi.org/10.1038/s41588-018-0145-5
    Publication status: Published - 2018

    Cite this

    Sibbesen, J. A., Maretty, L., Jensen, J. M., Petersen, B., Liu, S., Villesen, P., ... Krogh, A. (2018). Accurate genotyping across variant classes and lengths using variant graphs. Nature Genetics. https://doi.org/10.1038/s41588-018-0145-5
    Sibbesen, Jonas Andreas ; Maretty, Lasse ; Jensen, Jacob Malte ; Petersen, Bent ; Liu, Siyang ; Villesen, Palle ; Skov, Laurits ; Belling, Kirstine ; Theil Have, Christian ; Gonzalez-Izarzugaza, Jose Maria ; Grosjean, Marie ; Bork-Jensen, Jette ; Grove, Jakob ; Dam-Als, Thomas ; Huang, Shujia ; Chang, Yuqi ; Xu, Ruiqi ; Ye, Weijian ; Rao, Junhua ; Guo, Xiaosen ; Sun, Jihua ; Cao, Hongzhi ; Ye, Chen ; van Beusekom, Johan ; Espeseth, Thomas ; Flindt, Esben ; Friborg, Rune M. ; Halager, Anders Egerup ; Le Hellard, Stephanie ; Hultman, Christina M. ; Lescai, Francesco ; Li, Shengting ; Lund, Ole ; Løngren, Peter ; Mailund, Thomas ; Matey-Hernandez, María Luisa ; Mors, Ole ; Pedersen, Christian N. S. ; Sicheritz-Pontén, Thomas ; Sullivan, Patrick ; Ali, Syed ; Westergaard, David ; Yadav, Rachita ; Li, Ning ; Xu, Xun ; Hansen, Torben ; Krogh, Anders ; Bolund, Lars ; Sørensen, Thorkild I. A. ; Pedersen, Oluf ; Gupta, Ramneek ; Rasmussen, Simon ; Besenbacher, Søren ; Børglum, Anders D. ; Wang, Jun ; Eiberg, Hans ; Kristiansen, Karsten ; Brunak, Søren ; Schierup, Mikkel Heide ; Krogh, Anders. / Accurate genotyping across variant classes and lengths using variant graphs. In: Nature Genetics. 2018.
    @article{4f4642726c624e3f9c523f0ebb9ad7b9,
    title = "Accurate genotyping across variant classes and lengths using variant graphs",
    abstract = "Genotype estimates from short-read sequencing data are typically based on the alignment of reads to a linear reference, but reads originating from more complex variants (for example, structural variants) often align poorly, resulting in biased genotype estimates. This bias can be mitigated by first collecting a set of candidate variants across discovery methods, individuals and databases, and then realigning the reads to the variants and reference simultaneously. However, this realignment problem has proved computationally difficult. Here, we present a new method (BayesTyper) that uses exact alignment of read k-mers to a graph representation of the reference and variants to efficiently perform unbiased, probabilistic genotyping across the variation spectrum. We demonstrate that BayesTyper generally provides superior variant sensitivity and genotyping accuracy relative to existing methods when used to integrate variants across discovery approaches and individuals. Finally, we demonstrate that including a ‘variation-prior’ database containing already known variants significantly improves sensitivity.",
    author = "Sibbesen, {Jonas Andreas} and Lasse Maretty and Jensen, {Jacob Malte} and Bent Petersen and Siyang Liu and Palle Villesen and Laurits Skov and Kirstine Belling and {Theil Have}, Christian and Gonzalez-Izarzugaza, {Jose Maria} and Marie Grosjean and Jette Bork-Jensen and Jakob Grove and Thomas Dam-Als and Shujia Huang and Yuqi Chang and Ruiqi Xu and Weijian Ye and Junhua Rao and Xiaosen Guo and Jihua Sun and Hongzhi Cao and Chen Ye and {van Beusekom}, Johan and Thomas Espeseth and Esben Flindt and Friborg, {Rune M.} and Halager, {Anders Egerup} and {Le Hellard}, Stephanie and Hultman, {Christina M.} and Francesco Lescai and Shengting Li and Ole Lund and Peter L{\o}ngren and Thomas Mailund and Matey-Hernandez, {Mar{\'i}a Luisa} and Ole Mors and Pedersen, {Christian N. S.} and Thomas Sicheritz-Pont{\'e}n and Patrick Sullivan and Syed Ali and David Westergaard and Rachita Yadav and Ning Li and Xun Xu and Torben Hansen and Anders Krogh and Lars Bolund and S{\o}rensen, {Thorkild I. A.} and Oluf Pedersen and Ramneek Gupta and Simon Rasmussen and S{\o}ren Besenbacher and B{\o}rglum, {Anders D.} and Jun Wang and Hans Eiberg and Karsten Kristiansen and S{\o}ren Brunak and Schierup, {Mikkel Heide} and Anders Krogh",
    year = "2018",
    doi = "10.1038/s41588-018-0145-5",
    language = "English",
    journal = "Nature Genetics",
    issn = "1061-4036",
    publisher = "Nature Publishing Group",

    }

    Sibbesen, JA, Maretty, L, Jensen, JM, Petersen, B, Liu, S, Villesen, P, Skov, L, Belling, K, Theil Have, C, Gonzalez-Izarzugaza, JM, Grosjean, M, Bork-Jensen, J, Grove, J, Dam-Als, T, Huang, S, Chang, Y, Xu, R, Ye, W, Rao, J, Guo, X, Sun, J, Cao, H, Ye, C, van Beusekom, J, Espeseth, T, Flindt, E, Friborg, RM, Halager, AE, Le Hellard, S, Hultman, CM, Lescai, F, Li, S, Lund, O, Løngren, P, Mailund, T, Matey-Hernandez, ML, Mors, O, Pedersen, CNS, Sicheritz-Pontén, T, Sullivan, P, Ali, S, Westergaard, D, Yadav, R, Li, N, Xu, X, Hansen, T, Krogh, A, Bolund, L, Sørensen, TIA, Pedersen, O, Gupta, R, Rasmussen, S, Besenbacher, S, Børglum, AD, Wang, J, Eiberg, H, Kristiansen, K, Brunak, S, Schierup, MH & Krogh, A 2018, 'Accurate genotyping across variant classes and lengths using variant graphs', Nature Genetics. https://doi.org/10.1038/s41588-018-0145-5

    Accurate genotyping across variant classes and lengths using variant graphs. / Sibbesen, Jonas Andreas; Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent; Liu, Siyang; Villesen, Palle; Skov, Laurits; Belling, Kirstine; Theil Have, Christian; Gonzalez-Izarzugaza, Jose Maria; Grosjean, Marie; Bork-Jensen, Jette; Grove, Jakob; Dam-Als, Thomas; Huang, Shujia; Chang, Yuqi; Xu, Ruiqi; Ye, Weijian; Rao, Junhua; Guo, Xiaosen; Sun, Jihua; Cao, Hongzhi; Ye, Chen; van Beusekom, Johan; Espeseth, Thomas; Flindt, Esben; Friborg, Rune M.; Halager, Anders Egerup; Le Hellard, Stephanie; Hultman, Christina M.; Lescai, Francesco; Li, Shengting; Lund, Ole; Løngren, Peter; Mailund, Thomas; Matey-Hernandez, María Luisa; Mors, Ole; Pedersen, Christian N. S.; Sicheritz-Pontén, Thomas; Sullivan, Patrick; Ali, Syed; Westergaard, David; Yadav, Rachita; Li, Ning; Xu, Xun; Hansen, Torben; Krogh, Anders; Bolund, Lars; Sørensen, Thorkild I. A.; Pedersen, Oluf; Gupta, Ramneek; Rasmussen, Simon; Besenbacher, Søren; Børglum, Anders D.; Wang, Jun; Eiberg, Hans; Kristiansen, Karsten; Brunak, Søren; Schierup, Mikkel Heide; Krogh, Anders.

    In: Nature Genetics, 2018.

    TY  - JOUR
    T1  - Accurate genotyping across variant classes and lengths using variant graphs
    AU  - Sibbesen, Jonas Andreas
    AU  - Maretty, Lasse
    AU  - Jensen, Jacob Malte
    AU  - Petersen, Bent
    AU  - Liu, Siyang
    AU  - Villesen, Palle
    AU  - Skov, Laurits
    AU  - Belling, Kirstine
    AU  - Theil Have, Christian
    AU  - Gonzalez-Izarzugaza, Jose Maria
    AU  - Grosjean, Marie
    AU  - Bork-Jensen, Jette
    AU  - Grove, Jakob
    AU  - Dam-Als, Thomas
    AU  - Huang, Shujia
    AU  - Chang, Yuqi
    AU  - Xu, Ruiqi
    AU  - Ye, Weijian
    AU  - Rao, Junhua
    AU  - Guo, Xiaosen
    AU  - Sun, Jihua
    AU  - Cao, Hongzhi
    AU  - Ye, Chen
    AU  - van Beusekom, Johan
    AU  - Espeseth, Thomas
    AU  - Flindt, Esben
    AU  - Friborg, Rune M.
    AU  - Halager, Anders Egerup
    AU  - Le Hellard, Stephanie
    AU  - Hultman, Christina M.
    AU  - Lescai, Francesco
    AU  - Li, Shengting
    AU  - Lund, Ole
    AU  - Løngren, Peter
    AU  - Mailund, Thomas
    AU  - Matey-Hernandez, María Luisa
    AU  - Mors, Ole
    AU  - Pedersen, Christian N. S.
    AU  - Sicheritz-Pontén, Thomas
    AU  - Sullivan, Patrick
    AU  - Ali, Syed
    AU  - Westergaard, David
    AU  - Yadav, Rachita
    AU  - Li, Ning
    AU  - Xu, Xun
    AU  - Hansen, Torben
    AU  - Krogh, Anders
    AU  - Bolund, Lars
    AU  - Sørensen, Thorkild I. A.
    AU  - Pedersen, Oluf
    AU  - Gupta, Ramneek
    AU  - Rasmussen, Simon
    AU  - Besenbacher, Søren
    AU  - Børglum, Anders D.
    AU  - Wang, Jun
    AU  - Eiberg, Hans
    AU  - Kristiansen, Karsten
    AU  - Brunak, Søren
    AU  - Schierup, Mikkel Heide
    AU  - Krogh, Anders
    PY  - 2018
    Y1  - 2018
    AB  - Genotype estimates from short-read sequencing data are typically based on the alignment of reads to a linear reference, but reads originating from more complex variants (for example, structural variants) often align poorly, resulting in biased genotype estimates. This bias can be mitigated by first collecting a set of candidate variants across discovery methods, individuals and databases, and then realigning the reads to the variants and reference simultaneously. However, this realignment problem has proved computationally difficult. Here, we present a new method (BayesTyper) that uses exact alignment of read k-mers to a graph representation of the reference and variants to efficiently perform unbiased, probabilistic genotyping across the variation spectrum. We demonstrate that BayesTyper generally provides superior variant sensitivity and genotyping accuracy relative to existing methods when used to integrate variants across discovery approaches and individuals. Finally, we demonstrate that including a ‘variation-prior’ database containing already known variants significantly improves sensitivity.
    U2  - 10.1038/s41588-018-0145-5
    DO  - 10.1038/s41588-018-0145-5
    M3  - Journal article
    JO  - Nature Genetics
    JF  - Nature Genetics
    SN  - 1061-4036
    ER  -