Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

Lasse Maretty, Jacob Malte Jensen, Bent Petersen, Jonas Andreas Sibbesen, Siyang Liu, Palle Villesen, Laurits Skov, Kirstine González-Izarzugaza Belling, Christian Theil Have, Jose Maria Gonzalez-Izarzugaza, Marie Grosjean, Jette Bork-Jensen, Jakob Grove, Thomas D. Als, Shujia Huang, Yuqi Chang, Ruiqi Xu, Weijian Ye, Junhua Rao, Xiaosen Guo, Jihua Sun, Hongzhi Cao, Chen Ye, Johan van Beusekom, Thomas Espeseth, Esben N. Flindt, Rune M. Friborg, Anders E. Halager, Stephanie Le Hellard, Christina M. Hultman, Francesco Lescai, Shengting Li, Ole Lund, Peter Løngreen, Thomas Mailund, María Luisa Matey-Hernandez, Ole Mors, Christian N. S. Pedersen, Thomas Sicheritz-Pontén, Patrick F. Sullivan, Ali Syed, David Westergaard, Rachita Yadav, Ning Li, Xun Xu, Torben Hansen, Anders Krogh, Lars Bolund, Thorkild I. A. Sørensen, Oluf Pedersen, Ramneek Gupta, Simon Rasmussen, Søren Besenbacher, Anders D. Börglum, Jun Wang, Hans Eiberg, Karsten Kristiansen, Søren Brunak, Mikkel Heide Schierup

    Research output: Contribution to journal › Journal article › Research › peer-review

    Abstract

    Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark.
    Original language: English
    Journal: Nature
    Volume: 548
    Pages (from-to): 87-91
    ISSN: 0028-0836
    DOI: https://doi.org/10.1038/nature23264
    Publication status: Published - 2017

    Bibliographical note

    Creative Commons
    The article for which you have requested permission has been distributed under a Creative Commons CC-BY license (please see the article itself for the license version number). You may reuse this material without obtaining permission from Nature Publishing Group, providing that the author and the original source of publication are fully acknowledged, as per the terms of the license.
    For license terms, please see http://creativecommons.org/

    Cite this

    Maretty, L., Jensen, J. M., Petersen, B., Sibbesen, J. A., Liu, S., Villesen, P., ... Schierup, M. H. (2017). Sequencing and de novo assembly of 150 genomes from Denmark as a population reference. Nature, 548, 87-91. https://doi.org/10.1038/nature23264
    @article{6d9d46dd3df64278be9632c2f35e0e8f,
    title = "Sequencing and de novo assembly of 150 genomes from Denmark as a population reference",
    author = "Lasse Maretty and Jensen, {Jacob Malte} and Bent Petersen and Sibbesen, {Jonas Andreas} and Siyang Liu and Palle Villesen and Laurits Skov and Belling, {Kirstine Gonz{\'a}lez-Izarzugaza} and {Theil Have}, Christian and Gonzalez-Izarzugaza, {Jose Maria} and Marie Grosjean and Jette Bork-Jensen and Jakob Grove and Als, {Thomas D.} and Shujia Huang and Yuqi Chang and Ruiqi Xu and Weijian Ye and Junhua Rao and Xiaosen Guo and Jihua Sun and Hongzhi Cao and Chen Ye and {van Beusekom}, Johan and Thomas Espeseth and Flindt, {Esben N.} and Friborg, {Rune M.} and Halager, {Anders E.} and {Le Hellard}, Stephanie and Hultman, {Christina M.} and Francesco Lescai and Shengting Li and Ole Lund and Peter L{\o}ngreen and Thomas Mailund and Matey-Hernandez, {Mar{\'i}a Luisa} and Ole Mors and Pedersen, {Christian N. S.} and Thomas Sicheritz-Pont{\'e}n and Sullivan, {Patrick F.} and Ali Syed and David Westergaard and Rachita Yadav and Ning Li and Xun Xu and Torben Hansen and Anders Krogh and Lars Bolund and S{\o}rensen, {Thorkild I. A.} and Oluf Pedersen and Ramneek Gupta and Simon Rasmussen and S{\o}ren Besenbacher and B{\"o}rglum, {Anders D.} and Jun Wang and Hans Eiberg and Karsten Kristiansen and S{\o}ren Brunak and Schierup, {Mikkel Heide}",
    note = "Creative Commons The article for which you have requested permission has been distributed under a Creative Commons CC-BY license (please see the article itself for the license version number). You may reuse this material without obtaining permission from Nature Publishing Group, providing that the author and the original source of publication are fully acknowledged, as per the terms of the license. For license terms, please see http://creativecommons.org/",
    year = "2017",
    doi = "10.1038/nature23264",
    language = "English",
    volume = "548",
    pages = "87--91",
    journal = "Nature",
    issn = "0028-0836",
    publisher = "Nature Publishing Group",

    }
