MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads

Thomas Nordahl Petersen, Oksana Lukjancenko, Martin Christen Frølund Thomsen, Maria Maddalena Sperotto, Ole Lund, Frank Møller Aarestrup, Thomas Sicheritz-Pontén

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

An increasing number of species and gene identification studies rely on next-generation sequence analysis of either single-isolate or metagenomics samples. Several methods are available to perform taxonomic annotation, and a previous metagenomics benchmark study has shown that a vast number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next-generation sequence data and perform reference-based sequence assignment, followed by a post-processing analysis to produce reliable taxonomy annotation at species- and strain-level resolution. An in vitro bacterial mock community sample comprising 8 genera, 11 species and 12 strains was previously used to benchmark metagenomics classification methods. After applying a post-processing filter, we obtained 100% correct taxonomy assignments at species and genus level. Sensitivity and precision of 75% were obtained for strain-level annotations. A comparison between MGmapper and Kraken at species level shows that MGmapper assigns taxonomy using 84.8% of the sequence reads, compared to 70.5% for Kraken; both methods identified all species with no false positives. Extensive read count statistics are provided in plain text and Excel sheets for both rejected and accepted taxonomy annotations. Custom databases can be used with the command-line version of MGmapper, and the complete pipeline is freely available as a Bitbucket package (https://bitbucket.org/genomicepidemiology/mgmapper). A web version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small FASTQ datasets.
Original language: English
Article number: e0176469
Journal: PLoS One
Volume: 12
Issue number: 5
Number of pages: 13
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0176469
Publication status: Published - 2017

Cite this

Petersen, Thomas Nordahl ; Lukjancenko, Oksana ; Thomsen, Martin Christen Frølund ; Sperotto, Maria Maddalena ; Lund, Ole ; Aarestrup, Frank Møller ; Sicheritz-Pontén, Thomas. / MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads. In: PLoS One. 2017 ; Vol. 12, No. 5.
@article{ef5007f21e574678988336e053933d0c,
title = "MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads",
author = "Petersen, {Thomas Nordahl} and Oksana Lukjancenko and Thomsen, {Martin Christen Fr{\o}lund} and Sperotto, {Maria Maddalena} and Ole Lund and Aarestrup, {Frank M{\o}ller} and Thomas Sicheritz-Pont{\'e}n",
year = "2017",
doi = "10.1371/journal.pone.0176469",
language = "English",
volume = "12",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "5",

}

MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads. / Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund; Sperotto, Maria Maddalena; Lund, Ole; Aarestrup, Frank Møller; Sicheritz-Pontén, Thomas.

In: PLoS One, Vol. 12, No. 5, e0176469, 2017.

TY - JOUR

T1 - MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads

AU - Petersen, Thomas Nordahl

AU - Lukjancenko, Oksana

AU - Thomsen, Martin Christen Frølund

AU - Sperotto, Maria Maddalena

AU - Lund, Ole

AU - Aarestrup, Frank Møller

AU - Sicheritz-Pontén, Thomas

PY - 2017

Y1 - 2017

U2 - 10.1371/journal.pone.0176469

DO - 10.1371/journal.pone.0176469

M3 - Journal article

VL - 12

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 5

M1 - e0176469

ER -