Rapid and precise alignment of raw reads against redundant databases with KMA

Research output: Contribution to journal › Journal article › Research › peer-review. Annual report year: 2018

Standard

Rapid and precise alignment of raw reads against redundant databases with KMA. / Clausen, Philip Thomas Lanken Conradsen; Aarestrup, Frank Møller; Lund, Ole.

In: BMC Bioinformatics, Vol. 19, No. 1, 307, 2018.

BibTeX

@article{65ae414e1c214b98bcf30939d3a625c7,
title = "Rapid and precise alignment of raw reads against redundant databases with KMA",
abstract = "Background: As the cost of sequencing has declined, clinical diagnostics based on next generation sequencing (NGS) have become reality. Diagnostics based on sequencing will require rapid and precise mapping against redundant databases because some of the most important determinants, such as antimicrobial resistance and core genome multilocus sequence typing (MLST) alleles, are highly similar to one another. In order to facilitate this, a novel mapping method, KMA (k-mer alignment), was designed. KMA is able to map raw reads directly against redundant databases, and it also scales well for large redundant databases. KMA uses k-mer seeding to speed up mapping and the Needleman-Wunsch algorithm to accurately align extensions from k-mer seeds. Multi-mapping reads are resolved using a novel sorting scheme (ConClave scheme), ensuring an accurate selection of templates. Results: The functionality of KMA was compared with SRST2, MGmapper, BWA-MEM, Bowtie2, Minimap2 and Salmon, using both simulated data and a dataset of Escherichia coli mapped against resistance genes and core genome MLST alleles. KMA outperforms current methods with respect to both accuracy and speed, while using a comparable amount of memory. Conclusion: With KMA, it was possible to map raw reads directly against redundant databases with high accuracy, speed and memory efficiency.",
author = "Clausen, {Philip Thomas Lanken Conradsen} and Aarestrup, {Frank M{\o}ller} and Lund, Ole",
year = "2018",
doi = "10.1186/s12859-018-2336-6",
language = "English",
volume = "19",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central Ltd.",
number = "1",
pages = "307",
}

RIS

TY - JOUR

T1 - Rapid and precise alignment of raw reads against redundant databases with KMA

AU - Clausen, Philip Thomas Lanken Conradsen

AU - Aarestrup, Frank Møller

AU - Lund, Ole

PY - 2018

Y1 - 2018

N2 - Background: As the cost of sequencing has declined, clinical diagnostics based on next generation sequencing (NGS) have become reality. Diagnostics based on sequencing will require rapid and precise mapping against redundant databases because some of the most important determinants, such as antimicrobial resistance and core genome multilocus sequence typing (MLST) alleles, are highly similar to one another. In order to facilitate this, a novel mapping method, KMA (k-mer alignment), was designed. KMA is able to map raw reads directly against redundant databases, and it also scales well for large redundant databases. KMA uses k-mer seeding to speed up mapping and the Needleman-Wunsch algorithm to accurately align extensions from k-mer seeds. Multi-mapping reads are resolved using a novel sorting scheme (ConClave scheme), ensuring an accurate selection of templates. Results: The functionality of KMA was compared with SRST2, MGmapper, BWA-MEM, Bowtie2, Minimap2 and Salmon, using both simulated data and a dataset of Escherichia coli mapped against resistance genes and core genome MLST alleles. KMA outperforms current methods with respect to both accuracy and speed, while using a comparable amount of memory. Conclusion: With KMA, it was possible to map raw reads directly against redundant databases with high accuracy, speed and memory efficiency.

AB - Background: As the cost of sequencing has declined, clinical diagnostics based on next generation sequencing (NGS) have become reality. Diagnostics based on sequencing will require rapid and precise mapping against redundant databases because some of the most important determinants, such as antimicrobial resistance and core genome multilocus sequence typing (MLST) alleles, are highly similar to one another. In order to facilitate this, a novel mapping method, KMA (k-mer alignment), was designed. KMA is able to map raw reads directly against redundant databases, and it also scales well for large redundant databases. KMA uses k-mer seeding to speed up mapping and the Needleman-Wunsch algorithm to accurately align extensions from k-mer seeds. Multi-mapping reads are resolved using a novel sorting scheme (ConClave scheme), ensuring an accurate selection of templates. Results: The functionality of KMA was compared with SRST2, MGmapper, BWA-MEM, Bowtie2, Minimap2 and Salmon, using both simulated data and a dataset of Escherichia coli mapped against resistance genes and core genome MLST alleles. KMA outperforms current methods with respect to both accuracy and speed, while using a comparable amount of memory. Conclusion: With KMA, it was possible to map raw reads directly against redundant databases with high accuracy, speed and memory efficiency.

U2 - 10.1186/s12859-018-2336-6

DO - 10.1186/s12859-018-2336-6

M3 - Journal article

VL - 19

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 307

ER -