TY - JOUR
T1 - MINTyper: An outbreak-detection method for accurate and rapid SNP typing of clonal clusters with noisy long reads
AU - Hallgren, Malte Bjørn
AU - Overballe-Petersen, Søren
AU - Lund, Ole
AU - Hasman, Henrik
AU - Clausen, Philip Thomas Lanken Conradsen
PY - 2021
Y1 - 2021
N2 - For detection of clonal outbreaks in clinical settings, we present a complete pipeline that generates a SNP-distance matrix from a set of sequencing reads. Importantly, the program is able to handle a separate mix of both short reads from the Illumina sequencing platforms and long reads from Oxford Nanopore Technologies’ (ONT) platforms as input. MINTyper performs automated reference identification, alignment, alignment trimming, optional methylation masking and pairwise distance calculations. With this approach, we could rapidly and accurately cluster a set of DNA-sequenced isolates, with a known epidemiological relationship to confirm the clustering. Functions were built to allow for both high-accuracy methylation-aware base-called MinION reads (hac_m Q10) and fast-generated lower-quality reads (fast Q8) to be used, also in combination with Illumina data. With fast Q8 reads, a higher number of base pairs was excluded from the calculated distance matrix, compared to the high-accuracy methylation-aware Q10 base-calling of ONT data. Nonetheless, when using different qualities of ONT data with corresponding input parameters, the clustering of isolates was nearly identical.
AB - For detection of clonal outbreaks in clinical settings, we present a complete pipeline that generates a SNP-distance matrix from a set of sequencing reads. Importantly, the program is able to handle a separate mix of both short reads from the Illumina sequencing platforms and long reads from Oxford Nanopore Technologies’ (ONT) platforms as input. MINTyper performs automated reference identification, alignment, alignment trimming, optional methylation masking and pairwise distance calculations. With this approach, we could rapidly and accurately cluster a set of DNA-sequenced isolates, with a known epidemiological relationship to confirm the clustering. Functions were built to allow for both high-accuracy methylation-aware base-called MinION reads (hac_m Q10) and fast-generated lower-quality reads (fast Q8) to be used, also in combination with Illumina data. With fast Q8 reads, a higher number of base pairs was excluded from the calculated distance matrix, compared to the high-accuracy methylation-aware Q10 base-calling of ONT data. Nonetheless, when using different qualities of ONT data with corresponding input parameters, the clustering of isolates was nearly identical.
KW - ONT
KW - Bioinformatics
KW - Clustering
KW - SNP distance
U2 - 10.1093/biomethods/bpab008
DO - 10.1093/biomethods/bpab008
M3 - Journal article
C2 - 33981853
SN - 2396-8923
VL - 6
JO - Biology Methods and Protocols
JF - Biology Methods and Protocols
IS - 1
M1 - bpab008
ER -