Development of applications using Oxford Nanopore Technology sequencing reads for epidemiological purposes

Malte Bjørn Hallgren

Research output: Book/Report, Ph.D. thesis

Abstract

Oxford Nanopore Technologies’ (ONT) sequencing platforms offer the unique advantage of producing long reads, which can span entire genomic regions, providing comprehensive insights that are often difficult to achieve with short-read sequencing technologies. However, the potential of ONT sequencing is somewhat limited when used in stand-alone stations due to its relatively high error rate, which poses challenges for many downstream analyses and applications. Despite these challenges, the long reads generated by ONT have shown promise in various genomic applications, especially where structural information is critical.
This Ph.D. thesis comprises four manuscripts that delve into the technical challenges of utilizing ONT reads for various established epidemiological applications. Furthermore, it introduces three novel bioinformatics tools: MINTyper, cgPhylo, and NanoMGT, all specifically designed to use ONT reads effectively. These tools address key issues related to variant calling, phylogenetic analysis, and microbial genotyping, facilitating the reliable use of ONT data in epidemiological studies.
All tools developed during this Ph.D. research were created under the principles of open science to provide the global epidemiological community with accessible resources that can enhance public health safety. The hope is that these contributions will help bridge the gap between ONT’s innovative technology and its practical applications in epidemiology, thereby improving the monitoring and control of infectious diseases worldwide.

In Manuscript I, we presented a novel tool, MINTyper, for rapid estimation of phylogeny using both Illumina and ONT sequencing data. In this study, we developed automatic whole-genome reference finding, several options for trimming the multiple sequence alignment (MSA), and rapid distance matrix estimation. Additionally, we demonstrated pruning strategies that allowed low-quality ONT data to be combined with high-quality Illumina data. These methods significantly enhance the utility of MINTyper for accurate phylogenetic analysis across different sequencing platforms.

In Manuscript II, we built on the work conducted in Manuscript I by developing a different strategy for determining outbreak phylogeny. In this article, we introduced cgPhylo, a tool that estimates single nucleotide polymorphisms (SNPs) based solely on core genes derived from cgMLST databases. cgPhylo recreated phylogenies highly similar to those produced by existing tools, despite basing the distance matrices on core genes, which typically comprise only 20–30% of the microbial genome. cgPhylo was developed to ensure that mobile genetic elements are excluded from the analysis, which was demonstrated through the simulation of plasmid exchange by conjugation. In these simulations, cgPhylo outperformed existing tools by preventing the new plasmid from affecting the outbreak phylogeny.
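The idea of restricting the distance matrix to core genes can be illustrated with a minimal sketch. It is not cgPhylo's implementation: the function names, the input layout (one aligned consensus sequence per gene and sample), and the rule of skipping genes with unequal lengths are all assumptions made for illustration.

```python
from itertools import combinations


def core_gene_snp_distance(sample_a, sample_b, core_genes):
    """Count SNPs between two samples, looking only at core genes.

    sample_a, sample_b: dict mapping gene name -> consensus sequence
    (hypothetical input layout). Genes outside `core_genes`, such as
    plasmid-borne genes, are ignored entirely.
    """
    shared = core_genes & sample_a.keys() & sample_b.keys()
    distance = 0
    for gene in shared:
        seq_a, seq_b = sample_a[gene], sample_b[gene]
        if len(seq_a) != len(seq_b):
            continue  # assumption: skip genes that are not position-aligned
        for base_a, base_b in zip(seq_a, seq_b):
            # Only count positions where both bases are unambiguous.
            if base_a != base_b and base_a in "ACGT" and base_b in "ACGT":
                distance += 1
    return distance


def distance_matrix(samples, core_genes):
    """Build a symmetric pairwise SNP distance matrix as nested dicts."""
    names = sorted(samples)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = core_gene_snp_distance(samples[a], samples[b], core_genes)
        matrix[a][b] = matrix[b][a] = d
    return matrix
```

In this sketch, a gene carried on a newly acquired plasmid never enters the distance calculation because it is not in the core gene set, which mirrors the motivation described above.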

In Manuscript III, we demonstrated the potential of a threshold-based filtering strategy for typing strain diversity in metagenomic samples containing multiple strains of the same species. We introduced a novel tool, NanoMGT, which iteratively applies a series of penalty and reward parameters to establish individual inclusion thresholds for minor variants. Values for these parameters were determined using an interactive grid search over 39 isolates spanning six species, on which NanoMGT achieved superior performance compared to existing computational solutions.

In Manuscript IV, we investigated the effect of database completeness on the precision of alignment-based taxonomic classification of raw sequencing reads. NCBI’s RefSeq bacterial whole-genome database was homology-reduced using query and template homology k-mer overlaps in 1% increments, and a series of bacterial isolates and in silico generated metagenomic samples were used to evaluate classification precision. We demonstrated that moderate levels of homology reduction could be carried out on large databases to reduce the alignment memory peak without significantly impacting classification performance.
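As a rough illustration of homology reduction by k-mer overlap, the sketch below greedily keeps a sequence only if it is not too similar to one already kept. The greedy order, the function names, and the rule that both query and template homology must reach the threshold are assumptions for illustration; the thesis does not specify them here.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def homology_reduce(sequences, k, threshold):
    """Greedily drop sequences that are redundant with an earlier one.

    sequences: dict mapping name -> sequence (insertion order is used).
    threshold: fraction in [0, 1]; e.g. 0.99 removes near-duplicates only.
    Query homology is shared k-mers / query k-mers; template homology is
    shared k-mers / template k-mers.
    """
    kept = {}  # name -> k-mer set of each retained sequence
    for name, seq in sequences.items():
        query = kmers(seq, k)
        redundant = False
        for template in kept.values():
            shared = len(query & template)
            query_homology = shared / len(query) if query else 0.0
            template_homology = shared / len(template) if template else 0.0
            # Assumption: both overlaps must reach the threshold to discard.
            if query_homology >= threshold and template_homology >= threshold:
                redundant = True
                break
        if not redundant:
            kept[name] = query
    return list(kept)
```

Lowering the threshold step by step, for example in 1% increments, removes progressively more of the database, which is the trade-off between memory use and classification precision examined in the manuscript.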

Novel solutions for bioinformatics applications using ONT sequencing data are presented in this Ph.D. thesis. They will enable epidemiologists worldwide to understand imminent public health threats better. All source code related to the research conducted is freely available at github.com/genomicepidemiology, and the code is open for anyone to use, fork, and modify.
Original language: English
Place of Publication: Kgs. Lyngby
Publisher: Technical University of Denmark
Number of pages: 100
Publication status: Published - 2024
