Comparison of three source attribution methods applied to whole genome sequencing data of monophasic and biphasic Salmonella Typhimurium isolates from the British Isles and Denmark

Jaromir Guzinski*, Mark Arnold, Tim Whiteley, Yue Tang, Virag Patel, Jahcub Trew, Eva Litrup, Tine Hald, Richard Piers Smith, Liljana Petrovska*

*Corresponding author for this work

Research output: Contribution to journalJournal articleResearchpeer-review

49 Downloads (Pure)

Abstract

Methodologies for source attribution (SA) of foodborne illnesses comprise a rapidly expanding suite of techniques for estimating the most important source or sources of human infection. Recently, the increasing availability of whole genome sequencing (WGS) data for a wide range of bacterial strains has led to the development of novel SA methods. These techniques utilize the unique features of bacterial genomes adapted to different host types and hence offer increased resolution of the outputs. Comparative studies of different SA techniques reliant on WGS data are currently lacking. Here, we critically assessed and compared the outputs of three SA methods: a supervised classification random forest machine learning algorithm (RandomForest), an Accessory genes-Based Source Attribution method (AB_SA), and a Bayesian frequency matching method (Bayesian). Each technique was applied to the WGS data of a panel of 902 reservoir host and human monophasic and biphasic Salmonella enterica subsp. enterica serovar Typhimurium isolates sampled in the British Isles (BI) and Denmark from 2012 to 2016. Additionally, for RandomForest and Bayesian, we explored whether utilization of accessory genome features as model inputs improved attribution accuracy of these methods over using the core genome derived features only. Results indicated that this was the case for RandomForest, but for Bayesian the overall attribution estimates varied little regardless of the inclusion or not of the accessory genome features. All three methods attributed the vast majority of human isolates to the Pigs primary source class, which was expected given the known high relative prevalence rates in pigs, and hence routes of infection into the human population, of monophasic and biphasic S. Typhimurium in the BI and Denmark. The accuracy of AB_SA was lower than of RandomForest when attributing the primary source classes to the 120 animal test set isolates with known primary sources. A major advantage of both AB_SA and Bayesian was a much faster execution time as compared to RandomForest. Overall, the SA method comparison presented in this study describes the strengths and weaknesses of each of the three methods applied to attributing potential monophasic and biphasic S. Typhimurium animal sources to human infections that could be valuable when deciding which SA methodology would be the most applicable to foodborne disease outbreak scenarios involving monophasic and biphasic S. Typhimurium.
Original languageEnglish
Article number1393824
JournalFrontiers in Microbiology
Volume15
Number of pages14
ISSN1664-302X
DOIs
Publication statusPublished - 2024

Keywords

  • Source attribution
  • Monophasic and biphasic Salmonella Typhimurium
  • Machine learning
  • Random forest
  • Bayesian modeling
  • Accessory genes-Based Source Attribution
  • Bacterial genomics

Fingerprint

Dive into the research topics of 'Comparison of three source attribution methods applied to whole genome sequencing data of monophasic and biphasic Salmonella Typhimurium isolates from the British Isles and Denmark'. Together they form a unique fingerprint.

Cite this