Machine learning-based predictive modeling to identify genotypic traits associated with Salmonella enterica disease endpoints in isolates from ground chicken

Collins K. Tanui, Shraddha Karanth, Patrick M.K. Njage, Jianghong Meng, Abani K. Pradhan*

*Corresponding author for this work

Research output: Contribution to journalJournal articleResearchpeer-review

1 Downloads (Pure)


As the cost of genome sequencing of foodborne pathogens decreases, it has become possible to sequence a large number of isolates and evaluate those using machine learning algorithms. This study aimed to utilize machine learning algorithms to predict the disease endpoints in untagged Salmonella genome sequences isolated from ground chicken. Our models recognized genetic patterns in the test dataset based on our training dataset obtained from an extensive literature review, using a semi-supervised approach. Using known genotypes as input features, the semi-supervised random forest model showed the highest overall accuracy of 0.94 (95% confidence interval: 0.85–0.99), and a Kappa value of 0.82, and predicted 87% of the disease endpoints. The model predicted genes associated with specific disease endpoints that were associated with virulence, which could be used as features in predictive modeling endeavors in the future. Our machine learning approach would be useful in different areas of food safety, including identifying pathogen sources, predicting antibiotic resistance, and risk assessment of foodborne pathogens.
Original languageEnglish
Article number112701
Number of pages8
Publication statusPublished - 2022


  • Predictive modeling
  • Machine learning
  • Whole genome sequencing
  • Salmonella


Dive into the research topics of 'Machine learning-based predictive modeling to identify genotypic traits associated with <i>Salmonella enterica</i> disease endpoints in isolates from ground chicken'. Together they form a unique fingerprint.

Cite this