Abstract
As the cost of genome sequencing of foodborne pathogens decreases, it has become possible to sequence a large number of isolates and evaluate them using machine learning algorithms. This study aimed to use machine learning algorithms to predict the disease endpoints of untagged Salmonella genome sequences isolated from ground chicken. Using a semi-supervised approach, our models recognized genetic patterns in the test dataset based on a training dataset obtained from an extensive literature review. With known genotypes as input features, the semi-supervised random forest model showed the highest overall accuracy of 0.94 (95% confidence interval: 0.85–0.99) and a Kappa value of 0.82, and predicted 87% of the disease endpoints. The genes that the model associated with specific disease endpoints were also associated with virulence and could be used as features in future predictive modeling. Our machine learning approach could be useful in several areas of food safety, including identifying pathogen sources, predicting antibiotic resistance, and assessing the risk of foodborne pathogens.
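
The abstract reports model performance as overall accuracy together with a Kappa value. As a minimal sketch of how these two statistics relate, the Python function below computes both from a confusion matrix, where Kappa (Cohen's kappa) corrects the observed agreement for the agreement expected by chance. The function name `accuracy_and_kappa` and the example counts are hypothetical illustrations and are not taken from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total

    # Chance agreement: sum over classes of (row share * column share).
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (not the study's data).
example = [
    [45, 5],   # true class 0: 45 predicted correctly, 5 misclassified
    [5, 45],   # true class 1: 5 misclassified, 45 predicted correctly
]
accuracy, kappa = accuracy_and_kappa(example)
print(f"accuracy={accuracy:.2f}, kappa={kappa:.2f}")  # accuracy=0.90, kappa=0.80
```

In this balanced example the chance agreement is 0.5, so an accuracy of 0.90 corresponds to a Kappa of 0.80. With imbalanced classes the chance agreement is higher, which is why Kappa can sit well below the raw accuracy.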
Original language | English |
---|---|
Article number | 112701 |
Journal | LWT |
Volume | 154 |
Number of pages | 8 |
ISSN | 0023-6438 |
Publication status | Published - 2022 |
Keywords
- Predictive modeling
- Machine learning
- Whole genome sequencing
- Salmonella