
Whole-genome prediction of bacterial pathogenic capacity on novel bacteria using protein language models with PathogenFinder2

Research output: Contribution to journal, Journal article, Research, peer-review


Abstract

Motivation: Infectious diseases continue to be a leading cause of mortality and pose a significant global health threat. Thus, the development of tools for surveillance and early detection of emerging pathogens is needed.

Results: We introduce PathogenFinder2, a novel, alignment-free, taxonomy-agnostic model that uses protein language models to predict the pathogenic capacity of bacteria in humans. It outperforms previous methods, particularly for novel taxa, and provides interpretable outputs by highlighting the proteins most relevant to pathogenic potential. These insights aid the identification of virulence factors, vaccine targets, and infection-related metabolic pathways. Furthermore, we introduce the Bacterial Pathogenic Capacity Landscape, which reveals patterns linked to host condition, infection site, microbial antagonism, and environmental origin.

Availability: The model is freely available online at https://genepi.dk/pathogenfinder2 or as a standalone program (https://github.com/genomicepidemiology/PathogenFinder2).

Supplementary information: Supplementary data are available at Bioinformatics online.

Original language: English
Article number: btag129
Journal: Bioinformatics
ISSN: 1367-4803
Publication status: Accepted/In press, 2026

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
