Abstract
B-cell epitope prediction tools are of great medical and commercial interest due to their practical applications in vaccine development and disease diagnostics. The introduction of protein language models (LMs), trained on unprecedentedly large datasets of protein sequences and structures, taps into a powerful numeric representation that can be exploited to accurately predict local and global protein structural features from amino acid sequences alone. In this paper, we present BepiPred-3.0, a sequence-based epitope prediction tool that, by exploiting LM embeddings, greatly improves prediction accuracy for both linear and conformational epitopes on several independent test sets. Furthermore, careful selection of additional input variables and of the epitope residue annotation strategy further improved performance, thus achieving unprecedented predictive power. Our tool can predict epitopes across hundreds of sequences in minutes. It is freely available as a web server and a standalone package at https://services.healthtech.dtu.dk/ with a user-friendly interface to navigate the results.
Original language | English |
---|---|
Article number | e4497 |
Journal | Protein Science |
Volume | 31 |
Issue number | 12 |
Number of pages | 11 |
ISSN | 0961-8368 |
DOIs | |
Publication status | Published - 2022 |
Keywords
- BepiPred-3.0
- BepiPred
- B-cell epitope prediction
- Protein language model
- Machine learning
- Deep learning
- Immunology
- B-cell epitopes
- Bioinformatics
- Immunoinformatics