TY - JOUR
T1 - Improved Prediction of MHC II Antigen Presentation through Integration and Motif Deconvolution of Mass Spectrometry MHC Eluted Ligand Data
AU - Reynisson, Birkir
AU - Barra, Carolina
AU - Kaabinejadian, Saghar
AU - Hildebrand, William H.
AU - Peters, Bjoern
AU - Nielsen, Morten
PY - 2020
Y1 - 2020
N2 - Major histocompatibility complex II (MHC II) molecules play a vital role in the onset and control of cellular immunity. In a highly selective process, MHC II presents peptides derived from exogenous antigens on the surface of antigen-presenting cells for T cell scrutiny. Understanding the rules defining this presentation holds critical insights into the regulation and potential manipulation of the cellular immune system. Here, we apply the NNAlign_MA machine learning framework to analyze and integrate large-scale eluted MHC II ligand mass spectrometry (MS) data sets to advance prediction of CD4+ epitopes. NNAlign_MA allows integration of mixed data types, handling ligands with multiple potential allele annotations, encoding of ligand context, leveraging information between data sets, and has pan-specific power allowing accurate predictions outside the set of molecules included in the training data. Applying this framework, we identified accurate binding motifs of more than 50 MHC class II molecules described by MS data, particularly expanding coverage for DP and DQ beyond that obtained using current MS motif deconvolution techniques. Furthermore, in large-scale benchmarking, the final model termed NetMHCIIpan-4.0 demonstrated improved performance beyond current state-of-the-art predictors for ligand and CD4+ T cell epitope prediction. These results suggest that NNAlign_MA and NetMHCIIpan-4.0 are powerful tools for analysis of immunopeptidome MS data, prediction of T cell epitopes, and development of personalized immunotherapies.
KW - Machine learning
KW - Bioinformatics
KW - Immunoinformatics
KW - Immunology
KW - MHC II
KW - Antigen presentation
KW - Mass spectrometry
KW - Immunopeptidomics
KW - Neoepitopes
U2 - 10.1021/acs.jproteome.9b00874
DO - 10.1021/acs.jproteome.9b00874
M3 - Journal article
C2 - 32308001
SN - 1535-3893
VL - 19
SP - 2304
EP - 2315
JO - Journal of Proteome Research
JF - Journal of Proteome Research
IS - 6
ER -