MS-Rescue: A Computational Pipeline to Increase the Quality and Yield of Immunopeptidomics Experiments

Massimo Andreatta, Annalisa Nicastri, Xu Peng, Gemma Hancock, Lucy Dorrell, Nicola Ternette, Morten Nielsen*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

LC–MS/MS has become the standard platform for characterizing immunopeptidomes, the collection of peptides naturally presented by major histocompatibility complex (MHC) molecules on the cell surface. The protocols and algorithms used for immunopeptidomics data analysis are based on tools developed for traditional bottom-up proteomics, which address the identification of peptides generated by tryptic digestion. Such algorithms are generally not tailored to the specific requirements of MHC ligand identification; as a consequence, immunopeptidomics datasets suffer from the dismissal of informative spectra and from high false discovery rates. Here, a new pipeline for the refinement of peptide-spectrum matches (PSMs) is proposed, based on the assumption that immunopeptidomes contain a limited number of recurring peptide motifs corresponding to MHC specificities. Sequence motifs are learned directly from each individual peptidome by training a prediction model on high-confidence PSMs. The model is then applied to lower-confidence PSM candidates, and sequences that score significantly higher than random peptides are rescued as likely true ligands. The pipeline is applied to MHC class I immunopeptidomes from three different species and is shown to increase the number of identified ligands by up to 20–30%, while effectively removing false positives and products of co-precipitation. Spectral validation using synthetic peptides confirms the identity of a large proportion of the rescued ligands in the experimental peptidome.
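The rescue step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: as a simplified stand-in for their prediction model, it trains a log-odds position-specific scoring matrix (PSSM) on high-confidence peptides of a single length, builds a null score distribution from random peptides drawn uniformly over the 20 amino acids, and approximates "significantly higher than random" as exceeding the 99th percentile of that null distribution. The function names (`train_pssm`, `score`, `rescue`), the uniform background, the pseudocount, the fixed peptide length and the percentile cutoff are all assumptions made for illustration, and the peptides in the usage example are made up.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: a PSSM stands in for the paper's prediction model.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_pssm(peptides, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length peptides.

    Assumes a uniform amino acid background (1/20); real tools usually
    use observed background frequencies instead.
    """
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, peptide):
    """Sum the per-position log-odds scores of a peptide."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))


def rescue(high_conf, low_conf, n_random=10000, percentile=0.99, seed=0):
    """Return low-confidence peptides scoring above the random-peptide cutoff."""
    rng = random.Random(seed)
    pssm = train_pssm(high_conf)
    length = len(pssm)
    # Null distribution: scores of uniformly random peptides of the same length.
    null_scores = sorted(
        score(pssm, "".join(rng.choice(AMINO_ACIDS) for _ in range(length)))
        for _ in range(n_random)
    )
    threshold = null_scores[int(percentile * (n_random - 1))]
    # Only peptides of the motif length can be scored by this fixed-length PSSM.
    return [
        p for p in low_conf
        if len(p) == length and score(pssm, p) > threshold
    ]


if __name__ == "__main__":
    # Toy, made-up 9-mers sharing a Leu-P2 / Val-P9 pattern (not real data).
    high_conf = ["ALDEFGHKV", "KLMNPQRSV", "GLTWYACDV", "RLHIKMNPV",
                 "SLQRSTWYV", "YLACDEGHV", "FLKMNQRTV", "TLVWYAEDV"]
    low_conf = ["ALQWYAGDV", "GGGGGGGGG", "ALQWYAGDVK"]
    print(rescue(high_conf, low_conf))  # prints ['ALQWYAGDV']
```

In this toy run, the motif-like 9-mer is rescued, the poly-glycine peptide scores close to the random background and is rejected, and the 10-mer is skipped because a single fixed-length PSSM cannot score it; a realistic pipeline would need to handle the length variation of MHC ligands.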

Original language: English
Article number: 1800357
Journal: Proteomics
Volume: 19
Issue number: 4
Number of pages: 7
ISSN: 1615-9853
DOIs: https://doi.org/10.1002/pmic.201800357
Publication status: Published - 2019

Keywords

  • Machine learning
  • Mass spectrometry
  • MHC
  • Peptidome
  • Sequence motifs

Cite this

Andreatta, M., Nicastri, A., Peng, X., Hancock, G., Dorrell, L., Ternette, N., & Nielsen, M. (2019). MS-Rescue: A Computational Pipeline to Increase the Quality and Yield of Immunopeptidomics Experiments. Proteomics, 19(4), Article 1800357. https://doi.org/10.1002/pmic.201800357
@article{59ca0cc4e39b47faad0361233cfbdca3,
title = "MS-Rescue: A Computational Pipeline to Increase the Quality and Yield of Immunopeptidomics Experiments",
keywords = "Machine learning, Mass spectrometry, MHC, Peptidome, Sequence motifs",
author = "Massimo Andreatta and Annalisa Nicastri and Xu Peng and Gemma Hancock and Lucy Dorrell and Nicola Ternette and Morten Nielsen",
year = "2019",
doi = "10.1002/pmic.201800357",
language = "English",
volume = "19",
journal = "Proteomics",
issn = "1615-9853",
publisher = "Wiley-VCH Verlag GmbH & Co. KGaA",
number = "4",

}
