TY - JOUR
T1 - InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments
AU - Eloff, Kevin
AU - Kalogeropoulos, Konstantinos
AU - Mabona, Amandla
AU - Morell, Oliver
AU - Catzel, Rachel
AU - Rivera-de-Torre, Esperanza
AU - Berg Jespersen, Jakob
AU - Williams, Wesley
AU - van Beljouw, Sam P.B.
AU - Skwark, Marcin J.
AU - Laustsen, Andreas Hougaard
AU - Brouns, Stan J.J.
AU - Ljungars, Anne
AU - Schoof, Erwin M.
AU - Van Goey, Jeroen
AU - auf dem Keller, Ulrich
AU - Beguir, Karim
AU - Lopez Carranza, Nicolas
AU - Jenkins, Timothy P.
PY - 2025
Y1 - 2025
N2 - Mass spectrometry-based proteomics focuses on identifying the peptide that generates a tandem mass spectrum. Traditional methods rely on protein databases but are often limited or inapplicable in certain contexts. De novo peptide sequencing, which assigns peptide sequences to spectra without prior information, is valuable for diverse biological applications; however, owing to a lack of accuracy, it remains challenging to apply. Here we introduce InstaNovo, a transformer model that translates fragment ion peaks into peptide sequences. We demonstrate that InstaNovo outperforms state-of-the-art methods and showcase its utility in several applications. We also introduce InstaNovo+, a diffusion model that improves performance through iterative refinement of predicted sequences. Using these models, we achieve improved therapeutic sequencing coverage, discover novel peptides and detect unreported organisms in diverse datasets, thereby expanding the scope and detection rate of proteomics searches. Our models unlock opportunities across domains such as direct protein sequencing, immunopeptidomics and exploration of the dark proteome.
AB - Mass spectrometry-based proteomics focuses on identifying the peptide that generates a tandem mass spectrum. Traditional methods rely on protein databases but are often limited or inapplicable in certain contexts. De novo peptide sequencing, which assigns peptide sequences to spectra without prior information, is valuable for diverse biological applications; however, owing to a lack of accuracy, it remains challenging to apply. Here we introduce InstaNovo, a transformer model that translates fragment ion peaks into peptide sequences. We demonstrate that InstaNovo outperforms state-of-the-art methods and showcase its utility in several applications. We also introduce InstaNovo+, a diffusion model that improves performance through iterative refinement of predicted sequences. Using these models, we achieve improved therapeutic sequencing coverage, discover novel peptides and detect unreported organisms in diverse datasets, thereby expanding the scope and detection rate of proteomics searches. Our models unlock opportunities across domains such as direct protein sequencing, immunopeptidomics and exploration of the dark proteome.
U2 - 10.1038/s42256-025-01019-5
DO - 10.1038/s42256-025-01019-5
M3 - Journal article
SN - 2522-5839
VL - 7
SP - 565
EP - 579
JO - Nature Machine Intelligence
JF - Nature Machine Intelligence
ER -