NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning

Michael Schantz Klausen, Martin Closter Jespersen, Henrik Nielsen, Kamilla Kjærgaard Jensen, Vanessa Isabell Jurtz, Casper Kaae Sønderby, Morten Otto Alexander Sommer, Ole Winther, Morten Nielsen, Bent Petersen, Paolo Marcatili*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unravelling its function in the absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1,000 proteins in less than 2 hours, and complete proteomes in less than 1 day. This article is protected by copyright. All rights reserved.
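
The two headline figures in the abstract are per-residue agreement measures. As a minimal sketch, assuming the solvent accessibility figure is a Pearson correlation and the 3-class figure is the fraction of residues assigned the correct class (the abstract does not spell out either definition), they could be computed as follows. The function names and the toy values are hypothetical and are not taken from NetSurfP-2.0.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def three_class_agreement(predicted, observed):
    """Fraction of residues whose predicted class (H, E or C) matches the observed one."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two equal-length, non-empty label strings")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Made-up toy values for illustration only; these are not data from the paper.
predicted_rsa = [0.10, 0.45, 0.80, 0.30]
observed_rsa = [0.15, 0.40, 0.90, 0.20]
print(pearson(predicted_rsa, observed_rsa))

predicted_ss = "HHHEECCC"
observed_ss = "HHHEECCH"
print(three_class_agreement(predicted_ss, observed_ss))
```

Under these assumptions, a reported value of 80% would correspond to `pearson(...)` returning about 0.80, and 85% to `three_class_agreement(...)` returning about 0.85 over all residues in a test set.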
Original language: English
Journal: Proteins: Structure, Function, and Bioinformatics
Volume: 87
Issue number: 6
Pages (from-to): 520-527
Number of pages: 8
ISSN: 0887-3585
DOIs
Publication status: Published - 2019

Keywords

  • Deep learning
  • Disorder
  • Local structure prediction
  • Secondary structure
  • Solvent accessibility
