TY - JOUR
T1 - NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning
AU - Klausen, Michael Schantz
AU - Jespersen, Martin Closter
AU - Nielsen, Henrik
AU - Jensen, Kamilla Kjærgaard
AU - Jurtz, Vanessa Isabell
AU - Sønderby, Casper Kaae
AU - Sommer, Morten Otto Alexander
AU - Winther, Ole
AU - Nielsen, Morten
AU - Petersen, Bent
AU - Marcatili, Paolo
PY - 2019
Y1 - 2019
AB - The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unravelling its function in absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1,000 proteins in less than 2 hours, and complete proteomes in less than 1 day.
KW - Deep learning
KW - Disorder
KW - Local structure prediction
KW - Secondary structure
KW - Solvent accessibility
U2 - 10.1002/prot.25674
DO - 10.1002/prot.25674
M3 - Journal article
C2 - 30785653
SN - 0887-3585
VL - 87
SP - 520
EP - 527
JO - Proteins: Structure, Function, and Bioinformatics
JF - Proteins: Structure, Function, and Bioinformatics
IS - 6
ER -