Abstract

The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unravelling its function in absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1,000 proteins in less than 2 hours, and complete proteomes in less than 1 day.
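The abstract reports two headline figures: a correlation of 80% for solvent accessibility and 85% on 3-class secondary structure. The Python sketch below shows how per-residue scores of this kind are commonly computed. It assumes, rather than takes from the paper, that the solvent-accessibility figure is a Pearson correlation on relative solvent accessibility (RSA) values and that the 3-class figure is per-residue accuracy (Q3) after reducing 8-state DSSP labels with the widely used H/G/I to H, E/B to E, everything else to C mapping. The names `DSSP8_TO_3`, `reduce_to_three`, `q3` and `pearson` are hypothetical, and the toy values are illustrative, not data from the study.

```python
"""Sketch: per-residue evaluation metrics of the kind quoted in the abstract.

Assumptions (not taken from the NetSurfP-2.0 paper):
- the 3-class score is per-residue accuracy (Q3) after reducing 8-state
  DSSP labels with the common mapping H/G/I -> H, E/B -> E, all else -> C;
- the solvent-accessibility score is a Pearson correlation between
  predicted and observed relative solvent accessibility (RSA) values.
"""
from math import sqrt
from typing import Sequence

# Hypothetical 8-to-3 reduction table; other conventions exist.
DSSP8_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E",
              "T": "C", "S": "C", "-": "C", " ": "C", "C": "C"}


def reduce_to_three(ss8: str) -> str:
    """Map an 8-state DSSP string to the 3-state alphabet H/E/C."""
    return "".join(DSSP8_TO_3[s] for s in ss8)


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with >= 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    observed_ss = reduce_to_three("HHHGTTEEEBSS")  # -> "HHHHCCEEEECC"
    predicted_ss = "HHHHCCEEECCC"
    print(f"Q3 = {q3(predicted_ss, observed_ss):.3f}")  # Q3 = 0.917

    observed_rsa = [0.05, 0.40, 0.75, 0.10, 0.60]
    predicted_rsa = [0.10, 0.35, 0.70, 0.20, 0.50]
    print(f"RSA Pearson r = {pearson(predicted_rsa, observed_rsa):.3f}")
```

Other 8-to-3 reductions (for example mapping G and I to coil) give different Q3 values for the same predictions, so the reduction used should be stated alongside any reported score.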
Original language: English
Journal: Proteins: Structure, Function, and Bioinformatics
Volume: 87
Issue number: 6
Pages (from-to): 520-527
Number of pages: 8
ISSN: 0887-3585
DOIs: 10.1002/prot.25674
Publication status: Published - 2019

Keywords

  • Deep learning
  • Disorder
  • Local structure prediction
  • Secondary structure
  • Solvent accessibility

Cite this

@article{efbba69137e94e7fac92a104755d0147,
title = "{NetSurfP-2.0}: Improved prediction of protein structural features by integrated deep learning",
abstract = "The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unravelling its function in absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80{\%} between predictions and experimental data for solvent accessibility, and a precision of 85{\%} on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1,000 proteins in less than 2 hours, and complete proteomes in less than 1 day.",
keywords = "Deep learning, Disorder, Local structure prediction, Secondary structure, Solvent accessibility",
author = "Klausen, {Michael Schantz} and Jespersen, {Martin Closter} and Henrik Nielsen and Jensen, {Kamilla Kj{\ae}rgaard} and Jurtz, {Vanessa Isabell} and S{\o}nderby, {Casper Kaae} and Sommer, {Morten Otto Alexander} and Ole Winther and Morten Nielsen and Bent Petersen and Paolo Marcatili",
year = "2019",
doi = "10.1002/prot.25674",
language = "English",
volume = "87",
pages = "520--527",
journal = "Proteins: Structure, Function, and Bioinformatics",
issn = "0887-3585",
publisher = "John Wiley {\&} Sons, Inc.",
number = "6",

}

NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning. / Klausen, Michael Schantz; Jespersen, Martin Closter; Nielsen, Henrik; Jensen, Kamilla Kjærgaard; Jurtz, Vanessa Isabell; Sønderby, Casper Kaae; Sommer, Morten Otto Alexander; Winther, Ole; Nielsen, Morten; Petersen, Bent; Marcatili, Paolo.

In: Proteins: Structure, Function, and Bioinformatics, Vol. 87, No. 6, 2019, p. 520-527.

Research output: Contribution to journal › Journal article › Research › peer-review

TY  - JOUR
T1  - NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning
AU  - Klausen, Michael Schantz
AU  - Jespersen, Martin Closter
AU  - Nielsen, Henrik
AU  - Jensen, Kamilla Kjærgaard
AU  - Jurtz, Vanessa Isabell
AU  - Sønderby, Casper Kaae
AU  - Sommer, Morten Otto Alexander
AU  - Winther, Ole
AU  - Nielsen, Morten
AU  - Petersen, Bent
AU  - Marcatili, Paolo
PY  - 2019
Y1  - 2019
N2  - The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unravelling its function in absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1,000 proteins in less than 2 hours, and complete proteomes in less than 1 day.
AB  - The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unravelling its function in absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1,000 proteins in less than 2 hours, and complete proteomes in less than 1 day.
KW  - Deep learning
KW  - Disorder
KW  - Local structure prediction
KW  - Secondary structure
KW  - Solvent accessibility
U2  - 10.1002/prot.25674
DO  - 10.1002/prot.25674
M3  - Journal article
VL  - 87
SP  - 520
EP  - 527
JO  - Proteins: Structure, Function, and Bioinformatics
JF  - Proteins: Structure, Function, and Bioinformatics
SN  - 0887-3585
IS  - 6
ER  - 