TY - JOUR
T1 - Improving Prediction of Protein Secondary Structure using Structured Neural Networks and Multiple Sequence Alignments
AU - Kamaric Riis, Søren
AU - Krogh, A.
PY - 1996
Y1 - 1996
N2 - The prediction of protein secondary structure by use of carefully structured neural networks and multiple sequence alignments has been investigated. Separate networks are used for predicting the three secondary structures α-helix, β-strand, and coil. The networks are designed using a priori knowledge of amino acid properties with respect to the secondary structure and the characteristic periodicity in α-helices. Since these single-structure networks all have less than 600 adjustable weights, overfitting is avoided. To obtain a three-state prediction of α-helix, β- strand, or coil, ensembles of single-structure networks are combined with another neural network. This method gives an overall prediction accuracy of 66.3% when using 7-fold cross-validation on a database of 126 nonhomologous globular proteins. Applying the method to multiple sequence alignments of homologous proteins increases the prediction accuracy significantly to 71.3% with corresponding Matthew's correlation coefficients C(α) = 0.59, C(β) = 0.52, and C(c) = 0.50. More than 72% of the residues in the database are predicted with an accuracy of 80%. It is shown that the network outputs can be interpreted as estimated probabilities of correct prediction, and, therefore, these numbers indicate which residues are predicted with high confidence.
AB - The prediction of protein secondary structure by use of carefully structured neural networks and multiple sequence alignments has been investigated. Separate networks are used for predicting the three secondary structures α-helix, β-strand, and coil. The networks are designed using a priori knowledge of amino acid properties with respect to the secondary structure and the characteristic periodicity in α-helices. Since these single-structure networks all have less than 600 adjustable weights, overfitting is avoided. To obtain a three-state prediction of α-helix, β- strand, or coil, ensembles of single-structure networks are combined with another neural network. This method gives an overall prediction accuracy of 66.3% when using 7-fold cross-validation on a database of 126 nonhomologous globular proteins. Applying the method to multiple sequence alignments of homologous proteins increases the prediction accuracy significantly to 71.3% with corresponding Matthew's correlation coefficients C(α) = 0.59, C(β) = 0.52, and C(c) = 0.50. More than 72% of the residues in the database are predicted with an accuracy of 80%. It is shown that the network outputs can be interpreted as estimated probabilities of correct prediction, and, therefore, these numbers indicate which residues are predicted with high confidence.
U2 - 10.1089/cmb.1996.3.163
DO - 10.1089/cmb.1996.3.163
M3 - Journal article
SN - 1066-5277
VL - 3
SP - 163
EP - 183
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 1
ER -