Gauss-integral based representation of protein structure for predicting the fold class from the sequence

Bjørn Gilbert Nielsen, Peter Røgen, Henrik Bohr

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

A representative subset of protein chains was selected from the CATH 2.4 database [C.A. Orengo, A.D. Michie, S. Jones, D.T. Jones, M.B. Swindells, J.M. Thornton, CATH - a hierarchic classification of protein domain structures, Structure 5 (8) (1997) 1093-1108] and used to train a feed-forward neural network to predict protein fold classes. The network takes the dipeptide frequency matrix as input and produces as output a novel representation of the protein chains in R^30 space, based on knot invariant values [P. Røgen, B. Fain, Automatic classification of protein structure by using Gauss integrals, Proceedings of the National Academy of Sciences of the United States of America 100 (1) (2003) 119-124; P. Røgen, H.G. Bohr, A new family of global protein shape descriptors, Mathematical Biosciences 182 (2) (2003) 167-181]. In the general case, when singletons (proteins that are the unique members of a topology or of a sequence homology set) are excluded, the prediction success rates were 77% at the class level, 60% at the architecture level, and 48% at the topology level. The total number of fold classes included in the present data set (∼500) is ten times the number reported in earlier attempts, so this result represents an improvement on previous work, which reported on a few handpicked folds. Furthermore, distance analysis of the network outputs for singletons shows that novel topologies can be detected with very high confidence (∼85%); in these cases the network can be used as a sorting mechanism that identifies sequences which might need special attention. Such distance analysis also yields a direct measure of prediction confidence.
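The two computational ingredients named in the abstract, the dipeptide frequency input and the distance analysis of network outputs, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify how the matrix is normalized, how non-standard residues are handled, which distance metric is used, or how a novelty threshold is chosen, so the function names (`dipeptide_frequencies`, `nearest_distance`, `looks_novel`), the relative-frequency normalization, the Euclidean metric, and the `threshold` parameter are all assumptions made for illustration.

```python
from itertools import product
from math import dist  # Euclidean distance, available in Python 3.8+

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

# Map each ordered residue pair ("AA", "AC", ..., "YY") to a position 0..399.
PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def dipeptide_frequencies(sequence):
    """Return the 20 x 20 dipeptide matrix flattened to 400 relative frequencies.

    Assumption: counts of adjacent residue pairs are divided by the total
    number of counted pairs; pairs containing a non-standard residue are skipped.
    """
    counts = [0] * len(PAIR_INDEX)
    seq = sequence.upper()
    for first, second in zip(seq, seq[1:]):
        position = PAIR_INDEX.get(first + second)
        if position is not None:
            counts[position] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]


def nearest_distance(output, references):
    """Distance from one network output vector to its closest reference vector.

    Assumption: Euclidean distance; in the paper's setting the vectors would
    live in R^30, but any common dimension works here.
    """
    return min(dist(output, reference) for reference in references)


def looks_novel(output, references, threshold):
    """Flag an output as a candidate novel topology if it is far from all references.

    Assumption: a fixed, user-chosen threshold; the abstract gives no value.
    """
    return nearest_distance(output, references) > threshold
```

For example, `dipeptide_frequencies("ACAC")` counts the pairs AC, CA, AC and so assigns 2/3 to the AC entry and 1/3 to the CA entry. The nearest-reference distance returned by `nearest_distance` could also serve as a rough confidence score, with smaller distances indicating a prediction closer to a known fold representation.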

Original language: English
Journal: Mathematical and Computer Modelling
Volume: 43
Issue number: 3-4
Pages (from-to): 401-412
ISSN: 0895-7177
Publication status: Published - 2006

Keywords

  • Proteins
  • Fold class prediction
  • CATH
  • Neural networks
