A 7-mer knowledge-based potential for detecting native protein structures from decoys.

Publication: Research - peer-review, Poster – Annual report year: 2009


Abstract: Predicting a protein's 3D structure from its amino acid sequence remains a key and very difficult problem in computational structural biology. The prediction is often divided into two parts: 1) sampling decoy structures and 2) evaluating an energy function that must assign the lowest energy to the native structure and have a well-shaped minimum around it, so that detectably lower energies are assigned on a set larger than the sampling density. Here we consider the second part of the problem and use only the C-alpha atoms of the structures to allow for faster sampling methods.

Background: The C-alpha atoms define a polygonal curve in 3-space, which is smoothed by the method presented in [1] and is illustrated below. The geometry of a 7-mer is described by two numbers that measure how stretched and how curved the smoothed 7-mer is. These two numbers, called length and distance excess, cf. [2], give one point in the length-distance excess plane (LDE-plane).

Method: Given a sequence of amino acids, we break it down into all its 7-mers and search a database of known 3D structures for similar 7-mer sequences. For each query 7-mer we define an energy function in the LDE-plane. This energy is determined by the 7-mers found and depends linearly on some design parameters. The energy function of the full query sequence, F, is then a sum over all 7-mers. For a protein P and a decoy D we ideally want F(D) - F(P) = constant · RMSD(D, P), where 0.25 < constant < 4 depends on the protein. To find the final energy, we minimize the squared errors in these equations under linear constraints, which is a trivial optimization problem.

Datasets: From the CATH 2.4 protein domains we keep 1.8 million 7-mer fragments. For each of 218 non-homologous proteins, 400 decoys are randomly chosen from the Titan High Resolution decoy sets [3]. The decoy-native RMSDs are evenly distributed from 0.5-1 Å up to 2.5-3 Å, and there are relatively few decoys with larger RMSDs.

Preliminary results: If at least 15% of the 7-mer fragments of a query protein exist in the 7-mer database, then 1) it is 97% certain that the native structure has the lowest energy and 2) the energy has a well-shaped minimum: the correlation between energy and decoy-native RMSD is on average 0.8 for decoys with RMSD < 5.5 Å. The energy tends to flatten out for larger RMSDs. The method needs further testing, but since the required 15% level of homology is not to one known structure but to fragments of all known structures, it seems promising.

[1] K. Lindorff-Larsen, P. Røgen, E. Paci, M. Vendruscolo & C.M. Dobson, Trends in Biochemical Sciences, 30(1), 13-19, 2005.
[2] P. Røgen, P.W. Karlsson, Geometriae Dedicata, 134(1), 91-107, 2008.
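As a rough illustration of the quantities named in the abstract, the following Python sketch splits a sequence into overlapping 7-mers, fits the slope in F(D) - F(P) ≈ constant · RMSD(D, P) by least squares for a single protein, and computes the energy-RMSD correlation. It is not the authors' fitting procedure: the abstract fits design parameters of the energy under linear constraints that it does not spell out, whereas here the energy gaps are given toy numbers and only the slope is fitted, clipped to the stated range. The function names (seven_mers, fit_slope, pearson) and all numbers are hypothetical.

```python
from math import sqrt


def seven_mers(sequence, k=7):
    """Return all overlapping k-mers of a sequence (k = 7 in the abstract)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def fit_slope(energy_gaps, rmsds, lo=0.25, hi=4.0):
    """Least-squares slope c in gap ~= c * RMSD, without intercept.

    Assumption: the range 0.25 < c < 4 is enforced by clipping. For a
    one-parameter quadratic, clipping the unconstrained minimizer gives
    the minimizer on the closed interval [lo, hi].
    """
    c = sum(g * r for g, r in zip(energy_gaps, rmsds)) / sum(r * r for r in rmsds)
    return min(max(c, lo), hi)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy data only: RMSD(D, P) in Å and energy gaps F(D) - F(P) for three decoys.
    rmsds = [1.0, 2.0, 3.0]
    gaps = [2.1, 3.8, 6.3]
    print(seven_mers("ACDEFGHIKL"))
    print(fit_slope(gaps, rmsds))
    print(pearson(gaps, rmsds))
    # The native structure ranks lowest when every gap is positive.
    print(all(g > 0 for g in gaps))
```

A sequence of length n yields n - 6 overlapping 7-mers, so the sum defining F has one term per window. The correlation computed here is the kind of statistic the preliminary results report (on average 0.8 for decoys with RMSD < 5.5 Å).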
Original language: English
Publication date: 2009
State: Published

Conference

Conference: VIII European Symposium of The Protein Society
City: Zurich/Switzerland
Period: 01/01/09 → …

ID: 3471441