Abstract
Abstract: Predicting a protein's 3D structure from its amino acid sequence remains a key and very difficult problem in computational structural biology. The prediction is often divided into two parts: 1) sampling decoy structures and 2) evaluating an energy function that must assign the lowest energy to the native structure and have a well-behaved minimum around it, so that detectably lower values are assigned on a set larger than the sampling density. Here we consider the second part of the problem and use only the C-alpha atoms of the structures to allow for faster sampling methods.
Background: The C-alpha atoms define a polygonal curve in 3-space, which is smoothed by the method presented in [1]. The geometry of a 7-mer is described by two numbers that measure how stretched and how curved the smoothed 7-mer is. These two numbers are called length and distance excess, cf. [2], and give one point in the length - distance excess plane (LDE-plane).
Method: Given a sequence of amino acids, we break it down into all its 7-mers and search a database of known 3D structures for similar 7-mer sequences. For each query 7-mer we define an energy function in the LDE-plane. This energy is determined by the 7-mers found and depends linearly on some design parameters. The energy function of the full query sequence, F, is then a sum over all 7-mers. For a protein P and a decoy D we ideally want F(D) - F(P) = constant · RMSD(D, P), where 0.25 < constant < 4 depends on the protein. To find the final energy, we minimize the squared errors of these equations under linear constraints, which is a straightforward optimization problem.
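The two steps of the method can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in this work: the names `heptamers`, `residuals` and `fit` are hypothetical, each decoy is assumed to be summarized by a feature vector (its summed 7-mer terms minus those of the native structure), and plain projected gradient descent is swapped in as one simple way to handle the bounds 0.25 < constant < 4. The abstract does not state which solver was used.

```python
from typing import List, Sequence, Tuple


def heptamers(sequence: str, k: int = 7) -> List[str]:
    """Return all overlapping k-mers (7-mers by default) of a sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def residuals(features: Sequence[Sequence[float]], protein_of: Sequence[int],
              rmsds: Sequence[float], w: Sequence[float],
              c: Sequence[float]) -> List[float]:
    """Errors of F(D) - F(P) = c[protein] * RMSD(D, P) for every decoy.

    features[i] is assumed to hold (decoy terms - native terms), so the
    energy difference is the dot product with the design parameters w.
    """
    return [sum(x * wj for x, wj in zip(row, w)) - c[p] * r
            for row, p, r in zip(features, protein_of, rmsds)]


def fit(features: Sequence[Sequence[float]], protein_of: Sequence[int],
        rmsds: Sequence[float], n_proteins: int, c_min: float = 0.25,
        c_max: float = 4.0, n_iter: int = 2000) -> Tuple[List[float], List[float]]:
    """Least squares in (w, c) with every c kept inside [c_min, c_max]."""
    n_params = len(features[0])
    w = [0.0] * n_params
    c = [1.0] * n_proteins
    # The squared Frobenius norm of the system matrix bounds the gradient's
    # Lipschitz constant, so this step size never increases the error.
    step = 1.0 / sum(sum(x * x for x in row) + r * r
                     for row, r in zip(features, rmsds))
    for _ in range(n_iter):
        res = residuals(features, protein_of, rmsds, w, c)
        grad_w = [sum(e * row[j] for e, row in zip(res, features))
                  for j in range(n_params)]
        grad_c = [0.0] * n_proteins
        for e, p, r in zip(res, protein_of, rmsds):
            grad_c[p] -= e * r
        w = [wj - step * g for wj, g in zip(w, grad_w)]
        # Projection step: clip each per-protein constant to its bounds.
        c = [min(c_max, max(c_min, cp - step * g)) for cp, g in zip(c, grad_c)]
    return w, c
```

The lower bound on the constant matters in this sketch: without it, setting all parameters and constants to zero would satisfy every equation exactly. For a production-size problem a dedicated bounded least-squares solver would be a more natural choice than this hand-written loop.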
Datasets: From the CATH 2.4 protein domains we keep 1.8 million 7-mer fragments. For each of 218 non-homologous proteins, 400 decoys are randomly chosen from the Titan High Resolution decoy sets [3]. The decoy-native RMSDs are evenly distributed from 0.5-1 Å up to 2.5-3 Å, and there are relatively few decoys with larger RMSDs.
Preliminary results: If at least 15% of the 7-mer fragments of a query protein exist in the 7-mer database, then
1) it is 97% certain that the native structure has the lowest energy
2) the energy has a nice minimum.
That is, the correlation between energy and decoy-native RMSD is on average 0.8 for decoys with RMSD < 5.5 Å. The energy tends to flatten out for larger RMSDs. The method needs further testing, but it seems promising because the required 15% level of homology is not to one known structure but to fragments of all known structures.
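The two quality criteria above can be checked per protein along the following lines. This is a hedged sketch only: the function names are hypothetical, the Pearson coefficient is assumed as the correlation measure (the abstract does not say which one was used), and the 5.5 Å cutoff is taken from the text.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def native_is_lowest(native_energy: float,
                     decoy_energies: Sequence[float]) -> bool:
    """Criterion 1: the native structure has lower energy than every decoy."""
    return all(native_energy < e for e in decoy_energies)


def energy_rmsd_correlation(energies: Sequence[float],
                            rmsds: Sequence[float],
                            cutoff: float = 5.5) -> float:
    """Criterion 2: energy-RMSD correlation over decoys below the cutoff (in Å)."""
    kept = [(e, r) for e, r in zip(energies, rmsds) if r < cutoff]
    return pearson([e for e, _ in kept], [r for _, r in kept])
```

Restricting the correlation to decoys below the cutoff mirrors the observation that the energy flattens out at larger RMSDs, where a linear relation is no longer expected.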
[1] K. Lindorff-Larsen, P. Røgen, E. Paci, M. Vendruscolo, C.M. Dobson, Trends in Biochemical Sciences, 30(1), 13-19, 2005.
[2] P. Røgen, P.W. Karlsson, Geometriae Dedicata, 134(1), 91-107, 2008.
Original language | English |
---|---|
Publication date | 2009 |
Publication status | Published - 2009 |
Event | 8th European Symposium of The Protein Society, Zurich, Switzerland. Duration: 7 Jun 2009 → 11 Jun 2009. Conference number: 8 |