Background: The occurrence of very similar structural motifs brought about by different parts of non homologous proteins is often indicative of a common function. Indeed, relatively small local structures can mediate binding to a common partner, be it a protein, a nucleic acid, a cofactor or a substrate. While it is relatively easy to identify short amino acid or nucleotide sequence motifs in a given set of proteins or genes, and many methods do exist for this purpose, much more challenging is the identification of common local substructures, especially if they are formed by non consecutive residues in the sequence.Results: Here we describe a publicly available tool, able to identify common structural motifs shared by different non homologous proteins in an unsupervised mode. The motifs can be as short as three residues and need not to be contiguous or even present in the same order in the sequence. Users can submit a set of protein structures deemed or not to share a common function ( e. g. they bind similar ligands, or share a common epitope). The server finds and lists structural motifs composed of three or more spatially well conserved residues shared by at least three of the submitted structures. The method uses a local structural comparison algorithm to identify subsets of similar amino acids between each pair of input protein chains and a clustering procedure to group similarities shared among different structure pairs.Conclusions: FunClust is fast, completely sequence independent, and does not need an a priori knowledge of the motif to be found. The output consists of a list of aligned structural matches displayed in both tabular and graphical form. We show here examples of its usefulness by searching for the largest common structural motifs in test sets of non homologous proteins and showing that the identified motifs correspond to a known common functional feature.
Bibliographical noteThis is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- protein structure
- structural motif
- Organisms (Organisms) - Organisms  organism common
- 04500, Mathematical biology and statistical methods
- 10515, Biophysics - Biocybernetics
- Computational Biology
- FunClust mathematical and computer techniques
- Models and Simulations