Abstract
Motivation: Clustering protein structures is an important task in structural
bioinformatics. De novo structure prediction, for example, often
involves a clustering step for finding the best prediction. Other applications
include assigning proteins to fold families and analyzing
molecular dynamics trajectories.
Results: We present Pleiades, a novel approach to clustering protein
structures with a rigorous mathematical underpinning. The method
approximates clustering based on the root mean square deviation
by first mapping structures to Gauss integral vectors – which were
introduced by Røgen and co-workers – and subsequently performing
K-means clustering.
Conclusions: Compared to current methods, Pleiades dramatically
improves on the time needed to perform clustering, and can cluster
a significantly larger number of structures, while providing state-of-the-art
results. The number of low-energy structures generated in a
typical folding study, which is in the order of 50,000 structures, can be
clustered within seconds to minutes.
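
The two-step idea described under Results (map each structure to a fixed-length vector, then run K-means on those vectors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Pleiades implementation: the computation of Gauss integral vectors is not shown, so `gauss_vectors` is a hypothetical array filled with synthetic stand-in data, and the vector dimension (30), the number of clusters (`k=2`), and the function name `kmeans` are arbitrary choices made for the example. The clustering step is plain Lloyd's K-means with Euclidean distance, which may differ from the variant used by the authors.

```python
import numpy as np


def kmeans(vectors, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means on fixed-length descriptor vectors.

    vectors: (n_structures, n_features) float array, one row per structure.
    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct input vectors (fancy indexing copies).
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    labels = np.full(len(vectors), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every vector to every centroid.
        d2 = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = vectors[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Synthetic stand-in for precomputed Gauss integral vectors (assumption):
# two well-separated groups of 50 "structures" each, 30 features per vector.
rng = np.random.default_rng(42)
gauss_vectors = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 30)),
    rng.normal(loc=10.0, scale=0.1, size=(50, 30)),
])

labels, centroids = kmeans(gauss_vectors, k=2)
```

Because every structure is reduced to one short vector, each K-means iteration costs time proportional to the number of structures times the number of clusters, with no pairwise structural superposition required.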
Original language | English |
---|---|
Journal | Bioinformatics |
Volume | 28 |
Issue number | 4 |
Pages (from-to) | 510-515 |
ISSN | 1367-4803 |
Publication status | Published - 2011 |