Abstract
An important feature of structural data, especially data from structure determination and protein-ligand docking programs, is that their distribution may be uniform or non-uniform. Traditional clustering algorithms developed specifically for non-uniformly distributed data may not be adequate for classifying them. Here we present a geometric partitional algorithm that can be applied to both uniformly and non-uniformly distributed data. The algorithm takes a top-down approach: it recursively selects outliers as seeds to form new clusters until all the structures within each cluster satisfy certain requirements. Applications of the algorithm to a diverse set of data from NMR structure determination, protein-ligand docking, and simulation show that it is superior to previous clustering algorithms at identifying correct but minor clusters. The algorithm should be useful for identifying correct docking poses and for speeding up an iterative process widely used in NMR structure determination.
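The abstract does not state the seed-selection rule or the stopping requirement, so the following is only a minimal sketch of a top-down partitioning scheme of this general kind, not the authors' algorithm. It assumes that structures are represented as coordinate tuples, that the "requirement" is a maximum cluster diameter (the hypothetical parameter `max_diameter`), and that the "outliers" chosen as seeds are the two members that lie farthest apart. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def farthest_pair(points: List[Point]) -> Tuple[float, int, int]:
    """Return (distance, i, j) for the two points that are farthest apart."""
    best = (0.0, 0, 0)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d > best[0]:
                best = (d, i, j)
    return best


def divisive_cluster(points: List[Point], max_diameter: float) -> List[List[Point]]:
    """Recursively split a cluster until every cluster's diameter is small enough.

    Assumption: the stopping requirement is diameter <= max_diameter, and the
    seeds for a split are the two mutually most distant members.
    """
    if max_diameter < 0:
        raise ValueError("max_diameter must be non-negative")
    diameter, i, j = farthest_pair(points)
    if diameter <= max_diameter:
        return [list(points)]  # requirement satisfied, stop splitting
    seed_a, seed_b = points[i], points[j]
    group_a, group_b = [], []
    for p in points:
        # Assign each member to the nearer of the two outlier seeds.
        if math.dist(p, seed_a) <= math.dist(p, seed_b):
            group_a.append(p)
        else:
            group_b.append(p)
    return divisive_cluster(group_a, max_diameter) + divisive_cluster(group_b, max_diameter)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.2, 0.0), (4.0, 0.0), (4.2, 0.0), (9.0, 0.0)]
    for cluster in divisive_cluster(data, max_diameter=1.0):
        print(cluster)
```

Because each split places the two seeds in different groups, both groups are strictly smaller than their parent and the recursion terminates. In practice the distance would more likely be an RMSD between superimposed structures than a plain Euclidean distance between points; that substitution is left open here.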
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Xu, S., Zou, S., Wang, L. (2014). A Geometric Clustering Algorithm and Its Applications to Structural Data. In: Sharan, R. (ed.) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_29
DOI: https://doi.org/10.1007/978-3-319-05269-4_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05268-7
Online ISBN: 978-3-319-05269-4
eBook Packages: Computer Science, Computer Science (R0)