
A Geometric Clustering Algorithm and Its Applications to Structural Data

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8394)

Abstract

An important feature of structural data, especially data from structure determination and protein-ligand docking programs, is that their distribution can be uniform as well as non-uniform. Traditional clustering algorithms, developed specifically for non-uniformly distributed data, may not be adequate for classifying such data. Here we present a geometric partitional algorithm that can be applied to both uniformly and non-uniformly distributed data. The algorithm is a top-down approach that recursively selects outliers as the seeds of new clusters until all the structures within each cluster satisfy certain requirements. Applications of the algorithm to a diverse set of data from NMR structure determination, protein-ligand docking and simulation show that it is superior to previous clustering algorithms at identifying correct but minor clusters. The algorithm should be useful for identifying correct docking poses and for speeding up an iterative process widely used in NMR structure determination.
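The top-down, outlier-seeded partitioning described in the abstract can be illustrated with a short sketch. It is not the authors' implementation. The stopping requirement (every member within a fixed `radius` of its cluster's seed), the medoid as the initial seed, the nearest-seed rule for splitting, and plain Euclidean distance on coordinate vectors (standing in for, e.g., RMSD between superimposed structures) are all assumptions made for illustration, and `outlier_seed_clustering` and `medoid` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def medoid(points: Sequence[Point], members: List[int]) -> int:
    """Index of the member with the smallest summed distance to the others."""
    return min(members, key=lambda i: sum(math.dist(points[i], points[j]) for j in members))


def outlier_seed_clustering(points: Sequence[Point], radius: float) -> List[List[int]]:
    """Hypothetical divisive clustering sketch.

    Repeatedly promotes a cluster's farthest member (its outlier) to the seed
    of a new cluster until every member of every cluster lies within `radius`
    of that cluster's seed. The radius criterion is an assumed stand-in for
    the paper's unspecified "certain requirements".
    """
    if not points:
        return []
    if radius < 0:
        raise ValueError("radius must be non-negative")
    all_idx = list(range(len(points)))
    # Each pending entry is (seed index, member indices); start with one cluster.
    pending: List[Tuple[int, List[int]]] = [(medoid(points, all_idx), all_idx)]
    final: List[List[int]] = []
    while pending:
        seed, members = pending.pop()
        # The outlier is the member farthest from the current seed.
        outlier = max(members, key=lambda i: math.dist(points[i], points[seed]))
        if math.dist(points[outlier], points[seed]) <= radius:
            final.append(sorted(members))  # assumed requirement met: cluster is done
            continue
        # Split: each member joins whichever seed is nearer (ties stay with the old seed).
        old_part: List[int] = []
        new_part: List[int] = []
        for i in members:
            if math.dist(points[i], points[seed]) <= math.dist(points[i], points[outlier]):
                old_part.append(i)
            else:
                new_part.append(i)
        # Each seed is nearest to itself, so both parts are non-empty and
        # strictly smaller than `members`; the loop therefore terminates.
        pending.append((seed, old_part))
        pending.append((outlier, new_part))
    return sorted(final)
```

On five points forming two tight pairs and one distant singleton, `outlier_seed_clustering([(0, 0), (1, 0), (20, 0), (21, 0), (50, 0)], radius=3.0)` returns `[[0, 1], [2, 3], [4]]`: the lone outlier survives as its own minor cluster instead of being absorbed into a larger one.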

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Xu, S., Zou, S., Wang, L. (2014). A Geometric Clustering Algorithm and Its Applications to Structural Data. In: Sharan, R. (ed.) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_29

  • DOI: https://doi.org/10.1007/978-3-319-05269-4_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05268-7

  • Online ISBN: 978-3-319-05269-4

  • eBook Packages: Computer Science, Computer Science (R0)
