
Clustering Algorithms Optimizer: A Framework for Large Datasets

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4463)

Abstract

Clustering algorithms are employed in many bioinformatics tasks, including the categorization of protein sequences and the analysis of gene-expression data. Although these algorithms are routinely applied, many of them suffer from the following limitations: (i) they rely on predetermined parameter tuning, such as a priori knowledge of the number of clusters; and (ii) they involve nondeterministic procedures that yield inconsistent outcomes. A framework that addresses these shortcomings is therefore desirable. We provide a data-driven framework that consists of two interrelated steps: SVD-based dimension reduction, followed by automated tuning of the algorithm’s parameter(s). The dimension-reduction step is adapted to run efficiently on very large datasets. The optimal parameter setting is identified by the internal evaluation criterion known as the Bayesian Information Criterion (BIC). This framework can incorporate most clustering algorithms and improve their performance. In this study we illustrate the effectiveness of this platform by incorporating the standard K-Means and the Quantum Clustering algorithms. The implementations are applied to several gene-expression benchmarks with significant success.
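The abstract does not specify the SVD variant, the clustering settings, or the exact BIC formula. The Python sketch below is therefore only an illustration of the two steps under stated assumptions:

- It reduces the data with a truncated SVD, computed by power iteration on the small d×d Gram matrix.
- It runs K-Means with deterministic farthest-point seeding. This seeding is our choice; it is meant to avoid run-to-run variation.
- It keeps the number of clusters K with the lowest BIC. The BIC is computed under a spherical-Gaussian model with one shared variance, which is also our assumption.

The function names `svd_reduce`, `kmeans`, `bic` and `choose_k` are hypothetical and are not taken from the paper.

```python
import math
import random


def svd_reduce(rows, r, iters=200):
    """Project each row onto the top-r right singular vectors of the data matrix.

    Sketch only: power iteration with deflation on the d x d Gram matrix A^T A,
    which stays small when rows (genes) far outnumber columns (conditions).
    """
    d = len(rows[0])
    gram = [[sum(row[i] * row[j] for row in rows) for j in range(d)] for i in range(d)]
    comps = []
    for c in range(r):
        v = [1.0 / (k + 1 + c) for k in range(d)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(d)) for i in range(d)]
            for u in comps:  # deflation: strip directions already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:  # rank-deficient data: no further direction
                break
            v = [a / norm for a in w]
        comps.append(v)
    # Coordinates in the reduced space are A V_r (equivalently U_r Sigma_r).
    return [[sum(a * b for a, b in zip(row, u)) for u in comps] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-Means with deterministic farthest-point seeding (our choice)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def bic(points, labels, centers):
    """BIC (lower is better). Assumption: spherical Gaussians, one shared variance."""
    n, d, k = len(points), len(points[0]), len(centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    var = max(sse / (n * d), 1e-12)  # MLE of the shared per-dimension variance
    counts = [labels.count(c) for c in range(k)]
    log_l = sum(m * math.log(m / n) for m in counts if m > 0)  # mixing weights
    log_l -= 0.5 * n * d * math.log(2 * math.pi * var) + sse / (2 * var)
    n_params = (k - 1) + k * d + 1  # mixing weights, means, shared variance
    return -2.0 * log_l + n_params * math.log(n)


def choose_k(points, k_max):
    """Run K-Means for K = 1..k_max and keep the setting with the lowest BIC."""
    scores = {k: bic(points, *kmeans(points, k)) for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores


# Toy usage: three well-separated blobs in 4 dimensions, reduced to 2.
rng = random.Random(0)
true_centers = [(0, 0, 0, 0), (10, 10, 0, 0), (0, 10, 10, 0)]
data = [[x + rng.gauss(0, 0.5) for x in c] for c in true_centers for _ in range(20)]
reduced = svd_reduce(data, r=2)
best_k, scores = choose_k(reduced, k_max=6)
print("chosen K:", best_k)  # expect 3 for this well-separated toy data
```

The sketch works with the d×d Gram matrix, so the cost stays at roughly O(n·d²) when genes (rows) far outnumber conditions (columns). This is one plausible way to make the reduction step scale. The abstract does not say whether the authors used it. In principle, another algorithm such as Quantum Clustering could replace `kmeans` inside `choose_k`. In that case its width parameter would be scanned instead of K, and each resulting partition would be scored the same way.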

Editor information

Ion Măndoiu, Alexander Zelikovsky

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Varshavsky, R., Horn, D., Linial, M. (2007). Clustering Algorithms Optimizer: A Framework for Large Datasets. In: Măndoiu, I., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2007. Lecture Notes in Computer Science, vol 4463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72031-7_8

  • DOI: https://doi.org/10.1007/978-3-540-72031-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72030-0

  • Online ISBN: 978-3-540-72031-7

  • eBook Packages: Computer Science, Computer Science (R0)
