COMPACT: A Comparative Package for Clustering Assessment

  • Roy Varshavsky
  • Michal Linial
  • David Horn
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3759)

Abstract

Numerous algorithms exist for clustering data points from large-scale genomic experiments such as sequencing, gene-expression, and proteomics studies. These algorithms may rest on distinct principles and can differ in both performance and results. Choosing an appropriate clustering method is a significant and often overlooked step in extracting information from large-scale datasets, and the choice may significantly influence the biological interpretation of the data. We present an easy-to-use and intuitive tool that compares several clustering methods within the same framework. The interface is named COMPACT, for Comparative-Package-for-Clustering-Assessment. COMPACT first reduces the dataset's dimensionality using Singular Value Decomposition (SVD), and only then applies the various clustering techniques. Besides being simple and performing well on high-dimensional data, it provides visualization tools for evaluating the results. COMPACT was tested on a variety of datasets, from classical benchmarks to large-scale gene-expression experiments. COMPACT is configurable and can be extended with newly added algorithms.
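The two-stage workflow described in the abstract (SVD-based dimensionality reduction followed by clustering and an assessment of the result) can be sketched roughly as follows. This is an illustrative NumPy sketch only, not COMPACT's implementation: the function names (`svd_reduce`, `kmeans`, `purity`, `demo`), the use of k-means as the clustering step, the purity score, and the synthetic two-group dataset are all assumptions introduced for the example.

```python
import numpy as np


def svd_reduce(data, n_components):
    """Project the rows of `data` onto the leading singular directions."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of each sample in the truncated SVD space.
    return u[:, :n_components] * s[:n_components]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; a stand-in for any clustering method."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def purity(labels, truth):
    """Fraction of samples that fall in the majority true class of their cluster."""
    total = 0
    for c in np.unique(labels):
        total += np.bincount(truth[labels == c]).max()
    return total / len(truth)


def demo():
    """Hypothetical data: two well-separated groups of 30 samples in 50 dimensions."""
    rng = np.random.default_rng(1)
    group_a = rng.normal(loc=0.0, scale=1.0, size=(30, 50))
    group_b = rng.normal(loc=5.0, scale=1.0, size=(30, 50))
    data = np.vstack([group_a, group_b])
    truth = np.array([0] * 30 + [1] * 30)

    reduced = svd_reduce(data, n_components=2)  # step 1: reduce dimensionality
    labels = kmeans(reduced, k=2)               # step 2: cluster in reduced space
    return reduced, purity(labels, truth)       # step 3: assess against known classes


if __name__ == "__main__":
    reduced, score = demo()
    print("reduced shape:", reduced.shape)
    print("purity:", score)
```

To compare methods in the spirit of the abstract, one would run several clustering functions on the same `reduced` matrix and score each against the same reference labels; the single k-means call here only marks where such methods would plug in.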

Keywords

Cluster Algorithm; Singular Value Decomposition; Yeast Cell Cycle; Quantum Cluster; Real Classification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Roy Varshavsky (1)
  • Michal Linial (2)
  • David Horn (3)
  1. School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
  2. Dept of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel
  3. School of Physics and Astronomy, Tel Aviv University, Israel
