Metaclustering and Consensus Algorithms for Interactive Data Analysis and Validation

  • Conference paper
Fuzzy Logic and Applications (WILF 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5571)

Abstract

Clustering real-world datasets is a complex problem. Optimization models that maximize a fitness function assume that the global optimum corresponds to the best clustering solution. Unfortunately, this is not always the case, mainly because of noise or intrinsic ambiguity in the data. In this work we present a set of tools implementing classical and novel techniques to approach clustering in a systematic way, with an application example to a complex biological dataset. The tools address three tasks: generating multiple clustering solutions, performing cluster analysis on those clusterings (i.e., Meta Clustering), and reducing the final number of clusterings by appropriately applying different Consensus techniques. Subsequently cross-referencing prior knowledge with the obtained clusters helps the user better understand their meaning and validates the solutions.
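
The workflow outlined in the abstract (many clusterings, then clustering of the clusterings, then one consensus per group) can be illustrated with a minimal sketch. This is not the authors' implementation: the distance (1 minus the Rand index), the single-linkage threshold grouping, the majority co-association consensus, the function names, and the toy label vectors are all assumptions chosen for brevity, and the input clusterings are taken as given rather than generated by an optimizer.

```python
from itertools import combinations

def rand_distance(a, b):
    """1 - Rand index between two label lists over the same items."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return 1.0 - agree / len(pairs)

def meta_cluster(clusterings, threshold):
    """Group clusterings by single linkage: two clusterings share a group
    when a chain of pairwise distances <= threshold connects them."""
    groups = []  # each group is a sorted list of clustering indices
    for idx, c in enumerate(clusterings):
        linked = [g for g in groups
                  if any(rand_distance(c, clusterings[k]) <= threshold for k in g)]
        merged = [idx]
        for g in linked:
            merged.extend(g)
            groups.remove(g)
        groups.append(sorted(merged))
    return groups

def consensus(clusterings):
    """Join two items when they are co-clustered in a strict majority of
    the given clusterings, then return the connected components as labels."""
    n, m = len(clusterings[0]), len(clusterings)
    parent = list(range(n))  # union-find forest over items

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        together = sum(c[i] == c[j] for c in clusterings)
        if 2 * together > m:
            parent[find(j)] = find(i)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Hypothetical label vectors for six items (illustrative data only).
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, relabeled
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 0],
]
groups = meta_cluster(clusterings, threshold=0.4)
summaries = [consensus([clusterings[k] for k in g]) for g in groups]
print(groups)     # [[0, 1, 2], [3, 4]]
print(summaries)  # [[0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 2]]
```

Here five candidate clusterings collapse into two groups, each summarized by one consensus partition. In practice the threshold and the consensus rule would need tuning to the data, and a different partition distance could be substituted without changing the overall structure.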



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bifulco, I., Fedullo, C., Napolitano, F., Raiconi, G., Tagliaferri, R. (2009). Metaclustering and Consensus Algorithms for Interactive Data Analysis and Validation. In: Di Gesù, V., Pal, S.K., Petrosino, A. (eds) Fuzzy Logic and Applications. WILF 2009. Lecture Notes in Computer Science, vol 5571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02282-1_21

  • DOI: https://doi.org/10.1007/978-3-642-02282-1_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02281-4

  • Online ISBN: 978-3-642-02282-1

  • eBook Packages: Computer Science, Computer Science (R0)
