Abstract
Clustering real-world datasets is a complex problem. Optimization models that seek to maximize a fitness function assume that the solution corresponding to the global optimum is the best clustering solution. Unfortunately, this is not always the case, mainly because of noise or intrinsic ambiguity in the data. In this work we present a set of tools implementing classical and novel techniques to approach clustering in a systematic way, with an example application to a complex biological dataset. The tools address three tasks: generating multiple clustering solutions, performing cluster analysis on those clusterings (i.e., Meta Clustering), and reducing the final number of clusterings through the appropriate application of different Consensus techniques. Finally, cross-referencing the obtained clusters with prior knowledge helps the user better understand their meaning and validate the solutions.
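As an illustration only, the two ideas named in the abstract can be sketched in a few lines of Python. This is not the implementation presented in the paper: the distance between clusterings (one minus the Rand index), the majority-vote co-association rule used for the consensus, the function names and the toy data are all assumptions chosen for brevity.

```python
from itertools import combinations

def rand_distance(a, b):
    """Return 1 - Rand index for two label lists over the same items.
    (Assumed dissimilarity; the abstract does not specify one.)"""
    pairs = list(combinations(range(len(a)), 2))
    # A pair "agrees" if both clusterings join it or both separate it.
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return 1.0 - agree / len(pairs)

def majority_consensus(clusterings):
    """Hypothetical consensus rule: link two items when they share a
    cluster in more than half of the clusterings, then label the
    connected components."""
    n = len(clusterings[0])
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        votes = sum(c[i] == c[j] for c in clusterings)
        if 2 * votes > len(clusterings):
            parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Toy input: three clusterings of the same six items.
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, labels swapped
    [0, 0, 1, 1, 2, 2],
]

# Meta Clustering works on a matrix of distances between clusterings.
distances = [[rand_distance(a, b) for b in clusterings] for a in clusterings]

# A consensus step collapses the clusterings into a single partition.
consensus = majority_consensus(clusterings)
```

In this sketch the `distances` matrix could be handed to any standard clustering routine to group similar solutions, and `consensus` is the single partition that summarizes one such group.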
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bifulco, I., Fedullo, C., Napolitano, F., Raiconi, G., Tagliaferri, R. (2009). Metaclustering and Consensus Algorithms for Interactive Data Analysis and Validation. In: Di Gesù, V., Pal, S.K., Petrosino, A. (eds) Fuzzy Logic and Applications. WILF 2009. Lecture Notes in Computer Science, vol 5571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02282-1_21
DOI: https://doi.org/10.1007/978-3-642-02282-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02281-4
Online ISBN: 978-3-642-02282-1
eBook Packages: Computer Science (R0)