Abstract
With the evolution of the WWW, the computing world has been flooded with data. Classical data mining approaches can still be used to search this data, but their performance degrades. In this paper, we present a new clustering approach based on the multilevel paradigm, called multilevel clustering, which reduces the computational complexity and execution time of data mining at very large scale. The developed algorithm has been run on three public benchmarks to test the effectiveness of the multilevel clustering approach, and the numerical results have been compared with those of the simple k-means algorithm. As expected, multilevel clustering clearly outperforms basic k-means in both execution time and success rate, with the latter reaching 100% as the amount of data increases.
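The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general multilevel idea (solve a coarse version of the problem first, then refine on finer levels), not the authors' method. It assumes that coarsening is done by strided subsampling and that each finer level is refined by a few Lloyd (k-means) iterations started from the centroids of the coarser level. The function names (`multilevel_kmeans`, `lloyd`, `farthest_point_init`) and the `strides` parameter are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    return centroids


def lloyd(points, centroids, iters):
    """Run a fixed number of Lloyd (k-means) iterations from given centroids."""
    centroids = list(centroids)
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:                      # assignment step
            groups[nearest(p, centroids)].append(p)
        for c, members in enumerate(groups):  # update step
            if members:                       # keep old centroid if cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids


def multilevel_kmeans(points, k, strides=(16, 4, 1), iters=5):
    """Cluster coarse subsamples first and reuse their centroids as the
    starting point on each finer level (assumed coarsening: every
    stride-th point; a random sample would be the more usual choice)."""
    centroids = None
    for stride in strides:                    # coarsest level first
        level = points[::stride]
        if centroids is None:
            centroids = farthest_point_init(level, k)
        centroids = lloyd(level, centroids, iters)
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels
```

Most iterations in this sketch run on small subsamples, and the full data set is only touched in the last level, which is where a multilevel scheme would be expected to save time over running k-means from scratch on all points.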
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Chemchem, A., Drias, H. (2013). Multilevel Clustering on Very Large Scale of Web Data. In: Casillas, J., Martínez-López, F., Vicari, R., De la Prieta, F. (eds) Management Intelligent Systems. Advances in Intelligent Systems and Computing, vol 220. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00569-0_2
DOI: https://doi.org/10.1007/978-3-319-00569-0_2
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00568-3
Online ISBN: 978-3-319-00569-0
eBook Packages: Engineering (R0)