Multilevel Clustering on Very Large Scale of Web Data

Chemchem, Amine; Drias, Habiba

doi:10.1007/978-3-319-00569-0_2

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 220)

Abstract

With the evolution of the WWW, the computing world has been flooded with data. Classical data mining approaches can still be used to search this data, but with diminished performance. In this paper, we present a new clustering approach based on the multilevel paradigm, called multilevel clustering, which reduces the computational complexity and execution time of data mining at very large scale. The developed algorithm has been run on three public benchmarks to test the effectiveness of the multilevel clustering approach, and the numerical results have been compared to those of the simple k-means algorithm. As expected, multilevel clustering clearly outperforms basic k-means in both execution time and success rate, which reaches 100% as the amount of data increases.
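The abstract does not describe the internals of the algorithm. As a rough illustration only, one common reading of the multilevel paradigm is "coarsen, solve on the small problem, then refine on the full data". The sketch below follows that reading in plain Python. The function names (`kmeans`, `multilevel_kmeans`), the chunk-averaging coarsening step, and the parameters `group_size` and `refine_iters` are all assumptions made for this example and are not taken from the chapter.

```python
import random

def _dist2(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def _assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: _dist2(p, centroids[j]))
            for p in points]

def kmeans(points, k, iters=20, init=None, seed=0):
    """Basic k-means (Lloyd iterations). Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(list(points), k)
    for _ in range(iters):
        labels = _assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = _mean(members)
    return centroids, _assign(points, centroids)

def multilevel_kmeans(points, k, group_size=4, refine_iters=2, seed=0):
    """Hypothetical multilevel scheme (an assumption, not the authors' method).

    1. Coarsen: sort the points (a crude locality heuristic) and replace
       each chunk of `group_size` points by its mean.
    2. Solve: run k-means on the much smaller coarse set.
    3. Refine: run a few k-means iterations on the full data,
       starting from the coarse centroids.
    """
    ordered = sorted(points)
    coarse = [_mean(ordered[i:i + group_size])
              for i in range(0, len(ordered), group_size)]
    coarse_centroids, _ = kmeans(coarse, k, seed=seed)
    return kmeans(points, k, iters=refine_iters, init=coarse_centroids)

if __name__ == "__main__":
    # Two well-separated synthetic groups of 20 points each.
    data = [(i * 0.1, i * 0.1) for i in range(20)]
    data += [(100 + i * 0.1, 100 + i * 0.1) for i in range(20)]
    centroids, labels = multilevel_kmeans(data, k=2)
    print(centroids)
    print(labels)
```

In this sketch the expensive full-data passes are limited to `refine_iters` iterations, while most of the iterations run on the coarse set, which is `group_size` times smaller. Whether this matches the cost savings reported in the chapter cannot be determined from the abstract alone.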

Author information

Authors and Affiliations

USTHB-LRIA, BP 32, El Alia Bab Ezzouar, Algiers, Algeria
Amine Chemchem & Habiba Drias

Corresponding author

Correspondence to Amine Chemchem.

Editor information

Editors and Affiliations

(CITIC-UGR), Department Computer Science and, University of Granada, Granada, 18071, Spain
Jorge Casillas
Dept. Business Administration, University of Granada, Granada, 18071, Spain
Francisco J. Martínez-López
UFRGS, Department of Computer Systems, University of Sao Paulo, Sao Paulo, 91501-970, Brazil
Rosa Vicari
Department of Computing Science, Universidad de Salamanca, Plaza de la Merced s/n, Salamanca, 37008, Spain
Fernando De la Prieta

About this paper

Cite this paper

Chemchem, A., Drias, H. (2013). Multilevel Clustering on Very Large Scale of Web Data. In: Casillas, J., Martínez-López, F., Vicari, R., De la Prieta, F. (eds) Management Intelligent Systems. Advances in Intelligent Systems and Computing, vol 220. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00569-0_2

DOI: https://doi.org/10.1007/978-3-319-00569-0_2
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00568-3
Online ISBN: 978-3-319-00569-0
eBook Packages: Engineering, Engineering (R0)
