Lightweight Clustering Technique for Distributed Data Mining Applications

Aouad, Lamine M.; Le-Khac, Nhien-An; Kechadi, Tahar M.

doi:10.1007/978-3-540-73435-2_10

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4597)

Abstract

Many parallel and distributed clustering algorithms have already been proposed. Most of them are based on the aggregation of local models according to some collected local statistics. In this paper, we propose a lightweight distributed clustering algorithm based on a minimum variance increase criterion, which incurs very little communication overhead. We also introduce the notion of distributed perturbation to improve the globally generated clustering. We show that this algorithm improves the quality of the overall clustering and manages to find the real structure and number of clusters of the global dataset.

This study is part of ADMIRE [15], a distributed data mining framework designed and developed at University College Dublin, Ireland.
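
As an illustration of the kind of aggregation the abstract describes, the following is a minimal sketch, not the authors' algorithm. It assumes each site sends only three statistics per local sub-cluster (size, center, and sum of squared deviations) and that a coordinator greedily merges the pair whose merge increases the within-cluster sum of squares the least, using the standard Ward formula. The names `ClusterStats`, `merge_cost`, `aggregate` and the stopping threshold `max_increase` are hypothetical, and the distributed perturbation step is not covered.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ClusterStats:
    """Summary of one local sub-cluster (hypothetical message format)."""
    size: int
    center: Tuple[float, ...]
    sse: float  # sum of squared distances of the members to the center


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge_cost(a: ClusterStats, b: ClusterStats) -> float:
    """Increase in within-cluster sum of squares if a and b are merged (Ward)."""
    return a.size * b.size / (a.size + b.size) * sq_dist(a.center, b.center)


def merge(a: ClusterStats, b: ClusterStats) -> ClusterStats:
    """Combine two summaries without access to the underlying points."""
    n = a.size + b.size
    center = tuple((a.size * x + b.size * y) / n
                   for x, y in zip(a.center, b.center))
    return ClusterStats(n, center, a.sse + b.sse + merge_cost(a, b))


def aggregate(stats: List[ClusterStats], max_increase: float) -> List[ClusterStats]:
    """Greedily merge the cheapest pair until the cheapest merge exceeds
    max_increase (an assumed stopping rule, chosen for illustration)."""
    clusters = list(stats)
    while len(clusters) > 1:
        cost, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if cost > max_increase:
            break
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

Under these assumptions, the communication cost per sub-cluster is one integer and a few floats, independent of the number of points it summarizes. Calling `aggregate` on the summaries collected from all sites returns the merged global summaries; the number of clusters is whatever remains when the cheapest merge becomes too expensive.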

References

1. Calinski, T., Harabasz, J.: A dendrite method for cluster analysis. Communications in Statistics 3 (1974)
2. Cannataro, M., Congiusta, A., Pugliese, A., Talia, D., Trunfio, P.: Distributed Data Mining on Grids: Services, Tools, and Applications. IEEE Transactions on Systems, Man, and Cybernetics 34(6) (2004)
3. Dhillon, I.S., Modha, D.: A Data-Clustering Algorithm on Distributed Memory Multiprocessors. In: Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems. SIGKDD (1999)
4. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD) (1996)
5. Garg, A., Mangla, A., Bhatnagar, V., Gupta, N.: PBIRCH: A Scalable Parallel Clustering Algorithm for Incremental Data. In: IDEAS 2006. 10th International Database Engineering and Applications Symposium (2006)
6. Geng, H., Deng, X., Ali, H.: A New Clustering Algorithm Using Message Passing and its Applications in Analyzing Microarray Data. In: ICMLA 2005. Proceedings of the Fourth International Conference on Machine Learning and Applications, pp. 145–150. IEEE Computer Society Press, Los Alamitos (2005)
7. Ghanem, V.M., Kohler, Y.M., Sayed, A.J., Wendel, P.: Discovery Net: Towards a Grid of Knowledge Discovery. In: Eighth Int. Conf. on Knowledge Discovery and Data Mining (2002)
8. Jain, A.K., Murty, M.N., Flynn, P.J.: Data Clustering: A Review. ACM Computing Surveys (1999)
9. Januzaj, E., Kriegel, H.-P., Pfeifle, M.: Towards Effective and Efficient Distributed Clustering. In: Int. Workshop on Clustering Large Data Sets. 3rd Int. Conf. on Data Mining, ICDM (2003)
10. Januzaj, E., Kriegel, H.-P., Pfeifle, M.: DBDC: Density-Based Distributed Clustering. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) EDBT 2004. LNCS, vol. 2992, Springer, Heidelberg (2004)
11. Januzaj, E., Kriegel, H.-P., Pfeifle, M.: Scalable Density-Based Distributed Clustering. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD 2004. LNCS (LNAI), vol. 3202, Springer, Heidelberg (2004)
12. Jin, R., Goswami, A., Agrawal, G.: Fast and Exact Out-of-Core and Distributed K-Means Clustering. Knowledge and Information Systems 10 (2006)
13. Joshi, M.N.: Parallel K-Means Algorithm on Distributed Memory Multiprocessors. Technical report, University of Minnesota (2003)
14. Kickinger, G., Hofer, J., Brezany, P., Tjoa, A.M.: Grid Knowledge Discovery Processes and an Architecture for their Composition. Parallel and Distributed Computing and Networks (2004)
15. Le-Khac, N.-A., Kechadi, M.T., Carthy, J.: ADMIRE framework: Distributed Data Mining on Data Grid platforms. In: ICSOFT 2006. First Int. Conf. on Software and Data Technologies (2006)
16. Ng, R.T., Han, J.: Efficient and Effective Clustering Methods for Spatial Data Mining. In: VLDB 1994. Proceedings of 20th International Conference on Very Large Data Bases, Santiago de Chile (1994)
17. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a dataset via the Gap statistic. Technical report, Stanford University (March 2000)
18. Veenman, C.J., Reinders, M.J., Backer, E.: A Maximum Variance Cluster Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(9) (2002)
19. Xu, R., Wunsch, D.: Survey of Clustering Algorithms. IEEE Transactions on Neural Networks 16 (2005)
20. Xu, X., Jager, J., Kriegel, H.-P.: A Fast Parallel Clustering Algorithm for Large Spatial Databases. Journal of Data Mining and Knowledge Discovery 3 (1999)
21. Zhang, B., Forman, G.: Distributed Data Clustering Can be Efficient and Exact. Technical report, HP Labs (2000)
22. Zhang, B., Hsu, M., Dayal, U.: K-Harmonic Means - A Data Clustering Algorithm. Technical report, HP Labs (1999)

Author information

Authors and Affiliations

School of Computer Science and Informatics, University College Dublin, Ireland
Lamine M. Aouad, Nhien-An Le-Khac & Tahar M. Kechadi

Editor information

Petra Perner

Cite this paper

Aouad, L.M., Le-Khac, N.-A., Kechadi, T.M. (2007). Lightweight Clustering Technique for Distributed Data Mining Applications. In: Perner, P. (eds) Advances in Data Mining. Theoretical Aspects and Applications. ICDM 2007. Lecture Notes in Computer Science, vol 4597. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73435-2_10

DOI: https://doi.org/10.1007/978-3-540-73435-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73434-5
Online ISBN: 978-3-540-73435-2
eBook Packages: Computer Science, Computer Science (R0)
