Abstract
The chief motivation of this work is to develop a framework for clustering large datasets in a distributed manner. The proposed approach handles both numerical and categorical data and includes an effective way of handling noisy information. Two basic models, the primary model and the connected model, are developed to design the distributed approach. After clusters are formed separately from the numerical and the categorical features, an evolutionary approach is used to merge them and optimize the result. The proposal is implemented with a modification of the multiple kernel-based fuzzy c-means (MKFCM) algorithm of Chen et al. (2011). This paper presents a comprehensive view of the designed method and algorithm. A comparison of results on a few sample datasets shows the effectiveness of the proposed approach over an existing one.
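The abstract does not give the authors' modified MKFCM, so the following is only a minimal sketch of the general idea of kernel-based fuzzy c-means on mixed data. It assumes a Gaussian kernel for numerical attributes, an overlap (matching) kernel for categorical attributes, and a fixed convex combination of the two; Chen et al.'s MKFCM additionally learns the kernel weights, which is not done here. The function names and the parameters `sigma`, `w_num`, `m`, and `seed` are hypothetical, and the distributed primary/connected models, noise handling, and evolutionary merge are not modelled.

```python
import math
import random


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel on numerical attribute vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2 * sigma ** 2))


def overlap_kernel(a, b):
    """Fraction of categorical attributes on which two records agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def combined_kernel_matrix(num, cat, w_num=0.5, sigma=1.0):
    """Fixed convex combination of the two kernels (weights are NOT learned)."""
    n = len(num)
    return [[w_num * gaussian_kernel(num[i], num[j], sigma)
             + (1 - w_num) * overlap_kernel(cat[i], cat[j])
             for j in range(n)] for i in range(n)]


def kernel_fcm(K, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy c-means in the kernel-induced feature space.

    Returns memberships u[i][k] of record k in cluster i.
    """
    n = len(K)
    rng = random.Random(seed)
    # Random initial memberships, normalised so each record's memberships sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= s
    for _ in range(iters):
        # Squared feature-space distance from record k to prototype i, where the
        # prototype is the u^m-weighted mean of the mapped records.
        d2 = [[0.0] * n for _ in range(c)]
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            sw = sum(w)
            vv = sum(w[j] * w[t] * K[j][t]
                     for j in range(n) for t in range(n)) / sw ** 2
            for k in range(n):
                xv = sum(w[j] * K[k][j] for j in range(n)) / sw
                d2[i][k] = max(K[k][k] - 2 * xv + vv, 1e-12)
        # Standard FCM membership update using the kernel distances.
        new_u = [[1.0 / sum((d2[i][k] / d2[j][k]) ** (1 / (m - 1))
                            for j in range(c))
                  for k in range(n)] for i in range(c)]
        delta = max(abs(new_u[i][k] - u[i][k])
                    for i in range(c) for k in range(n))
        u = new_u
        if delta < tol:
            break
    return u
```

Setting `w_num=1.0` or `w_num=0.0` in `combined_kernel_matrix` restricts the sketch to the numerical or the categorical attributes alone, loosely mirroring the separate clustering step described in the abstract. A hard label for each record can be taken as the cluster with the largest membership.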
References
Ji, J., Pang, W., Zhou, C., Han, X., Wang, Z.: A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data. Knowl.-Based Syst. 30, 129–135 (2012)
Chen, L., Chen, C.L., Lu, M.: A multiple-kernel fuzzy C-means algorithm for image segmentation. IEEE Trans. Syst. Man Cybern. Part B 41(5), 1263–1274 (2011)
Dhillon, I.S., Modha, D.S.: A data-clustering algorithm on distributed memory multiprocessors. In: Proceedings of KDD Workshop on High Performance Knowledge Discovery, pp. 245–260 (1999)
Jin, R., Goswami, A., Agrawal, G.: Fast and exact out-of-core and distributed K-Means clustering. Knowl. Inf. Syst. 10(1), 17–40 (2006)
Ji, G., Ling, X.: Ensemble learning based distributed clustering. Emerg. Technol. Knowl. Discov. Data Min. 4819, 312–321 (2007)
Beaumont, O., Bonichon, N., Duchon, P., Eyraud-Dubois, L., Larcheveque, H.: A distributed algorithm for resource clustering in large scale platforms. Principles Distrib. Syst. 5401, 564–567 (2008)
Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning, 1st edn. Addison-Wesley (1989)
Flag dataset: http://archive.ics.uci.edu/ml/datasets/Fl
Adult dataset: http://archive.ics.uci.edu/ml/datasets/Adult
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Swapna, C.S., Kumar, V.V., Murthy, J.V.R. (2016). A Framework for Data Clustering of Large Datasets in a Distributed Environment. In: Satapathy, S., Raju, K., Mandal, J., Bhateja, V. (eds) Proceedings of the Second International Conference on Computer and Communication Technologies. Advances in Intelligent Systems and Computing, vol 379. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2517-1_41
DOI: https://doi.org/10.1007/978-81-322-2517-1_41
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2516-4
Online ISBN: 978-81-322-2517-1
eBook Packages: Engineering (R0)