Abstract
Affinity Propagation (AP) is a clustering algorithm based on the concept of "message passing" between data points. Unlike many clustering algorithms, such as k-means, AP does not require the number of clusters to be determined or estimated before the algorithm is run. An implementation of AP on Hadoop, a distributed cloud environment, is available as Map/Reduce Affinity Propagation (MRAP). MRAP has a limitation, however: it is hard to know which value of the "preference" parameter yields an optimal clustering solution. The Adaptive Affinity Propagation Clustering (AAP) algorithm was proposed to overcome this limitation by deciding the preference value in AP. In this study, we propose combining the two methods as Adaptive Map/Reduce Affinity Propagation (AMRAP), which divides the clustering task among multiple mappers and one reducer in Hadoop and decides a suitable preference value individually for each mapper. In the experiments, we compare the clustering results of the proposed AMRAP with those of the original MRAP method. The experimental results indicate that the proposed AMRAP method outperforms the original MRAP method in terms of accuracy.
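To make the role of the preference parameter concrete, the following is a minimal single-machine sketch of the standard AP message-passing updates of Frey and Dueck in plain Python. It is not the MRAP or AMRAP implementation, and it does not reproduce the adaptive search of AAP. The function names `neg_sq_dist` and `affinity_propagation`, the 1-D toy points, the negative squared distance similarity, and the default damping factor and iteration count are all assumptions chosen for illustration.

```python
def neg_sq_dist(points):
    """Similarity matrix: negative squared distance between 1-D points."""
    return [[-(p - q) ** 2 for q in points] for p in points]


def affinity_propagation(S, preference, damping=0.5, iterations=200):
    """Return, for each point i, the index of the exemplar it selects.

    S is a square similarity matrix (list of lists); its diagonal is
    overwritten with `preference`. Runs a fixed number of damped updates
    without a convergence check (an illustrative simplification).
    """
    n = len(S)
    S = [row[:] for row in S]              # work on a copy
    for i in range(n):
        S[i][i] = preference               # s(k, k) is the preference
    R = [[0.0] * n for _ in range(n)]      # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]      # availabilities a(i, k)

    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            max1 = vals[best]
            max2 = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                rival = max2 if k == best else max1
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)

        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        # a(k, k) = sum of positive r(i', k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # Point i selects the k that maximizes a(i, k) + r(i, k).
    return [max(range(n), key=lambda k, i=i: A[i][k] + R[i][k])
            for i in range(n)]


if __name__ == "__main__":
    points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]   # toy data, two visible groups
    S = neg_sq_dist(points)
    for pref in (0.0, -50.0):                    # hypothetical preference values
        labels = affinity_propagation(S, preference=pref)
        print(pref, "->", len(set(labels)), "cluster(s)", labels)
```

With the preference set to 0.0, which is above every off-diagonal similarity here, each point selects itself as its exemplar. Lower preferences generally lead to fewer exemplars, which is why the choice of this single value largely determines the clustering that AP returns.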
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hung, WC., Liu, YC., Wu, YL., Tang, CY., Hor, MK. (2014). Adaptive Affinity Propagation Clustering in MapReduce Environment. In: Cheng, SM., Day, MY. (eds) Technologies and Applications of Artificial Intelligence. TAAI 2014. Lecture Notes in Computer Science, vol 8916. Springer, Cham. https://doi.org/10.1007/978-3-319-13987-6_20
DOI: https://doi.org/10.1007/978-3-319-13987-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13986-9
Online ISBN: 978-3-319-13987-6
eBook Packages: Computer Science (R0)