Abstract
Big Data is the next major development in the IT world. The value generated by storing and processing Big Data cannot be analyzed using traditional computing techniques. The main aim of this paper is to design a scalable machine learning algorithm that scales up and speeds up clustering without losing accuracy. Clustering using power iteration is fast and scalable. However, it requires matrix computation, which makes the algorithm infeasible for Big Data. Moreover, the power method can converge slowly toward the dominant eigenvector. Hence, this paper investigates the convergence factor and applies a modified constraint that reduces the computational cost by making the algorithm converge quickly. The proposed algorithm is evaluated in a MapReduce parallel environment using datasets of different sizes and clusters with different numbers of nodes, with speedup, scalability, and efficiency as the indicators. Performance is reported with respect to execution time and the number of nodes. The results show that the proposed method is feasible and valid, and that it improves the overall performance and efficiency of the algorithm so that it can meet the needs of large-scale processing.
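To make the terms in the abstract concrete, the following is a minimal single-machine sketch of the baseline power iteration embedding, following the general scheme of Lin and Cohen's power iteration clustering. It is not the modified constraint proposed in this paper, and it does not use MapReduce. The function names, the degree-based start vector, the threshold `eps`, and the acceleration-based stopping rule are illustrative assumptions. The two helpers use the standard definitions of speedup (single-node time divided by time on p nodes) and efficiency (speedup divided by p).

```python
import numpy as np

def power_iteration_embedding(A, max_iter=1000, eps=1e-5):
    """Truncated power iteration on W = D^-1 A (hypothetical helper).

    A is a symmetric, non-negative affinity matrix with positive row sums.
    Returns the one-dimensional embedding v and the iterations used.
    """
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    W = A / degrees[:, None]          # row-normalised affinity matrix
    v = degrees / degrees.sum()       # assumed degree-based start vector
    prev_delta = np.inf
    for t in range(1, max_iter + 1):
        v_new = W @ v                 # the matrix-vector product is the costly step
        v_new /= np.abs(v_new).sum()  # keep the L1 norm equal to 1
        delta = np.abs(v_new - v).max()
        v = v_new
        # Stop once the change between iterations levels off, well before
        # v collapses to the uninformative constant eigenvector.
        if abs(prev_delta - delta) < eps:
            break
        prev_delta = delta
    return v, t

def speedup(t_one_node, t_p_nodes):
    """Speedup: time on one node divided by time on p nodes."""
    return t_one_node / t_p_nodes

def efficiency(t_one_node, t_p_nodes, p):
    """Efficiency: speedup divided by the number of nodes p."""
    return speedup(t_one_node, t_p_nodes) / p

if __name__ == "__main__":
    # Toy graph: two tightly connected groups (3 and 2 points), weakly linked.
    A = np.full((5, 5), 0.01)
    A[:3, :3] = 1.0
    A[3:, 3:] = 1.0
    np.fill_diagonal(A, 0.0)
    v, t = power_iteration_embedding(A)
    print(t, v)
```

In a full clustering pipeline the entries of `v` would then be grouped, for example with k-means. In a MapReduce setting the product `W @ v` is the natural step to distribute across nodes, which is where the speedup and efficiency indicators apply.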
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Jayalatchumy, D., Thambidurai, P. (2015). To Optimize Graph Based Power Iteration for Big Data Based on MapReduce Paradigm. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_35
DOI: https://doi.org/10.1007/978-3-319-26832-3_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26831-6
Online ISBN: 978-3-319-26832-3
eBook Packages: Computer Science, Computer Science (R0)