MapReduce Based Analysis of Sample Applications Using Hadoop
The volume of structured, semi-structured and unstructured data is growing very rapidly, and discovering hidden information in such varied data is a major challenge. In this paper, two techniques, word frequency count and string matching, are applied to an input data set on a single-node cluster and on a multi-node cluster. The results are analyzed and compared while varying the MapReduce configuration of both clusters. We test how changing the number of mappers and reducers for a MapReduce job can significantly affect its performance. We further analyze how Hadoop determines the number of mappers and reducers it invokes, depending on the input size and the Hadoop Distributed File System (HDFS) block size. The results for heterogeneous cluster configurations indicate the potential of the framework and show how the numbers of mappers and reducers affect its performance.
Keywords: Big data, Cloud computing, Hadoop, HDFS, MapReduce
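As an illustration of the relationship between input size, HDFS block size and the number of mappers mentioned in the abstract, the following minimal sketch estimates the map task count as one task per block-sized split of each input file. The function name `estimate_map_tasks`, the file sizes and the 64 MB block size are assumptions chosen for the example, not values reported in the paper. The sketch also ignores the small slack Hadoop allows on the final split of a file and any custom split-size settings, so it should be read as a rough approximation.

```python
def estimate_map_tasks(file_sizes_bytes, block_size_bytes):
    """Roughly estimate map tasks: one per block-sized split, counted per file.

    Simplification (assumption): split size equals the HDFS block size, and
    the slack Hadoop permits on the last split of a file is ignored.
    """
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    total = 0
    for size in file_sizes_bytes:
        if size > 0:
            # Splits do not span files, so round up each file on its own.
            total += (size + block_size_bytes - 1) // block_size_bytes
    return total


MB = 1024 * 1024
BLOCK = 64 * MB  # hypothetical block size for this example

# One large file is cut into many splits ...
one_large_file = estimate_map_tasks([1024 * MB], BLOCK)
# ... while many small files each get their own split, however small.
many_small_files = estimate_map_tasks([10 * MB] * 20, BLOCK)

print(one_large_file, many_small_files)
```

Under these assumptions, a single 1 GB file and twenty 10 MB files lead to a similar number of map tasks even though the second input is about five times smaller, which is one reason the mapper count alone does not predict running time. The number of reducers, by contrast, is not derived from the input size; it is set in the job configuration.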
This work was made possible by the financial support of the Department of Science & Technology (DST), Ministry of Science and Technology, Government of India, in the form of a Research Fellowship.