MapReduce Based Analysis of Sample Applications Using Hadoop

Ghazi, Mohd Rehan; Raghava, N. S.

doi:10.1007/978-981-13-2035-4_4

Part of the book series: Communications in Computer and Information Science (CCIS, volume 899)

Included in the following conference series:

International Conference on Application of Computing and Communication Technologies

Abstract

Structured, semi-structured, and unstructured data are growing at a very high rate, and discovering hidden information in such varied data is a major challenge. Two techniques, word frequency count and string matching, are applied to an input data set on a single-node cluster and on a multi-node cluster. The results are analyzed and compared while varying the MapReduce configuration of both. In this paper we test how changing the number of mappers and reducers for a MapReduce job can significantly affect performance. We further analyze how Hadoop determines the number of mappers and reducers it invokes, depending on the input size and the Hadoop Distributed File System (HDFS) block size. The results for heterogeneous cluster configurations indicate the potential of the framework and show how the numbers of mappers and reducers affect its performance.
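
The relationship between input size, HDFS block size, and mapper count described in the abstract can be illustrated with a rough sketch. It assumes the common default behaviour in which a splittable input file yields about one input split, and therefore one map task, per HDFS block. The function name estimate_map_tasks and the 128 MiB block size are illustrative assumptions, not values taken from the chapter, and the actual split calculation in Hadoop also depends on the configured minimum and maximum split sizes and on the input format.

```python
MIB = 1024 * 1024


def estimate_map_tasks(input_bytes: int, block_bytes: int) -> int:
    """Roughly estimate map tasks as one task per HDFS block of input.

    Simplifying assumption: one input split per block of a single
    splittable file. Split-size settings and input-format details
    that real Hadoop jobs apply are ignored here.
    """
    if input_bytes <= 0 or block_bytes <= 0:
        raise ValueError("input_bytes and block_bytes must be positive")
    # Integer ceiling division: a partially filled last block still
    # needs its own map task.
    return (input_bytes + block_bytes - 1) // block_bytes


if __name__ == "__main__":
    block = 128 * MIB  # assumed block size, for illustration only
    for size_mib in (64, 128, 1024, 1025):
        tasks = estimate_map_tasks(size_mib * MIB, block)
        print(f"{size_mib} MiB input -> about {tasks} map task(s)")
```

Under these assumptions, a 1 GiB input with a 128 MiB block size gives an estimate of 8 map tasks, and one extra mebibyte pushes the estimate to 9.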

Acknowledgement

This work was made possible by the financial support of the Department of Science & Technology (DST), Ministry of Science and Technology, Government of India, in the form of a Research Fellowship.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Delhi Technological University, Delhi, India
Mohd Rehan Ghazi & N. S. Raghava

Corresponding authors

Correspondence to Mohd Rehan Ghazi or N. S. Raghava.

Editor information

Editors and Affiliations

Ministry of Skill Development, Delhi, India
Ganesh Chandra Deka
School of Science and Technology, Nottingham Trent University, Nottingham, UK
Omprakash Kaiwartya
Department of Computer Science, Shyama Prasad Mukherji College, University of Delhi, Delhi, India
Pooja Vashisth
Department of Computer Science, North Campus, University of Delhi, Delhi, India
Priyanka Rathee

About this paper

Cite this paper

Ghazi, M.R., Raghava, N.S. (2018). MapReduce Based Analysis of Sample Applications Using Hadoop. In: Deka, G., Kaiwartya, O., Vashisth, P., Rathee, P. (eds) Applications of Computing and Communication Technologies. ICACCT 2018. Communications in Computer and Information Science, vol 899. Springer, Singapore. https://doi.org/10.1007/978-981-13-2035-4_4

DOI: https://doi.org/10.1007/978-981-13-2035-4_4
Published: 30 August 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2034-7
Online ISBN: 978-981-13-2035-4
eBook Packages: Computer Science, Computer Science (R0)
