Abstract
In Hadoop MapReduce, input dataset files are loaded from the distributed file system and split across the worker nodes, and each worker then performs the required computation according to the user-defined logic. This processing runs in parallel on all nodes in the cluster to produce the output results. However, resource contention between the map and reduce stages causes significant delays in execution time, particularly because of memory I/O overheads. This is undesirable because, for imprecise applications, task execution in Hadoop MapReduce also incurs the overhead of processing redundant data, which further increases execution time. In this paper, we therefore present an approach that optimizes the local memory management mechanism of each worker to reduce the number of null schedule slots. More efficient slot utilization leads to shorter execution times. The adopted local memory management mechanism enables efficient parallel execution and reduces memory overheads. The approach effectively reduced MapReduce computation time, which in turn reduces the budget required to execute applications in the cloud.
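To make the link between unused ("null") schedule slots and execution time concrete, the following is a minimal Python sketch, not the authors' mechanism. It assumes a hypothetical, simplified model in which each task occupies one slot for a fixed duration and a greedy scheduler hands each task to the slot that becomes free first. The function `schedule` and the task durations are illustrative only, and idle slot-time is taken to be slots × makespan minus the total busy time.

```python
import heapq
from typing import List, Tuple


def schedule(durations: List[float], slots: int) -> Tuple[float, float]:
    """Greedy list scheduling over a fixed number of slots (hypothetical model).

    Each task is assigned, in the given order, to the slot that becomes free
    earliest. Returns (makespan, idle_slot_time), where idle_slot_time is the
    slot-time during which a slot holds no task before the job finishes.
    """
    if slots <= 0:
        raise ValueError("slots must be positive")
    free_at = [0.0] * slots  # min-heap: time at which each slot becomes free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)    # earliest-available slot
        heapq.heappush(free_at, start + d)  # slot is busy until start + d
    makespan = max(free_at)
    idle = slots * makespan - sum(durations)
    return makespan, idle


if __name__ == "__main__":
    tasks = [2, 2, 2, 2, 8]
    # Arrival order: the long task starts last, leaving one slot idle at the end.
    print(schedule(tasks, slots=2))                        # (12.0, 8.0)
    # Longest-task-first order fills both slots evenly.
    print(schedule(sorted(tasks, reverse=True), slots=2))  # (8.0, 0.0)
```

The longest-task-first ordering in the second call is the classic LPT heuristic, swapped in here only to show that keeping slots occupied shortens the makespan; the paper instead targets idle slots through worker-side memory management.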
Acknowledgment
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. NRF-2013R1A1A2013401).
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Al-Absi, A.A., Kang, DK., Kim, MJ. (2016). Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications. In: Park, J., Chao, HC., Arabnia, H., Yen, N. (eds) Advanced Multimedia and Ubiquitous Engineering. Lecture Notes in Electrical Engineering, vol 354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47895-0_2
DOI: https://doi.org/10.1007/978-3-662-47895-0_2
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-47894-3
Online ISBN: 978-3-662-47895-0
eBook Packages: Engineering, Engineering (R0)