Abstract
We address the load balancing and data locality problems in Hadoop. These two problems limit its performance, especially during the reduce phase, where the partitioning function assigns keys to reducers based on a hash function. In this paper, we propose a new approach that assigns keys according to the reducers' processing capability in order to ensure good load balancing. In addition, our approach, called RTSBL, takes data locality into consideration during partitioning. Our experiments show that RTSBL achieves up to 87% improvement in load balancing and a 3\(\times\) improvement in data locality during the reduce phase compared with standard Hadoop.
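To make the contrast in the abstract concrete, the following minimal sketch compares hash-based key placement with a capability-aware placement. It is not the RTSBL algorithm, whose details are not given here. It assumes a simple greedy heuristic (largest key first, assigned to the reducer with the smallest estimated finish time), and the names `hash_partition`, `capability_partition`, `key_sizes`, and `capabilities` are hypothetical. Data locality is not modeled.

```python
def hash_partition(key_sizes, num_reducers):
    """Place each key on reducer (hash mod R), ignoring key size and reducer speed."""
    loads = [0] * num_reducers
    for key, size in key_sizes.items():
        # Deterministic stand-in for a hash code (assumption for illustration only).
        h = sum(ord(c) for c in key)
        loads[h % num_reducers] += size
    return loads


def capability_partition(key_sizes, capabilities):
    """Greedy sketch: take keys from largest to smallest and give each one to the
    reducer whose estimated finish time (load / capability) would be smallest."""
    loads = [0] * len(capabilities)
    assignment = {}
    for key, size in sorted(key_sizes.items(), key=lambda kv: -kv[1]):
        best = min(range(len(capabilities)),
                   key=lambda r: (loads[r] + size) / capabilities[r])
        loads[best] += size
        assignment[key] = best
    return assignment, loads


def makespan(loads, capabilities):
    """Finish time of the slowest reducer, in load units per unit of capability."""
    return max(load / cap for load, cap in zip(loads, capabilities))


# Hypothetical input: four keys with known sizes, two reducers, the first twice as fast.
key_sizes = {"a": 60, "b": 30, "c": 20, "d": 10}
capabilities = [2.0, 1.0]

hash_loads = hash_partition(key_sizes, len(capabilities))
assignment, cap_loads = capability_partition(key_sizes, capabilities)
print(hash_loads, makespan(hash_loads, capabilities))
print(cap_loads, makespan(cap_loads, capabilities))
```

On this toy input, the hash placement puts loads of 40 and 80 on the fast and slow reducer respectively, so the slow reducer finishes at time 80. The greedy placement yields loads of 80 and 40, and both reducers finish at time 40.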
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Midoun, K., Hidouci, WK., Loudini, M., Belayadi, D. (2019). RTSBL: Reduce Task Scheduling Based on the Load Balancing and the Data Locality in Hadoop. In: Demigha, O., Djamaa, B., Amamra, A. (eds) Advances in Computing Systems and Applications. CSA 2018. Lecture Notes in Networks and Systems, vol 50. Springer, Cham. https://doi.org/10.1007/978-3-319-98352-3_29
DOI: https://doi.org/10.1007/978-3-319-98352-3_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98351-6
Online ISBN: 978-3-319-98352-3
eBook Packages: Intelligent Technologies and Robotics (R0)