RTSBL: Reduce Task Scheduling Based on the Load Balancing and the Data Locality in Hadoop

Midoun, Khadidja; Hidouci, Walid-Khaled; Loudini, Malik; Belayadi, Djahida

doi:10.1007/978-3-319-98352-3_29


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 50)

Included in the following conference series:

International Conference on Computer Science and its Applications


Abstract

We address the load balancing and data locality problems in Hadoop. These two problems limit its performance, especially during the reduce phase, where the partitioning function assigns keys to reducers based on a hash function. In this paper, we propose a new approach that assigns keys according to the reducers’ processing capability in order to ensure good load balancing. In addition, our approach, called RTSBL, takes data locality into account during partitioning. Our experiments show that RTSBL achieves up to 87% improvement in load balancing and a 3\(\times\) improvement in data locality during the reduce phase compared with standard Hadoop.
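
The contrast the abstract draws between hash-based and capability-based key assignment can be illustrated with a small sketch. This is not the RTSBL algorithm, whose details are not given on this page. It is a generic greedy heuristic under stated assumptions: the function names, the key sizes, the capability values, and the byte-sum stand-in for a hash function are all hypothetical and chosen only for illustration.

```python
def hash_partition(key_sizes, num_reducers):
    """Hash-style partitioning: reducer index = hash(key) mod number of reducers.

    The byte sum is a deterministic stand-in for a real hash function
    (an assumption made so the example gives the same result on every run).
    """
    loads = [0] * num_reducers
    for key, size in key_sizes.items():
        idx = sum(key.encode()) % num_reducers
        loads[idx] += size
    return loads


def capability_partition(key_sizes, capabilities):
    """Greedy sketch: place each key, largest first, on the reducer whose
    estimated finish time (load / capability) is smallest after placement."""
    loads = [0] * len(capabilities)
    assignment = {}
    # Sort by decreasing size; ties are broken by key for determinism.
    for key, size in sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        best = min(
            range(len(capabilities)),
            key=lambda r: (loads[r] + size) / capabilities[r],
        )
        loads[best] += size
        assignment[key] = best
    return assignment, loads


def finish_time(loads, capabilities):
    """Time at which the slowest reducer finishes, given relative speeds."""
    return max(load / cap for load, cap in zip(loads, capabilities))


# Hypothetical workload: records per key, and two reducers where
# reducer 0 is assumed to be twice as fast as reducer 1.
key_sizes = {"a": 60, "b": 30, "c": 20, "d": 10}
capabilities = [2.0, 1.0]

hash_loads = hash_partition(key_sizes, len(capabilities))
assignment, cap_loads = capability_partition(key_sizes, capabilities)
```

In this toy setting the hash-style assignment ignores reducer speed and places 80 records on the slower reducer, while the greedy assignment places 80 records on the faster one, so both reducers are estimated to finish at the same time. A locality-aware variant would additionally prefer, among reducers with similar finish times, the one that already holds most of a key's map output.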



Author information

Authors and Affiliations

Communication in Computer Systems Laboratory, National High School of Computer Science, PO BOX 68M, 16309, Oued-Smar, Algiers, Algeria
Khadidja Midoun, Walid-Khaled Hidouci, Malik Loudini & Djahida Belayadi


Corresponding authors

Correspondence to Khadidja Midoun, Walid-Khaled Hidouci, Malik Loudini or Djahida Belayadi.

Editor information

Editors and Affiliations

Department of Computer Science, Ecole Militaire Polytechnique, Algiers, Algeria
Oualid Demigha
Department of Computer Science, Ecole Militaire Polytechnique, Algiers, Algeria
Badis Djamaa
Department of Computer Science, Ecole Militaire Polytechnique, Algiers, Algeria
Abdenour Amamra


About this paper

Cite this paper

Midoun, K., Hidouci, WK., Loudini, M., Belayadi, D. (2019). RTSBL: Reduce Task Scheduling Based on the Load Balancing and the Data Locality in Hadoop. In: Demigha, O., Djamaa, B., Amamra, A. (eds) Advances in Computing Systems and Applications. CSA 2018. Lecture Notes in Networks and Systems, vol 50. Springer, Cham. https://doi.org/10.1007/978-3-319-98352-3_29


DOI: https://doi.org/10.1007/978-3-319-98352-3_29
Published: 10 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98351-6
Online ISBN: 978-3-319-98352-3
eBook Packages: Intelligent Technologies and Robotics (R0)
