RTSBL: Reduce Task Scheduling Based on the Load Balancing and the Data Locality in Hadoop

  • Conference paper
Advances in Computing Systems and Applications (CSA 2018)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 50)

Abstract

We address the load balancing and data locality problems in Hadoop. These two problems limit its performance, especially during the reduce phase, where the partitioning function assigns keys to reducers based on a hash function. In this paper, we propose a new approach that assigns keys according to the reducers’ processing capability in order to ensure good load balancing. In addition, our approach, called RTSBL, takes data locality into account during partitioning. Our experiments show that RTSBL achieves up to 87% improvement in load balancing and a 3× improvement in data locality during the reduce phase compared with standard Hadoop.
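
The abstract does not spell out the RTSBL algorithm, so the following is only a minimal sketch of the general idea it describes, not the authors' method. It contrasts a hash-style baseline (reducer = hash(key) mod number of reducers, which ignores key sizes) with a simple greedy heuristic that places each key on the reducer with the lowest capability-normalized load and breaks ties in favor of the reducer that already holds the key's data. All names here (`hash_partition`, `capability_partition`, `key_sizes`, `capabilities`, `key_location`) are hypothetical, and CRC32 stands in for Java's `hashCode()`.

```python
import zlib


def hash_partition(key_sizes, num_reducers):
    """Hash-style baseline: reducer = hash(key) mod R, ignoring key sizes."""
    loads = [0.0] * num_reducers
    assignment = {}
    for key, size in key_sizes.items():
        # CRC32 is an assumption standing in for a deterministic key hash.
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        assignment[key] = reducer
        loads[reducer] += size
    return assignment, loads


def capability_partition(key_sizes, capabilities, key_location=None):
    """Greedy sketch: largest keys first, each placed on the reducer that
    would finish earliest given its capability; ties prefer the reducer
    where the key's data is assumed to reside (key_location)."""
    key_location = key_location or {}
    loads = [0.0] * len(capabilities)
    assignment = {}
    ordered = sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, size in ordered:
        def cost(reducer):
            finish = (loads[reducer] + size) / capabilities[reducer]
            remote = 0 if key_location.get(key) == reducer else 1
            return (finish, remote, reducer)

        best = min(range(len(capabilities)), key=cost)
        assignment[key] = best
        loads[best] += size
    return assignment, loads


def makespan(loads, capabilities):
    """Time of the slowest reducer: max of load / capability."""
    return max(load / cap for load, cap in zip(loads, capabilities))


if __name__ == "__main__":
    sizes = {"a": 60, "b": 30, "c": 20, "d": 10}  # illustrative key sizes
    caps = [2.0, 1.0]                             # reducer 0 is twice as fast
    for name, (_, loads) in {
        "hash": hash_partition(sizes, len(caps)),
        "capability-aware": capability_partition(sizes, caps),
    }.items():
        print(name, loads, makespan(loads, caps))
```

In this toy setting the greedy variant gives the faster reducer proportionally more data, so both reducers finish at about the same time, whereas the hash baseline spreads keys without regard to their size or to reducer speed.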

Author information

Corresponding authors

Correspondence to Khadidja Midoun, Walid-Khaled Hidouci, Malik Loudini or Djahida Belayadi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Midoun, K., Hidouci, WK., Loudini, M., Belayadi, D. (2019). RTSBL: Reduce Task Scheduling Based on the Load Balancing and the Data Locality in Hadoop. In: Demigha, O., Djamaa, B., Amamra, A. (eds) Advances in Computing Systems and Applications. CSA 2018. Lecture Notes in Networks and Systems, vol 50. Springer, Cham. https://doi.org/10.1007/978-3-319-98352-3_29
