RTSBL: Reduce Task Scheduling Based on the Load Balancing and the Data Locality in Hadoop

  • Khadidja Midoun
  • Walid-Khaled Hidouci
  • Malik Loudini
  • Djahida Belayadi
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 50)

Abstract

We address the load balancing and data locality problems in Hadoop. These two problems limit its performance, especially during the reduce phase, where the partitioning function assigns keys to reducers based on a hash function. In this paper, we propose a new approach that assigns keys according to the reducers’ processing capability in order to ensure good load balancing. In addition, our approach, called RTSBL, takes data locality into account during partitioning. Our experiments show that RTSBL achieves up to 87% improvement in load balancing and a 3\(\times\) improvement in data locality during the reduce phase compared with standard Hadoop.
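To make the contrast in the abstract concrete, the following is a minimal sketch of hash-based key assignment next to a capacity-aware alternative. It is not the RTSBL algorithm, which the abstract does not specify. The greedy rule, the helper names (`stable_hash`, `hash_partition`, `capacity_aware_partition`), the per-key sizes, and the relative capacities are all assumptions made for illustration, and data locality is left out.

```python
import zlib


def stable_hash(key: str) -> int:
    """Deterministic stand-in for a key's hash code (hypothetical helper)."""
    return zlib.crc32(key.encode("utf-8"))


def hash_partition(key_sizes, num_reducers):
    """Hash-based assignment: reducer index = hash(key) mod number of reducers.

    Returns the total data volume landing on each reducer. Key sizes and
    reducer speeds play no role in the decision.
    """
    loads = [0] * num_reducers
    for key, size in key_sizes.items():
        loads[stable_hash(key) % num_reducers] += size
    return loads


def capacity_aware_partition(key_sizes, capacities):
    """Greedy sketch (an assumption, not the paper's method).

    Keys are visited from largest to smallest; each goes to the reducer
    whose estimated finish time, (load + size) / capacity, is smallest.
    """
    loads = [0] * len(capacities)
    assignment = {}
    # Sort by descending size, then by key, so ties are resolved deterministically.
    for key, size in sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        best = min(
            range(len(capacities)),
            key=lambda r: (loads[r] + size) / capacities[r],
        )
        assignment[key] = best
        loads[best] += size
    return assignment, loads
```

For example, with hypothetical key sizes `{"a": 60, "b": 30, "c": 20, "d": 10}` and relative capacities `[2.0, 1.0]`, the greedy sketch yields loads of `[80, 40]`, so both reducers have the same estimated finish time of 40 units. A hash-based assignment offers no such guarantee, because it ignores both key sizes and reducer speeds.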

Keywords

MapReduce, Hadoop, Load balancing, Data locality, Reduce task scheduling

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Communication in Computer Systems Laboratory, National High School of Computer Science, Oued-Smar, Algeria
