Hadoop Performance Acceleration by Effective Data and Job Placement

  • Ankit Shah
  • Mamta Padole
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1118)

Abstract

Accelerating Hadoop performance requires efficient handling of both data placement and job placement. More specifically, we focus on accelerating the performance of heterogeneous distributed clusters, because the default Hadoop configuration delivers limited performance for data-intensive jobs. To improve Hadoop performance, it is important to consider the heterogeneity of nodes, reduce job latency, and improve the data locality of blocks. In this research, we use a block rearrangement policy for data placement, which rearranges data blocks according to each node’s processing capability (that is, the heterogeneity of the nodes), and we use node labeling and scheduling schemes for job placement. The experimental results show that the proposed model accelerates Hadoop performance by achieving higher data locality and lower job completion time than the default configuration and policy.
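
The abstract does not spell out how the block rearrangement policy works, so the following is only a minimal sketch of the general idea of capability-aware placement, not the authors' algorithm. It assumes each node has a single numeric capability score and splits a number of blocks across nodes in proportion to those scores, using largest-remainder rounding. The function name `proportional_block_targets`, the node names, and the scores are all hypothetical.

```python
def proportional_block_targets(capabilities, total_blocks):
    """Return a target block count per node, proportional to its capability.

    capabilities: dict mapping node name -> positive capability score
                  (hypothetical metric; the paper's actual measure is not given here).
    total_blocks: number of data blocks to distribute.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    total_capability = sum(capabilities.values())
    if total_capability <= 0:
        raise ValueError("capabilities must sum to a positive value")

    # Exact (fractional) share of blocks for each node.
    exact = {node: total_blocks * score / total_capability
             for node, score in capabilities.items()}

    # Start from the rounded-down share of each node.
    targets = {node: int(share) for node, share in exact.items()}

    # Hand out the remaining blocks to the nodes with the largest remainders.
    leftover = total_blocks - sum(targets.values())
    by_remainder = sorted(capabilities,
                          key=lambda node: exact[node] - targets[node],
                          reverse=True)
    for node in by_remainder[:leftover]:
        targets[node] += 1
    return targets


# Hypothetical three-node heterogeneous cluster.
example_targets = proportional_block_targets(
    {"fast": 3.0, "medium": 2.0, "slow": 1.0}, total_blocks=12)
print(example_targets)
```

In this sketch a node with three times the capability score of another receives roughly three times as many blocks, so more map tasks can run where their data already resides. A real policy would also have to respect HDFS replication and rack-awareness constraints, which are ignored here.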

Keywords

Hadoop performance acceleration; Hadoop data placement; MapReduce performance; Performance improvement; Hadoop load balancing

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Ankit Shah (1)
  • Mamta Padole (2)
  1. Shankersinh Vaghela Bapu Institute of Technology, Gandhinagar, India
  2. The Maharaja Sayajirao University of Baroda, Vadodara, India
