Hadoop Performance Acceleration by Effective Data and Job Placement

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1118))

Abstract

Accelerating Hadoop performance requires efficient handling of both data placement and job placement. Specifically, we focus on accelerating heterogeneous distributed clusters, because the default Hadoop configuration delivers limited performance for data-intensive jobs. To improve Hadoop performance, it is important to account for the heterogeneity of nodes, reduce job latency, and improve the data locality of blocks. In this research, we use a block rearrangement policy for data placement, which rearranges data blocks according to each node’s processing capability (that is, the heterogeneity of the nodes), and we use node labeling and scheduling schemes for job placement. The experimental results show that the proposed model accelerates Hadoop performance, achieving higher data locality and lower job completion time than the default configuration and policy.
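The abstract does not spell out how the block rearrangement policy maps processing capability to block counts. As a rough illustration only, the sketch below assumes each node carries a single numeric capability score and splits a file's blocks in proportion to those scores. The function name `blocks_per_node`, the node names, and the scores are hypothetical, and this is not the authors' policy or any Hadoop API.

```python
from typing import Dict


def blocks_per_node(capability: Dict[str, float], total_blocks: int) -> Dict[str, int]:
    """Split total_blocks across nodes in proportion to a capability score.

    Assumption: one scalar score per node stands in for "processing
    capability"; the paper's actual metric is not given in the abstract.
    """
    total = sum(capability.values())
    if total <= 0:
        raise ValueError("capability scores must sum to a positive value")

    # Ideal (fractional) share of blocks for every node.
    exact = {node: total_blocks * score / total for node, score in capability.items()}

    # Start from the rounded-down share.
    assigned = {node: int(share) for node, share in exact.items()}

    # Give the blocks lost to rounding to the nodes with the largest
    # fractional remainders, so the counts add up to total_blocks.
    leftover = total_blocks - sum(assigned.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - assigned[n], reverse=True)
    for node in by_remainder[:leftover]:
        assigned[node] += 1
    return assigned


if __name__ == "__main__":
    # Hypothetical three-node heterogeneous cluster.
    scores = {"fast": 4.0, "medium": 2.0, "slow": 1.0}
    print(blocks_per_node(scores, 14))
```

Under these assumptions a faster node holds more blocks locally, so more of its map tasks can read local data. A real placement policy would also have to respect replication and rack constraints, which this sketch ignores.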

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shah, A., Padole, M. (2020). Hadoop Performance Acceleration by Effective Data and Job Placement. In: Reddy, V., Prasad, V., Wang, J., Reddy, K. (eds) Soft Computing and Signal Processing. ICSCSP 2019. Advances in Intelligent Systems and Computing, vol 1118. Springer, Singapore. https://doi.org/10.1007/978-981-15-2475-2_20
