Hadoop Performance Acceleration by Effective Data and Job Placement

Shah, Ankit; Padole, Mamta

doi:10.1007/978-981-15-2475-2_20

Conference paper
First Online: 14 March 2020

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1118)

Abstract

Accelerating Hadoop performance requires efficient handling of data and job placement. We focus on heterogeneous distributed clusters, where Hadoop's default configuration delivers limited performance for data-intensive jobs. Improving performance in this setting requires accounting for the heterogeneity of nodes, reducing job latency, and improving the data locality of blocks. For data placement, we use a block rearrangement policy that rearranges data blocks according to each node's processing capability; for job placement, we use node labeling and scheduling schemes. The experimental results show that the proposed model accelerates Hadoop performance, achieving higher data locality and lower job completion time than the default configuration and policy.
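The abstract does not specify how the block rearrangement policy maps processing capability to block counts. As a rough illustration of the general idea only, the sketch below splits a file's blocks across nodes in proportion to a capability score. The function name `allocate_blocks`, the capability scores, and the proportional largest-remainder rule are all assumptions made for this example, not the authors' policy.

```python
from typing import Dict


def allocate_blocks(total_blocks: int, capability: Dict[str, float]) -> Dict[str, int]:
    """Split total_blocks across nodes in proportion to capability scores.

    Hypothetical helper: uses the largest-remainder method so that the
    per-node shares always sum exactly to total_blocks.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if any(score < 0 for score in capability.values()):
        raise ValueError("capability scores must be non-negative")
    total_capability = sum(capability.values())
    if total_capability <= 0:
        raise ValueError("at least one node needs a positive capability score")

    # Ideal (fractional) share of blocks for each node.
    ideal = {node: total_blocks * score / total_capability
             for node, score in capability.items()}
    # Start from the integer part of each ideal share.
    shares = {node: int(value) for node, value in ideal.items()}
    # Give the leftover blocks to the nodes with the largest fractional parts.
    leftover = total_blocks - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda node: ideal[node] - shares[node],
                          reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    # Assumed relative capability scores for three hypothetical nodes.
    print(allocate_blocks(10, {"fast": 3.0, "medium": 2.0, "slow": 1.0}))
```

With these assumed scores, the faster node receives more blocks than the slower ones, so more map tasks can find their input locally on the nodes best able to process it. A real policy would also have to respect HDFS replication and rack-awareness constraints, which this sketch ignores.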

Author information

Authors and Affiliations

Shankersinh Vaghela Bapu Institute of Technology, Gandhinagar, India
Ankit Shah
The Maharaja Sayajirao University of Baroda, Vadodara, India
Mamta Padole

Editor information

Editors and Affiliations

Department of ECE, Malla Reddy College of Engineering and Technology, Hyderabad, Telangana, India
V. Sivakumar Reddy
Department of CSE, Jawaharlal Nehru Technological University Hyderabad, Hyderabad, Telangana, India
V. Kamakshi Prasad
Department of Computer Science and Software Engineering, Monmouth University, West Long Branch, NJ, USA
Jiacun Wang
Department of ECE, Sir Visvesvaraya Institute of Technology, Nashik, Maharashtra, India
K. T. V. Reddy

About this paper

Cite this paper

Shah, A., Padole, M. (2020). Hadoop Performance Acceleration by Effective Data and Job Placement. In: Reddy, V., Prasad, V., Wang, J., Reddy, K. (eds) Soft Computing and Signal Processing. ICSCSP 2019. Advances in Intelligent Systems and Computing, vol 1118. Springer, Singapore. https://doi.org/10.1007/978-981-15-2475-2_20

DOI: https://doi.org/10.1007/978-981-15-2475-2_20
Published: 14 March 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2474-5
Online ISBN: 978-981-15-2475-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
