
Data Aware Distributed Storage (DAS) for Performance Improvement Across a Hadoop Commodity Cluster

  • Conference paper
  • First Online:
Advances in Decision Sciences, Image Processing, Security and Computer Vision (ICETE 2019)

Part of the book series: Learning and Analytics in Intelligent Systems (LAIS, volume 3)

Abstract

Big Data is the order of the day and has made inroads into many domains beyond the internet, which has been the breeding ground for this technology. The Remote Sensing domain has likewise seen growth in the volume and velocity of spatial data, and the term Spatial Big Data has been coined to refer to this type of data. For the sake of computational efficiency, the processing of spatial data for applications such as urban mapping, object detection, and change detection has moved from monolithic, centralized processing to distributed processing, and, in terms of architecture, from single-core CPUs to multicore CPUs and further to GPUs and specialized hardware. The two major problems faced in this regard are the size of the data to be processed per unit of memory/time, and the storage and retrieval of data for efficient processing. In this paper, we discuss a method of distributing data across an HDFS cluster that enables fast retrieval and faster processing per unit of available memory in the Image Processing domain. We evaluate our technique and compare it with the traditional approach on a 4-node HDFS cluster. A significant improvement is observed when performing edge detection on large spatial data; the results are tabulated in the results section.
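The abstract does not detail how DAS places data, so the sketch below does not attempt to reproduce it. It illustrates a generic, related point about processing a large image in pieces: when an edge detector with a 3×3 neighbourhood runs on each tile independently (for example on different cluster nodes), pixels along a tile boundary need values owned by the neighbouring tile, so each tile has to be read together with a one-pixel halo. This is a minimal sketch under stated assumptions: a single-band image held as a Python list of integer rows, the Sobel operator with the |Gx| + |Gy| magnitude approximation, and square tiles processed in a sequential loop that stands in for cluster nodes. The function names `sobel` and `sobel_tiled` are hypothetical.

```python
# Hypothetical sketch, NOT the paper's DAS method: generic tile-wise
# Sobel edge detection where each tile is read with a one-pixel halo.
from typing import List

Image = List[List[int]]


def sobel(img: Image) -> Image:
    """Return |Gx| + |Gy| for interior pixels; border pixels are set to 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            p = img
            # Horizontal gradient: right column minus left column.
            gx = (p[y - 1][x + 1] + 2 * p[y][x + 1] + p[y + 1][x + 1]) - (
                p[y - 1][x - 1] + 2 * p[y][x - 1] + p[y + 1][x - 1]
            )
            # Vertical gradient: bottom row minus top row.
            gy = (p[y + 1][x - 1] + 2 * p[y + 1][x] + p[y + 1][x + 1]) - (
                p[y - 1][x - 1] + 2 * p[y - 1][x] + p[y - 1][x + 1]
            )
            out[y][x] = abs(gx) + abs(gy)
    return out


def sobel_tiled(img: Image, tile: int) -> Image:
    """Run sobel() tile by tile (a stand-in for per-node work) and stitch."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            # Read the tile plus a one-pixel halo, clipped to the image bounds.
            hy0, hx0 = max(y0 - 1, 0), max(x0 - 1, 0)
            hy1, hx1 = min(y1 + 1, h), min(x1 + 1, w)
            block = [row[hx0:hx1] for row in img[hy0:hy1]]
            result = sobel(block)
            # Keep only the tile's own pixels; the halo is discarded.
            for y in range(y0, y1):
                for x in range(x0, x1):
                    out[y][x] = result[y - hy0][x - hx0]
    return out
```

Because each tile carries its halo, the stitched output matches `sobel(img)` on the whole image for any tile size of at least 1. Without the halo, this sketch would treat every interior tile boundary as an image border and zero the edge responses there.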

Author information

Corresponding author

Correspondence to R. Phani Bhushan.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Phani Bhushan, R., Somayajulu, D.V.L.N., Venkatraman, S., Subramanyam, R.B.V. (2020). Data Aware Distributed Storage (DAS) for Performance Improvement Across a Hadoop Commodity Cluster. In: Satapathy, S.C., Raju, K.S., Shyamala, K., Krishna, D.R., Favorskaya, M.N. (eds) Advances in Decision Sciences, Image Processing, Security and Computer Vision. ICETE 2019. Learning and Analytics in Intelligent Systems, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-030-24322-7_45