A Packaging Approach for Massive Amounts of Small Geospatial Files with HDFS

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7418)

Abstract

The efficiency of handling massive numbers of small geospatial files strongly affects the performance of a Web Geographic Information System (WebGIS). The Hadoop Distributed File System (HDFS) scales to meet the storage requirements of massive data files, but it handles small files inefficiently. In this paper, we propose a method that packs a group of small files into one large logical file and builds a Hilbert spatial index inside the block according to the files' spatial adjacency. Experiments show that this method reduces the size of the block indices and speeds up searching and retrieving massive numbers of small spatial files.
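
The packing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the small files are tiles addressed by integer grid coordinates (x, y) on a 2^order by 2^order grid, uses the standard iterative Hilbert-curve mapping, and keeps the packed file in memory rather than writing it to HDFS. The names `hilbert_index`, `pack_tiles`, and `read_tile` are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

IndexEntry = Tuple[int, int, int]  # (hilbert value, byte offset, byte length)


def hilbert_index(order: int, x: int, y: int) -> int:
    """Map cell (x, y) of a 2^order x 2^order grid to its distance along the Hilbert curve."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level is traversed in the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def pack_tiles(tiles: Dict[Tuple[int, int], bytes], order: int) -> Tuple[bytes, List[IndexEntry]]:
    """Concatenate tiles in Hilbert order (assumed layout, not the paper's on-disk format).

    Returns the packed bytes and an index sorted by Hilbert value, so that
    spatially adjacent tiles tend to sit close together in the packed file.
    """
    keyed = sorted((hilbert_index(order, x, y), data) for (x, y), data in tiles.items())
    blob = bytearray()
    index: List[IndexEntry] = []
    for h, data in keyed:
        index.append((h, len(blob), len(data)))
        blob.extend(data)
    return bytes(blob), index


def read_tile(blob: bytes, index: List[IndexEntry], order: int, x: int, y: int) -> Optional[bytes]:
    """Look up one tile by binary search on the Hilbert-sorted index."""
    h = hilbert_index(order, x, y)
    i = bisect.bisect_left(index, (h, -1, -1))
    if i < len(index) and index[i][0] == h:
        _, offset, length = index[i]
        return blob[offset:offset + length]
    return None
```

In a deployment along the lines the abstract describes, the packed bytes would presumably be stored as one large HDFS file, so the NameNode tracks a single file instead of one entry per tile, and the per-tile index would be stored alongside it.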




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cui, J., Zhang, Y., Li, C., Xing, C. (2012). A Packaging Approach for Massive Amounts of Small Geospatial Files with HDFS. In: Gao, H., Lim, L., Wang, W., Li, C., Chen, L. (eds) Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32281-5_20

  • DOI: https://doi.org/10.1007/978-3-642-32281-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32280-8

  • Online ISBN: 978-3-642-32281-5

  • eBook Packages: Computer Science, Computer Science (R0)
