Abstract
The efficiency of handling massive numbers of small geospatial files strongly affects the performance of Web Geographic Information Systems (WebGIS). The Hadoop Distributed File System (HDFS) scales to meet the storage requirements of massive data files, but it handles small files inefficiently. In this paper, we propose a method that packs a group of small files into one large logical file and builds a Hilbert spatial index inside the block based on the files' spatial adjacency. Experiments show that this method reduces the size of the block indices and speeds up the search and retrieval of massive numbers of small spatial files.
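The packing idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the small files are tiles addressed by integer grid coordinates on an n-by-n grid (n a power of two), and it packs them into an in-memory byte string rather than an HDFS block. The names `hilbert_key`, `pack_tiles`, and `read_tile` are hypothetical and chosen only for this example. The Hilbert key uses the standard coordinate-to-distance conversion, so tiles that are neighbours on the curve end up next to each other in the packed file.

```python
import bisect


def hilbert_key(n, x, y):
    """Map cell (x, y) on an n-by-n grid (n a power of two) to its
    distance along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def pack_tiles(tiles, n):
    """Pack tiles (dict of (x, y) -> bytes) into one blob in Hilbert order.

    Returns (blob, index), where index is a list of
    (hilbert_key, offset, length) tuples sorted by key.
    """
    blob = bytearray()
    index = []
    ordered = sorted(tiles.items(), key=lambda kv: hilbert_key(n, *kv[0]))
    for (x, y), data in ordered:
        index.append((hilbert_key(n, x, y), len(blob), len(data)))
        blob.extend(data)
    return bytes(blob), index


def read_tile(blob, index, n, x, y):
    """Look up one tile by binary search on its Hilbert key."""
    key = hilbert_key(n, x, y)
    keys = [k for k, _, _ in index]
    i = bisect.bisect_left(keys, key)
    if i == len(index) or index[i][0] != key:
        raise KeyError((x, y))
    _, offset, length = index[i]
    return blob[offset:offset + length]


# Toy data: four tiles on a 4-by-4 grid.
tiles = {(0, 0): b"a", (1, 0): b"bb", (1, 1): b"ccc", (0, 2): b"dddd"}
blob, index = pack_tiles(tiles, 4)
```

In this sketch a single packed file carries one small sorted index instead of one file-system entry per tile, which is the kind of index reduction the abstract describes; how the paper lays out its index inside an HDFS block is not specified here.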
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cui, J., Zhang, Y., Li, C., Xing, C. (2012). A Packaging Approach for Massive Amounts of Small Geospatial Files with HDFS. In: Gao, H., Lim, L., Wang, W., Li, C., Chen, L. (eds) Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7418. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32281-5_20
DOI: https://doi.org/10.1007/978-3-642-32281-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32280-8
Online ISBN: 978-3-642-32281-5
eBook Packages: Computer Science (R0)