
Metadata-Aware Small Files Storage Architecture on Hadoop

  • Conference paper
Web Information Systems and Mining (WISM 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7529)


Abstract

Zettabytes (ZB; 1 ZB is a trillion GB) of data are produced globally each year, making distributed data storage a trend. Research on and applications of Hadoop, the most representative open-source distributed file system, are increasing. However, Hadoop is not well suited to handling massive numbers of small files. This paper presents a metadata-aware storage architecture for massive small files. It takes full advantage of file metadata, merges small files into SequenceFiles using the classification algorithm of a merge module, and introduces an efficient indexing mechanism, which together address the NameNode memory bottleneck. Taking MP3 files as an example, experiments show that the architecture achieves good results.
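The abstract names three steps: classify small files by their metadata, merge each class into a SequenceFile, and index where each original file ends up. The Python sketch below is a minimal, hypothetical illustration of that flow, not the authors' implementation. Assumptions: files are grouped by a single metadata field (here `genre`, as might be read from an MP3's ID3 tag); a SequenceFile is modeled as a plain in-memory byte concatenation (a real Hadoop SequenceFile also stores a header, per-record key/value lengths and sync markers); and the names `SmallFile`, `IndexEntry`, `classify`, `merge`, `read` and the `merged-<class>.seq` naming are invented for illustration. The NameNode estimate uses the rough 150-bytes-per-namespace-object rule of thumb from Cloudera's "small files problem" blog post.

```python
from collections import defaultdict
from dataclasses import dataclass

# Rule of thumb (Cloudera blog): each file, directory and block object
# occupies roughly 150 bytes of NameNode heap. An approximation only.
BYTES_PER_NAMESPACE_OBJECT = 150


def namenode_heap_estimate(num_files: int, blocks_per_file: int = 1) -> int:
    """Approximate NameNode heap in bytes for num_files files (directories ignored)."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_NAMESPACE_OBJECT


@dataclass
class SmallFile:
    name: str
    data: bytes
    metadata: dict  # hypothetical, e.g. {"genre": "rock"} taken from an ID3 tag


@dataclass(frozen=True)
class IndexEntry:
    container: str  # merged file that holds this small file's bytes
    offset: int     # byte offset of the file's bytes inside the container
    length: int     # number of bytes belonging to the file


def classify(f: SmallFile, key: str = "genre") -> str:
    """Hypothetical classifier: the class is simply one metadata field's value."""
    return f.metadata.get(key, "unknown")


def merge(files, key: str = "genre"):
    """Concatenate files of the same class into one container and index each file.

    Returns (containers, index): containers maps a container name to its bytes,
    index maps an original file name to an IndexEntry.
    """
    buffers = defaultdict(bytearray)
    index = {}
    for f in files:
        cls = classify(f, key)
        buf = buffers[cls]
        # Record where this file starts before appending its bytes.
        index[f.name] = IndexEntry(f"merged-{cls}.seq", len(buf), len(f.data))
        buf.extend(f.data)
    containers = {f"merged-{cls}.seq": bytes(buf) for cls, buf in buffers.items()}
    return containers, index


def read(name: str, containers: dict, index: dict) -> bytes:
    """Serve one small file with a single index lookup plus a byte-range slice."""
    e = index[name]
    return containers[e.container][e.offset:e.offset + e.length]
```

For example, `namenode_heap_estimate(10_000_000)` gives 3,000,000,000 bytes (about 3 GB) for ten million single-block files; packing them into large merged containers leaves far fewer namespace objects, which is the memory pressure the architecture aims to relieve. Reads go through the index, so locating a file costs one lookup for its container, offset and length rather than a per-file NameNode entry.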




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhao, X., Yang, Y., Sun, Ll., Huang, H. (2012). Metadata-Aware Small Files Storage Architecture on Hadoop. In: Wang, F.L., Lei, J., Gong, Z., Luo, X. (eds) Web Information Systems and Mining. WISM 2012. Lecture Notes in Computer Science, vol 7529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33469-6_20


  • DOI: https://doi.org/10.1007/978-3-642-33469-6_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33468-9

  • Online ISBN: 978-3-642-33469-6

  • eBook Packages: Computer Science, Computer Science (R0)
