Abstract
The volume of data produced globally each year has reached the zettabyte (ZB, one trillion GB) scale, making distributed data storage a trend. Research on and applications of Hadoop, the most representative open-source distributed file system, are increasing. However, Hadoop is not well suited to handling massive numbers of small files. This paper presents a metadata-aware storage architecture for massive small files. The architecture takes full advantage of file metadata: a merge module uses a classification algorithm to merge small files into Sequence Files, and an efficient indexing mechanism is introduced. Together these address the NameNode memory bottleneck. Experiments using MP3 files as an example show that the architecture achieves good results.
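The merge-and-index idea can be illustrated with a minimal in-memory sketch. It is not the authors' implementation and does not use the Hadoop SequenceFile API: the function names `merge_small_files` and `read_small_file`, the `classify` callback, and the `artist` metadata field are all hypothetical, chosen only to show how grouping small files by a metadata key and recording (container, offset, length) entries lets many small files share a few large containers.

```python
import io
from collections import defaultdict


def merge_small_files(files, classify):
    """Merge small files into one container per metadata class.

    files: iterable of (name, metadata_dict, payload_bytes) tuples.
    classify: hypothetical callback mapping a metadata dict to a class key.
    Returns (containers, index), where containers maps class key -> bytes
    and index maps file name -> (class key, offset, length).
    """
    buffers = defaultdict(io.BytesIO)
    index = {}
    for name, meta, payload in files:
        key = classify(meta)
        buf = buffers[key]
        offset = buf.tell()  # buffers are append-only, so this is the current end
        buf.write(payload)
        index[name] = (key, offset, len(payload))
    containers = {key: buf.getvalue() for key, buf in buffers.items()}
    return containers, index


def read_small_file(containers, index, name):
    """Look up one small file through the index and slice it out."""
    key, offset, length = index[name]
    return containers[key][offset:offset + length]


# Assumed sample data: three tiny "MP3" payloads tagged with an artist field.
sample = [
    ("a.mp3", {"artist": "X"}, b"AAAA"),
    ("b.mp3", {"artist": "Y"}, b"BB"),
    ("c.mp3", {"artist": "X"}, b"CCC"),
]
containers, index = merge_small_files(sample, lambda meta: meta["artist"])
print(len(containers), "containers hold", len(index), "files")
print(read_small_file(containers, index, "c.mp3"))
```

In this sketch the storage layer tracks two containers rather than three files, which mirrors, in miniature, why merging reduces the number of objects a NameNode must keep in memory.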
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhao, X., Yang, Y., Sun, Ll., Huang, H. (2012). Metadata-Aware Small Files Storage Architecture on Hadoop. In: Wang, F.L., Lei, J., Gong, Z., Luo, X. (eds) Web Information Systems and Mining. WISM 2012. Lecture Notes in Computer Science, vol 7529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33469-6_20
DOI: https://doi.org/10.1007/978-3-642-33469-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33468-9
Online ISBN: 978-3-642-33469-6
eBook Packages: Computer Science (R0)