
Hmfs: Efficient Support of Small Files Processing over HDFS

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8631)

Abstract

Storing and accessing massive numbers of small files is one of the challenges in the design of distributed file systems. The Hadoop Distributed File System (HDFS) is primarily designed for reliable storage and fast access of very large files, and its performance degrades as the number of small files grows. This paper proposes a middleware called Hmfs to improve the efficiency of storing and accessing small files on HDFS. It is made up of three layers: file operation interfaces, which make it easier for software developers to submit different file requests; file management tasks, which merge small files into big ones or extract small files from big ones in the background; and file buffers, which improve I/O performance. Hmfs boosts file upload speed by using an asynchronous write mechanism and file download speed by adopting a prefetching and caching strategy. The experimental results show that Hmfs can help achieve high storage and access speeds for massive numbers of small files on HDFS.
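To make the merge, extract, and prefetch-and-cache ideas in the abstract concrete, the following is a minimal in-memory sketch. It is not the Hmfs implementation: the class names `MergedStore` and `PrefetchingReader`, the byte buffer standing in for a large HDFS file, the LRU eviction policy, and the "prefetch the next file in merge order" rule are all assumptions made for illustration. The asynchronous write path is not modelled here.

```python
from collections import OrderedDict


class MergedStore:
    """Packs small files into one large byte buffer with an offset index.

    Hypothetical stand-in for a merged file on HDFS; everything lives in memory.
    """

    def __init__(self):
        self.blob = bytearray()   # plays the role of one large merged file
        self.index = {}           # name -> (offset, length) inside the blob
        self.order = []           # names in merge order, used for prefetching

    def merge(self, files):
        """Append each (name, data) pair to the blob and record its position."""
        for name, data in files:
            self.index[name] = (len(self.blob), len(data))
            self.order.append(name)
            self.blob.extend(data)

    def extract(self, name):
        """Read one small file back out of the blob using the index."""
        offset, length = self.index[name]
        return bytes(self.blob[offset:offset + length])


class PrefetchingReader:
    """LRU cache in front of a MergedStore that also loads neighbouring files."""

    def __init__(self, store, capacity=4, prefetch=1):
        self.store = store
        self.capacity = capacity  # maximum number of cached files
        self.prefetch = prefetch  # how many following files to load on a miss
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _put(self, name, data):
        self.cache[name] = data
        self.cache.move_to_end(name)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def read(self, name):
        if name in self.cache:
            self.hits += 1
            self.cache.move_to_end(name)
            return self.cache[name]
        self.misses += 1
        data = self.store.extract(name)
        self._put(name, data)
        # Assumed policy: prefetch the files merged right after this one.
        pos = self.store.order.index(name)
        for nxt in self.store.order[pos + 1:pos + 1 + self.prefetch]:
            if nxt not in self.cache:
                self._put(nxt, self.store.extract(nxt))
        return data


if __name__ == "__main__":
    store = MergedStore()
    store.merge([("a.txt", b"alpha"), ("b.txt", b"beta"), ("c.txt", b"gamma")])
    reader = PrefetchingReader(store)
    reader.read("a.txt")  # miss; also prefetches b.txt
    reader.read("b.txt")  # served from the cache
    print(reader.hits, reader.misses)
```

In this sketch a single index lookup replaces one metadata entry per small file, which is the general motivation for merging small files. A real system would also need index persistence, deletion, and concurrency control.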

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Yan, C., Li, T., Huang, Y., Gan, Y. (2014). Hmfs: Efficient Support of Small Files Processing over HDFS. In: Sun, Xh., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2014. Lecture Notes in Computer Science, vol 8631. Springer, Cham. https://doi.org/10.1007/978-3-319-11194-0_5

  • DOI: https://doi.org/10.1007/978-3-319-11194-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11193-3

  • Online ISBN: 978-3-319-11194-0

  • eBook Packages: Computer Science, Computer Science (R0)
