HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases

  • Living reference work entry

Definition

Modern NewSQL database systems can store fully normalized metadata for distributed hierarchical file systems while providing high throughput and low latency for file system operations.
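To make the idea of normalized file system metadata concrete, the following is a minimal sketch, not the HopsFS schema. It assumes a single hypothetical `inodes` table in which each file or directory is one row identified by its parent's id and its own name, and it uses Python's built-in SQLite module purely as a stand-in for a distributed NewSQL database. The table layout, the `ROOT_ID` constant, and the helper names are all illustrative assumptions.

```python
import sqlite3

ROOT_ID = 1  # assumed id of the implicit root directory "/"


def make_store():
    """Create an in-memory table with one row per inode (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE inodes ("
        " id INTEGER NOT NULL UNIQUE,"
        " parent_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " is_dir INTEGER NOT NULL,"
        " PRIMARY KEY (parent_id, name))"
    )
    return db


def add_inode(db, inode_id, parent_id, name, is_dir):
    """Insert a file or directory as a child of parent_id."""
    db.execute(
        "INSERT INTO inodes VALUES (?, ?, ?, ?)",
        (inode_id, parent_id, name, int(is_dir)),
    )


def resolve(db, path):
    """Walk the path one component at a time; return the inode id or None."""
    current = ROOT_ID
    for component in [c for c in path.split("/") if c]:
        row = db.execute(
            "SELECT id FROM inodes WHERE parent_id = ? AND name = ?",
            (current, component),
        ).fetchone()
        if row is None:
            return None
        current = row[0]
    return current


def list_dir(db, dir_id):
    """Return the sorted names of the direct children of a directory."""
    rows = db.execute(
        "SELECT name FROM inodes WHERE parent_id = ? ORDER BY name",
        (dir_id,),
    )
    return [r[0] for r in rows]
```

In this sketch, resolving `/user/alice/data.csv` issues one primary-key lookup per path component, and listing a directory is a single query on `parent_id`. Because no row stores a full path, moving a directory would only rewrite that directory's own row; this is one reason a normalized layout is attractive, under the assumptions stated above.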

Introduction

For many years, researchers have investigated the use of database technology to manage file system metadata, with the goal of providing extensible typed metadata and support for fast, rich metadata search. Previous attempts failed, mainly because adding database operations to the file system’s critical path reduced performance. Recent improvements in the performance of distributed in-memory online transaction processing databases (NewSQL databases), however, led us to reinvestigate the possibility of using a database to manage file system metadata, this time for a distributed, hierarchical file system, the Hadoop Distributed File System (HDFS). The single-host metadata service of HDFS is a well-known bottleneck for both the size of HDFS...

Corresponding author

Correspondence to Salman Niazi.

Copyright information

© 2018 Springer International Publishing AG

Cite this entry

Niazi, S., Ismail, M., Haridi, S., Dowling, J. (2018). HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_146-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_146-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering