Encyclopedia of Big Data Technologies

2019 Edition
Editors: Sherif Sakr, Albert Y. Zomaya

HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases

  • Salman Niazi
  • Mahmoud Ismail
  • Seif Haridi
  • Jim Dowling
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_146

Definitions

Modern NewSQL database systems can be used to store fully normalized metadata for distributed hierarchical file systems, and provide high throughput and low operational latencies for the file system operations.

Introduction

For many years, researchers have investigated the use of database technology to manage file system metadata, with the goal of providing extensible typed metadata and support for fast, rich metadata search. Previous attempts failed mainly because adding database operations to the file system’s critical path reduced performance. However, recent improvements in the performance of distributed in-memory online transaction processing databases (NewSQL databases) led us to reinvestigate the possibility of using a database to manage file system metadata, this time for a distributed, hierarchical file system, the Hadoop Distributed File System (HDFS). The single-host metadata service of HDFS is a well-known bottleneck for both the size of HDFS...

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Salman Niazi (1)
  • Mahmoud Ismail (1)
  • Seif Haridi (1)
  • Jim Dowling (1)

  1. KTH – Royal Institute of Technology, Stockholm, Sweden