HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
Modern NewSQL database systems can store the fully normalized metadata of a distributed hierarchical file system while providing high throughput and low latency for file system operations.
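To make "fully normalized metadata" concrete, the following is a minimal Python sketch. It assumes a single hypothetical inode table whose primary key is (parent_id, name): each file or directory is one row, and a path is resolved one component at a time. The names `InodeTable`, `Inode` and `ROOT_ID` are illustrative, and a Python dict stands in for the database. This is not the HopsFS schema or code. In a real system, each lookup would be a primary-key read inside a database transaction.

```python
# Hypothetical sketch of normalized file system metadata (not the HopsFS schema).
from dataclasses import dataclass
from typing import Dict, Tuple

ROOT_ID = 1  # hypothetical id reserved for "/"


@dataclass(frozen=True)
class Inode:
    id: int
    parent_id: int
    name: str
    is_dir: bool


class InodeTable:
    """In-memory stand-in for a table whose primary key is (parent_id, name)."""

    def __init__(self) -> None:
        self.root = Inode(ROOT_ID, 0, "", True)
        self._rows: Dict[Tuple[int, str], Inode] = {}  # the "table"
        self._by_id: Dict[int, Inode] = {ROOT_ID: self.root}  # secondary index on id
        self._next_id = ROOT_ID + 1

    def resolve(self, path: str) -> Inode:
        """Resolve a path with one primary-key lookup per path component."""
        current = self.root
        for component in (c for c in path.split("/") if c):
            if not current.is_dir:
                raise NotADirectoryError(path)
            child = self._rows.get((current.id, component))
            if child is None:
                raise FileNotFoundError(path)
            current = child
        return current

    def _split(self, path: str) -> Tuple[Inode, str]:
        """Return (parent inode, final name) for a path."""
        parent_path, _, name = path.rstrip("/").rpartition("/")
        if not name:
            raise ValueError(f"invalid path: {path!r}")
        parent = self.resolve(parent_path or "/")
        if not parent.is_dir:
            raise NotADirectoryError(parent_path)
        return parent, name

    def create(self, path: str, is_dir: bool = False) -> Inode:
        parent, name = self._split(path)
        if (parent.id, name) in self._rows:
            raise FileExistsError(path)
        inode = Inode(self._next_id, parent.id, name, is_dir)
        self._next_id += 1
        self._rows[(parent.id, name)] = inode
        self._by_id[inode.id] = inode
        return inode

    def rename(self, src: str, dst: str) -> Inode:
        """Move an inode by rewriting its single row; descendants are untouched."""
        inode = self.resolve(src)
        if inode.id == ROOT_ID:
            raise ValueError("cannot rename the root")
        new_parent, new_name = self._split(dst)
        if (new_parent.id, new_name) in self._rows:
            raise FileExistsError(dst)
        # Reject moving a directory underneath itself (that would create a cycle).
        ancestor = new_parent
        while ancestor.id != ROOT_ID:
            if ancestor.id == inode.id:
                raise ValueError("cannot move a directory into its own subtree")
            ancestor = self._by_id[ancestor.parent_id]
        del self._rows[(inode.parent_id, inode.name)]
        moved = Inode(inode.id, new_parent.id, new_name, inode.is_dir)
        self._rows[(new_parent.id, new_name)] = moved
        self._by_id[moved.id] = moved
        return moved
```

Because each row stores only its own name and its parent's id, renaming a directory such as `/user/alice` rewrites a single row, and every descendant still resolves through the unchanged id. The cost is one lookup per path component, which is why database read latency matters once metadata operations sit on the file system's critical path.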
For many years, researchers have investigated the use of database technology to manage file system metadata, with the goal of providing extensible typed metadata and fast, rich metadata search. However, previous attempts failed, mainly because adding database operations to the file system's critical path reduced performance. Recent improvements in the performance of distributed, in-memory online transaction processing (OLTP) databases, known as NewSQL databases, led us to revisit the idea of using a database to manage file system metadata, this time for a distributed, hierarchical file system: the Hadoop Distributed File System (HDFS). The single-host metadata service of HDFS is a well-known bottleneck for both the size of HDFS...