Metadata Management Algorithm Based on Improved LSM Tree

  • Yonghua Huo
  • Ningling Ge
  • Jinxi Han
  • Kun Wang
  • Yang Yang (email author)
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1143)


The Hadoop Distributed File System (HDFS) is a core component of Hadoop, but its scalability is constrained because the amount of data it can store and manage is limited by the memory size of the NameNode. In this article, we analyze two problems that arise when the NameNode manages metadata: loading the FSImage takes too long, and capacity is limited by memory size. We propose flattening the hierarchical HDFS metadata structure and moving metadata out of memory. To this end, we design F-HDFS, which manages metadata using an improved log-structured merge-tree (LSM tree) and memory-mapped files, and we describe the F-HDFS metadata operations. F-HDFS also remains compatible with HDFS features such as high availability and snapshots, so it can be applied to existing HDFS-based applications. We implement an F-HDFS prototype system and compare it with HDFS. The results show that F-HDFS outperforms HDFS in providing stable and fast metadata services.
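To make the idea of a flat, LSM-backed metadata store more concrete, the following is a minimal sketch of a generic textbook LSM tree keyed by full file path. It is not the F-HDFS design or the authors' improved LSM tree, neither of which is detailed in this abstract. The class name `ToyLSM`, the `memtable_limit` parameter, and the example paths are hypothetical, and the sorted runs are kept in Python lists here rather than in memory-mapped files on disk.

```python
import bisect

TOMBSTONE = object()  # marks a deleted key until compaction drops it


class ToyLSM:
    """Toy LSM store (illustrative only): a mutable memtable plus
    immutable sorted runs, with the newest run stored last."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}            # recent writes, keyed by full path
        self.runs = []                # each run: sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # Deletes are writes too: a tombstone shadows older versions.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Freeze the memtable into a new immutable sorted run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the memtable first, then the runs from newest to oldest.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge all runs into one; newer entries win, tombstones are dropped.
        merged = {}
        for run in self.runs:         # oldest first, so newer values overwrite
            merged.update(run)
        self.runs = [sorted((k, v) for k, v in merged.items()
                            if v is not TOMBSTONE)]


# Flat namespace: the full path is the key, so no directory tree is walked.
store = ToyLSM(memtable_limit=2)
store.put("/user/alice/a.txt", {"size": 10})
store.put("/user/alice/b.txt", {"size": 20})   # memtable full, flushed to a run
store.put("/user/alice/a.txt", {"size": 11})   # newer version shadows the run
print(store.get("/user/alice/a.txt"))
```

In this sketch a lookup touches the memtable and then each run in turn. A Bloom filter per run, as the keywords suggest, is the usual way to skip runs that cannot contain the key, but it is omitted here for brevity.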


Keywords: Metadata, NameNode, Hash map, Bloom filter



This work was supported in part by Open Subject Funds of Science and Technology on Information Transmission and Dissemination in Communication Networks Laboratory (SKX182010049), Fundamental Research Funds for the Central Universities (2019PTB-019) and the Industrial Internet Innovation and Development Project 2018 of China.



Copyright information

© Springer Nature Singapore Pte Ltd. 2021

Authors and Affiliations

  • Yonghua Huo (1)
  • Ningling Ge (2)
  • Jinxi Han (3)
  • Kun Wang (2)
  • Yang Yang (2, email author)

  1. The 54th Research Institute of CETC, Shijiazhuang, China
  2. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
  3. Institute of Systems Engineering, Beijing, China
