Hashing Computation for Scalable Metadata

  • Chapter
  • In: Searchable Storage in Cloud Computing

Abstract

This chapter presents a scalable and adaptive decentralized metadata lookup scheme for ultra large-scale file systems (Petabytes or even Exabytes of data). The scheme logically organizes metadata servers (MDSs) into a multilayered query hierarchy and uses grouped Bloom filters to route metadata requests efficiently through the hierarchy to the desired MDSs. Lookups can therefore run at network or memory speed rather than being bounded by the performance of slow disks. An effective workload-balancing method is also developed for server reconfigurations. The scheme is evaluated through extensive trace-driven simulations and a prototype implementation in Linux. Experimental results show that it can significantly improve metadata management scalability and query efficiency in ultra large-scale storage systems (© 2011 IEEE. Reprinted, with permission, from Ref. [1]).
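The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`BloomFilter`, `MetadataServer`), the function `route_lookup`, the filter sizes, and the simple "probe one group of filters, then the next" ordering are all assumptions made for illustration, since the preview does not describe the actual layers of the hierarchy. Each hypothetical MDS summarizes the paths it owns in an in-memory Bloom filter, so a lookup can be routed by probing filters instead of reading from disk.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; bit positions are derived from SHA-256."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive k bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # May report false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class MetadataServer:
    """Hypothetical MDS: a metadata table plus a Bloom filter summary."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}
        self.summary = BloomFilter()

    def store(self, path, attrs):
        self.metadata[path] = attrs
        self.summary.add(path)


def route_lookup(path, groups):
    """Return the MDS that holds `path`, or None if no server has it.

    `groups` is a list of MDS lists, probed in order (assumed ordering:
    the nearest group first, more distant groups only on a miss).
    """
    for group in groups:
        for mds in group:
            # A filter hit can be a false positive, so confirm at the MDS.
            if path in mds.summary and path in mds.metadata:
                return mds
    return None


if __name__ == "__main__":
    a, b, c = (MetadataServer(n) for n in ("mds-a", "mds-b", "mds-c"))
    b.store("/home/u/report.txt", {"size": 10})
    c.store("/data/x.bin", {"size": 99})
    hit = route_lookup("/data/x.bin", [[a, b], [c]])
    print(hit.name if hit else "not found")
```

Because every filter lives in memory, a miss in one group costs only a few hash computations before the query moves on, which is the property the abstract refers to as running at network or memory speed.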


References

  1. Y. Hua, Y. Zhu, H. Jiang, D. Feng, L. Tian, Supporting scalable and adaptive metadata management in ultra large-scale file systems. IEEE Trans. Parallel Distrib. Syst. (TPDS) 22, 580–593 (2011)

  2. J. Piernas, The design of new journaling file systems: the DualFS case. IEEE Trans. Comput. 56(2), 267–281 (2007)

  3. S.A. Brandt, E.L. Miller, D.D.E. Long, L. Xue, Efficient metadata management in large distributed storage systems, in Proceedings of the MSST (2003)

  4. D. Roselli, J.R. Lorch, T.E. Anderson, A comparison of file system workloads, in Proceedings of the Annual USENIX Technical Conference (2000)

  5. L. Guy, P. Kunszt, E. Laure, H. Stockinger, K. Stockinger, Replica management in data grids, in Global Grid Forum, vol. 5 (2002)

  6. S. Moon, T. Roscoe, Metadata management of terabyte datasets from an IP backbone network: experience and challenges, in Proceedings of the NRDM (2001)

  7. M. Cai, M. Frank, B. Yan, R. MacGregor, A subscribable peer-to-peer RDF repository for distributed metadata management. J. Web Semant. Sci. Serv. Agents World Wide Web 2(2) (2005)

  8. C. Lukas, M. Roszkowski, The Isaac network: LDAP and distributed metadata for resource discovery, in Internet Scout Project (2001), http://scout.cs.wisc.edu/research/isaac/ldap.html

  9. D. Fisher, J. Sobolewski, T. Tyler, Distributed metadata management in the high performance storage system, in Proceedings of the IEEE Metadata Conference (1996)

  10. A. Foster, C. Salisbury, S. Tuecke, The data grid: towards an architecture for the distributed management and analysis of large scientific datasets. J. Netw. Comput. Appl. 23, 187–200 (2001)

  11. M. Zingler, Architectural components for metadata management in earth observation, in Proceedings of the IEEE Metadata Conference (1996)

  12. B. Bloom, Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)

  13. P.J. Braam, Lustre whitepaper (2005), http://www.lustre.org

  14. P.F. Corbett, D.G. Feitelson, The vesta parallel file system. ACM Trans. Comput. Syst. 14(3), 225–264 (1996)

  15. P.J. Braam, P.A. Nelson, Removing bottlenecks in distributed file systems: Coda & intermezzo as examples, in Proceedings of the Linux Expo (1999)

  16. T.E. Anderson, M.D. Dahlin, J.M. Neefe, D.A. Patterson, D.S. Roselli, R.Y. Wang, Serverless network file systems. ACM Trans. Comput. Syst. 14(1), 41–79 (1996)

  17. O. Rodeh, A. Teperman, zFS - a scalable distributed file system using object disks, in Proceedings of the MSST (2003), pp. 207–218

  18. B. Pawlowski, C. Juszczak, P. Staubach, C. Smith, D. Lebel, D. Hitz, NFS version 3: design and implementation, in Proceedings of the USENIX Technical Conference (1994), pp. 137–151

  19. J.H. Morris, M. Satyanarayanan, M.H. Conner, J.H. Howard, D.S. Rosenthal, F.D. Smith, Andrew: a distributed personal computing environment. Commun. ACM 29(3), 184–201 (1986)

  20. M. Satyanarayanan, J.J. Kistler, P. Kumar, M.E. Okasaki, E.H. Siegel, D.C. Steere, Coda: a highly available file system for a distributed workstation environment. IEEE Trans. Comput. 39(4), 447–459 (1990)

  21. M.N. Nelson, B.B. Welch, J.K. Ousterhout, Caching in the sprite network file system. ACM Trans. Comput. Syst. 6(1), 134–154 (1988)

  22. A. Adya, R. Wattenhofer, W. Bolosky, M. Castro, G. Cermak, R. Chaiken, J. Douceur, J. Howell, J. Lorch, M. Theimer, Farsite: federated, available, and reliable storage for an incompletely trusted environment. ACM SIGOPS Oper. Syst. Rev. 36, 1–14 (2002)

  23. V. Cate, T. Gross, Combining the concepts of compression and caching for a two-level filesystem. ACM SIGARCH Comput. Archit. News 19(2), 200–211 (1991)

  24. S. Weil, K. Pollack, S.A. Brandt, E.L. Miller, Dynamic metadata management for petabyte-scale file systems, in Proceedings of the ACM/IEEE Supercomputing (2004)

  25. S. Weil, S.A. Brandt, E.L. Miller, D.D.E. Long, C. Maltzahn, Ceph: a scalable, high-performance distributed file system, in Proceedings of the OSDI (2006)

  26. S. Weil, S.A. Brandt, E.L. Miller, C. Maltzahn, Crush: controlled, scalable, decentralized placement of replicated data, in Proceedings of the ACM/IEEE Supercomputing (2006)

  27. R.J. Honicky, E.L. Miller, Replication under scalable hashing: a family of algorithms for scalable decentralized data distribution, in Proceedings of the IEEE IPDPS (2004)

  28. L. Fan, P. Cao, J. Almeida, A.Z. Broder, Summary cache: a scalable wide area web cache sharing protocol. IEEE/ACM Trans. Netw. 8(3), 281–293 (2000)

  29. A. Chervenak, N. Palavalli, S. Bharathi, C. Kesselman, R. Schwartzkopf, Performance and scalability of a replica location service, in Proceedings of the HPDC (2004)

  30. Y. Zhu, H. Jiang, J. Wang, F. Xian, HBA: distributed metadata management for large cluster-based storage systems. IEEE Trans. Parallel Distrib. Syst. 19(4), 1–14 (2008)

  31. A. Broder, M. Mitzenmacher, Network applications of Bloom filters: a survey. Internet Math. 1, 485–509 (2005)

  32. E. Riedel, M. Kallahalla, R. Swaminathan, A framework for evaluating storage system security, in Proceedings of the FAST (2002), pp. 15–30

  33. Y. Zhu, H. Jiang, False rate analysis of Bloom filter replicas in distributed systems, in Proceedings of the ICPP (2006), pp. 255–262

  34. D. Ellard, J. Ledlie, P. Malkani, M. Seltzer, Passive NFS tracing of email and research workloads, in Proceedings of the FAST (2003), pp. 203–216

  35. J. Kubiatowicz, D. Bindel, Y. Chen, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, B. Zhao, Oceanstore: an architecture for global-scale persistent storage, in Proceedings of the ACM ASPLOS (2000)

  36. A. Azagury, V. Dreizin, M. Factor, E. Henis, D. Naor, N. Rinetzky, O. Rodeh, J. Satran, A. Tavory, L. Yerushalmi, Towards an object store, in Proceedings of the MSST (2003), pp. 165–176

  37. B. Welch, G. Gibson, Managing scalability in object storage systems for HPC Linux clusters, in Proceedings of the MSST (2004), pp. 433–445

  38. E.L. Miller, R.H. Katz, RAMA: an easy-to-use, high-performance parallel file system, in Parallel Computing, vol. 23 (1997)

  39. P. Carns, W. Ligon III, R. Ross, R. Thakur, PVFS: a parallel file system for Linux clusters, in Proceedings of the Annual Linux Showcase and Conference (2000), pp. 317–327

  40. N. Nieuwejaar, D. Kotz, The Galley Parallel File System (ACM Press, New York, NY, USA, 1996)

  41. A.W. Leung, M. Shao, T. Bisson, S. Pasupathy, E.L. Miller, Spyglass: fast, scalable metadata search for large-scale storage systems. Technical Report UCSC-SSRC-08-01 (2008)

  42. A. Sweeney, D. Doucette, W. Hu, C. Anderson, M. Nishimoto, G. Peck, Scalability in the XFS file system, in Proceedings of the USENIX Technical Conference (1996), pp. 1–14

  43. M. Mitzenmacher, Compressed Bloom filters. IEEE/ACM Trans. Netw. 10(5), 604–612 (2002)

  44. A. Kumar, J. Xu, E.W. Zegura, Efficient and scalable query routing for unstructured peer-to-peer networks, in Proceedings of the INFOCOM (2005)

  45. S. Cohen, Y. Matias, Spectral Bloom filters, in Proceedings of the SIGMOD (2003)

  46. Y. Zhang, D. Li, L. Chen, X. Lu, Collaborative search in large-scale unstructured peer-to-peer networks, in Proceedings of the ICPP (2007)

  47. F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, G. Varghese, Beyond Bloom filters: from approximate membership checks to approximate state machines, in Proceedings of the SIGCOMM (2006)

  48. D. Guo, J. Wu, H. Chen, X. Luo, Theory and network application of dynamic Bloom filters, in Proceedings of the INFOCOM (2006)

  49. B. Xiao, Y. Hua, Using parallel Bloom filters for multi-attribute representation on network services. IEEE Trans. Parallel Distrib. Syst. 21, 20–32 (2010)


Author information

Corresponding author

Correspondence to Yu Hua.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Hua, Y., Liu, X. (2019). Hashing Computation for Scalable Metadata. In: Searchable Storage in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-2721-6_2

  • DOI: https://doi.org/10.1007/978-981-13-2721-6_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2720-9

  • Online ISBN: 978-981-13-2721-6

  • eBook Packages: Computer Science, Computer Science (R0)
