Minimal and Monotone Minimal Perfect Hash Functions

  • Conference paper
Mathematical Foundations of Computer Science 2015 (MFCS 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9234)

Abstract

A minimal perfect hash function (MPHF) is a (data structure providing a) bijective map from a set S of n keys to the set of the first n natural numbers. In the static case (i.e., when the set S is known in advance), there is a wide spectrum of solutions available, offering different trade-offs in terms of construction time, access time and size of the data structure. MPHFs have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions have been used to retrieve the position of a key in a given list of keys; however, the ability to preserve any given order leads to an unavoidable \(\Omega(n \log n)\) lower bound on the number of bits required to store the function. Recently, it was observed that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case for dictionaries of search engines, lists of URLs of web graphs, etc. MPHFs that preserve the intrinsic order of the keys are called monotone (MMPHF). The problem of building MMPHFs is more recent and less studied (for example, no lower bounds are known), but, once more, a wide spectrum of solutions is by now available. In this paper, we survey some of the most practical techniques and tools for the construction of MPHFs and MMPHFs.
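
The definitions above can be made concrete with a minimal Python sketch. It is only an illustration and not one of the constructions surveyed in the paper: it stores an explicit key-to-rank table, which is far larger than any practical MPHF. All function names are hypothetical, and the output range is assumed to be {0, ..., n-1}. The last function evaluates the counting argument behind the lower bound: an order-preserving function must distinguish all n! orderings of the keys, so it needs at least log2(n!) bits, which is of order n log n.

```python
import math
from typing import Callable, Hashable, Sequence


def build_order_preserving_mphf(keys: Sequence[Hashable]) -> Callable[[Hashable], int]:
    """Toy order-preserving MPHF: maps keys[i] to i, whatever the given order is.

    Assumption: an explicit dictionary is used purely for illustration.
    """
    if len(set(keys)) != len(keys):
        raise ValueError("keys must be distinct")
    table = {key: rank for rank, key in enumerate(keys)}
    return table.__getitem__


def is_minimal_perfect(h: Callable[[Hashable], int], keys: Sequence[Hashable]) -> bool:
    """True if h is a bijection from keys onto {0, ..., n-1}."""
    return sorted(h(k) for k in keys) == list(range(len(keys)))


def is_monotone(h: Callable[[Hashable], int], keys: Sequence[Hashable]) -> bool:
    """True if h preserves the intrinsic (sorted) order of the keys."""
    ordered = sorted(keys)
    return all(h(a) < h(b) for a, b in zip(ordered, ordered[1:]))


def order_preserving_lower_bound_bits(n: int) -> float:
    """log2(n!): bits needed to tell apart all n! possible key orders."""
    return math.lgamma(n + 1) / math.log(2)
```

For example, with `keys = ["banana", "apple", "cherry"]` the function built from `keys` is minimal perfect but not monotone, whereas the one built from `sorted(keys)` is both.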

Notes

  1. I write \(\binom{X}{t}\) for the set of subsets of X of cardinality t.

  2. The sub-hypergraph induced by \(X \subseteq V\) is \((X,E_X)\) where \(E_X = E \cap \binom{X}{t}\).

  3. The degree of a vertex is the number of hyperedges including it.

  4. Keeping only the (at most) n non-zero \(a^*_i\) and storing their indices in an array on which a rank/select structure is provided.

  5. In [20] a variant is also discussed that directly produces a MPHF, but its construction time is no longer linear in expectation.

  6. In order to highlight better the differences between the various approaches, in this section I consider the long-keys scenario.

  7. The last bucket may, of course, be smaller than b.

  8. If \(N_{\gamma }\) is a sequence of 1s, \(N_{\gamma }^+\) will not be added to the set.

  9. If \({\text {exit}}(x)\) is the root, the algorithm will return \(i=-1\), so it is still true that \(x[:i+1]\) is the name of the exit node.
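
The compaction mentioned in note 4 can be sketched as follows. This is a hedged illustration, not the paper's data structure: the class name `CompactedArray` is hypothetical, and a plain prefix-sum list stands in for a succinct rank structure (a real one, such as those in [12, 14], would use far less extra space). A bit vector marks which positions hold non-zero values, and rank on that vector gives the position of each value in the compact array.

```python
from typing import List, Sequence


class CompactedArray:
    """Sparse integer array that stores only its non-zero entries."""

    def __init__(self, values: Sequence[int]) -> None:
        self.length = len(values)
        # bits[i] == 1 exactly when values[i] is non-zero.
        self.bits: List[int] = [1 if v != 0 else 0 for v in values]
        # prefix[i] = number of 1s in bits[0:i]; naive stand-in for rank.
        self.prefix: List[int] = [0]
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + b)
        # Only the non-zero values are kept, in their original order.
        self.nonzero: List[int] = [v for v in values if v != 0]

    def rank(self, i: int) -> int:
        """Number of non-zero entries strictly before position i."""
        return self.prefix[i]

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < self.length:
            raise IndexError(i)
        if self.bits[i] == 0:
            return 0
        return self.nonzero[self.rank(i)]
```

With `a = CompactedArray([0, 5, 0, 0, 7, 3, 0])`, only three values are stored, and `a[4]` is recovered as `a.nonzero[a.rank(4)]`.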

References

  1. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with \(O(1)\) worst case access time. J. Assoc. Comput. Mach. 31, 538–544 (1984)

  2. Fredman, M.L., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM J. Algebr. Discret. Methods 5, 61–68 (1984)

  3. Fox, E.A., Chen, Q.F., Daoud, A.M., Heath, L.S.: Order-preserving minimal perfect hash functions and information retrieval. ACM Trans. Inf. Sys. 9, 281–308 (1991)

  4. Majewski, B.S., Wormald, N.C., Havas, G., Czech, Z.J.: A family of perfect hashing methods. Comput. J. 39, 547–554 (1996)

  5. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Monotone minimal perfect hashing: Searching a sorted table with \(O(1)\) accesses. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 785–794, New York, ACM Press (2009)

  6. Boldi, P., Vigna, S.: The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601, Manhattan, USA, ACM Press (2004)

  7. Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Srinivasan, S., Ramamritham, K., Kumar, A., Ravindra, M.P., Bertino, E., Kumar, R. (eds.) Proceedings of the 20th International Conference on World Wide Web, pp. 587–596. ACM (2011)

  8. Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co. Inc., Boston (1999)

  9. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practise of monotone minimal perfect hashing. In: Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 132–144. SIAM (2009)

  10. Knuth, D.E.: The Art of Computer Programming. Addison-Wesley, Boston (1973)

  11. Mitzenmacher, M., Vadhan, S.: Why simple hash functions work: exploiting the entropy in a data stream. In: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, pp. 746–755. Society for Industrial and Applied Mathematics, Philadelphia (2008)

  12. Jacobson, G.: Space-efficient static trees and graphs. In: 30th Annual Symposium on Foundations of Computer Science (FOCS 1989), pp. 549–554. IEEE Computer Society Press, Research Triangle Park, North Carolina (1989)

  13. Patrascu, M.: Succincter. In: 49th Annual IEEE Symposium on Foundations of Computer Science, pp. 305–313. IEEE Computer Society (2008)

  14. Vigna, S.: Broadword implementation of rank/select queries. In: McGeoch, C.C. (ed.) WEA 2008. LNCS, vol. 5038, pp. 154–168. Springer, Heidelberg (2008)

  15. Gog, S., Petri, M.: Optimized succinct data structures for massive data. Software: Practice and Experience (2014). To appear

  16. Bloom, B.H.: Space-time trade-offs in hash coding with allowable errors. Commun. ACM 13, 422–426 (1970)

  17. Mehlhorn, K.: Data Structures and Algorithms 1: Sorting and Searching. EATCS monographs on theoretical computer science, vol. 1. Springer, Heidelberg (1984)

  18. Hagerup, T., Tholey, T.: Efficient minimal perfect hashing in nearly minimal space. In: Ferreira, A., Reichel, H. (eds.) STACS 2001. LNCS, vol. 2010, pp. 317–326. Springer, Heidelberg (2001)

  19. Chazelle, B., Kilian, J., Rubinfeld, R., Tal, A.: The Bloomier filter: an efficient data structure for static support lookup tables. In: Munro, J.I. (ed.) Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, pp. 30–39. SIAM (2004)

  20. Belazzougui, D., Botelho, F.C., Dietzfelbinger, M.: Hash, displace, and compress. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 682–693. Springer, Heidelberg (2009)

  21. Molloy, M.: Cores in random hypergraphs and Boolean formulas. Random Struct. Algorithms 27, 124–135 (2005)

  22. Dietzfelbinger, M.: Design strategies for minimal perfect hash functions. In: Hromkovič, J., Královič, R., Nunkesser, M., Widmayer, P. (eds.) SAGA 2007. LNCS, vol. 4665, pp. 2–17. Springer, Heidelberg (2007)

  23. Fredriksson, K., Nikitin, F.: Simple compression code supporting random access and fast string matching. In: Demetrescu, C. (ed.) WEA 2007. LNCS, vol. 4525, pp. 203–216. Springer, Heidelberg (2007)

  24. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practice of monotone minimal perfect hashing. ACM J. Exp. Algorithmics 16, 3.2:1–3.2:26 (2011)

  25. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Fast prefix search in little space, with applications. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part I. LNCS, vol. 6346, pp. 427–438. Springer, Heidelberg (2010)

Acknowledgements

I want to thank Sebastiano Vigna for his comments and insightful suggestions. This paper is partially funded by the Google Focused Award “Web Algorithmics for Large-Scale Data Analysis”.

Author information

Corresponding author

Correspondence to Paolo Boldi.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boldi, P. (2015). Minimal and Monotone Minimal Perfect Hash Functions. In: Italiano, G., Pighizzini, G., Sannella, D. (eds) Mathematical Foundations of Computer Science 2015. MFCS 2015. Lecture Notes in Computer Science, vol 9234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48057-1_1

  • DOI: https://doi.org/10.1007/978-3-662-48057-1_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48056-4

  • Online ISBN: 978-3-662-48057-1

  • eBook Packages: Computer Science, Computer Science (R0)
