Minimal and Monotone Minimal Perfect Hash Functions

Boldi, Paolo

doi:10.1007/978-3-662-48057-1_1


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9234)

Included in the following conference series:

International Symposium on Mathematical Foundations of Computer Science


Abstract

A minimal perfect hash function (MPHF) is a (data structure providing a) bijective map from a set S of n keys to the set of the first n natural numbers. In the static case (i.e., when the set S is known in advance), a wide spectrum of solutions is available, offering different trade-offs in terms of construction time, access time and size of the data structure. MPHFs have been shown to be useful for compressing data in several data management tasks. In particular, order-preserving minimal perfect hash functions have been used to retrieve the position of a key in a given list of keys; however, the ability to preserve any given order leads to an unavoidable \(\varOmega (n \log n)\) lower bound on the number of bits required to store the function. Recently, it was observed that the keys to be hashed are very often sorted in their intrinsic (i.e., lexicographical) order. This is typically the case for search-engine dictionaries, lists of URLs of web graphs, etc. MPHFs that preserve the intrinsic order of the keys are called monotone (MMPHFs). The problem of building MMPHFs is more recent and less studied (for example, no lower bounds are known), but by now a wide spectrum of solutions is available for it as well. In this paper, we survey some of the most practical techniques and tools for the construction of MPHFs and MMPHFs.
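
To make the monotone case concrete, here is a minimal Python sketch of one bucketing idea from the MMPHF literature (LCP bucketing, in the spirit of the Belazzougui, Boldi, Pagh and Vigna papers in the reference list); it is an illustration, not the construction described in this chapter. The sorted keys are split into buckets of b keys; each bucket's longest common prefix (LCP) is mapped to the bucket index, and each key is mapped to the length of its bucket's LCP and to its offset inside the bucket. Assumptions: keys are distinct fixed-length bit strings ('0'/'1'), which guarantees that bucket LCPs are distinct; the names lcp_len and LcpBucketMMPHF are hypothetical; and the three Python dicts are stand-ins for compact static functions and an MPHF that would not store the keys, so the sketch shows the query logic, not the space bound.

```python
def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    limit = min(len(a), len(b))
    i = 0
    while i < limit and a[i] == b[i]:
        i += 1
    return i


class LcpBucketMMPHF:
    """Monotone MPHF by LCP bucketing (sketch).

    Assumes distinct, sorted, fixed-length bit strings. The dicts below are
    stand-ins: a real structure would store only their *values* in compact
    static functions / an MPHF, never the keys themselves.
    """

    def __init__(self, sorted_keys: list[str], b: int = 4) -> None:
        if b < 2:
            raise ValueError("bucket size b must be at least 2")
        if any(x >= y for x, y in zip(sorted_keys, sorted_keys[1:])):
            raise ValueError("keys must be distinct and sorted")
        self.b = b
        self.key_to_lcp_len: dict[str, int] = {}  # key -> |LCP of its bucket|
        self.key_to_offset: dict[str, int] = {}   # key -> position inside its bucket
        self.lcp_to_bucket: dict[str, int] = {}   # bucket LCP -> bucket index
        for bucket_idx, start in enumerate(range(0, len(sorted_keys), b)):
            bucket = sorted_keys[start:start + b]  # the last bucket may be smaller than b
            # For sorted strings, the LCP of the whole bucket is LCP(first, last).
            ell = lcp_len(bucket[0], bucket[-1])
            prefix = bucket[0][:ell]
            assert prefix not in self.lcp_to_bucket, "bucket LCPs must be distinct"
            self.lcp_to_bucket[prefix] = bucket_idx
            for offset, key in enumerate(bucket):
                self.key_to_lcp_len[key] = ell
                self.key_to_offset[key] = offset

    def __call__(self, key: str) -> int:
        """Rank of key in the sorted key set: b * bucket + offset."""
        ell = self.key_to_lcp_len[key]
        return self.b * self.lcp_to_bucket[key[:ell]] + self.key_to_offset[key]


keys = [format(v, "08b") for v in (3, 7, 17, 42, 99, 128, 200, 255)]
f = LcpBucketMMPHF(keys, b=3)
print([f(k) for k in keys])  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The design point is that the per-key maps hold only small values (an offset below b and an LCP length bounded by the key length), while the only order-dependent map has just ⌈n/b⌉ entries. A compact implementation would return an arbitrary value for a key outside S, whereas the dicts here raise KeyError.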


Notes

1. I write \(\binom{X}{t}\) for the set of subsets of X of cardinality t.
2. The sub-hypergraph induced by \(X \subseteq V\) is \((X,E_X)\), where \(E_X = E \cap \binom{X}{t}\).
3. The degree of a vertex is the number of hyperedges including it.
4. Keeping only the (at most) n non-zero \(a^*_i\) and storing their indices in an array on which a rank/select structure is provided.
5. In [20] a variant is also discussed that directly produces an MPHF, but its construction time is no longer linear in expectation.
6. To better highlight the differences between the various approaches, in this section I consider the long-keys scenario.
7. The last bucket may, of course, be smaller than b.
8. If \(N_{\gamma}\) is a sequence of 1s, \(N_{\gamma}^+\) will not be added to the set.
9. If \(\text{exit}(x)\) is the root, the algorithm will return \(i=-1\), so it is still true that \(x[:i+1]\) is the name of the exit node.
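
Notes 1–3 use the vocabulary of random hypergraphs (hyperedges, induced sub-hypergraphs, vertex degrees) that underlies peeling-based constructions such as the one by Majewski, Wormald, Havas and Czech, and note 4 mentions a rank/select structure. As a hedged illustration, not this chapter's exact algorithm, here is a minimal Python sketch of a peeling-based MPHF. Each key becomes a hyperedge with one vertex in each of three parts of m = 3·max(2, ⌈cn/3⌉) vertices; the hypergraph is peeled by repeatedly removing an edge that has a degree-1 vertex (its "hinge"); a value g[v] in {0, 1, 2} is assigned in reverse peeling order so that (g[v0] + g[v1] + g[v2]) mod 3 selects each key's hinge; finally, ranking the hinges among all vertices makes the function minimal. Assumptions: hashing via hashlib.blake2b salted with a retry counter, c = 1.25 (just above the ≈1.22 peelability threshold for random 3-hypergraphs), the hypothetical names peel and PeelingMPHF, and a plain prefix-sum list standing in for a succinct rank structure.

```python
import hashlib
import math
from itertools import accumulate
from typing import Optional

Edge = tuple[int, int, int]


def peel(edges: list[Edge], m: int) -> Optional[list[tuple[int, int]]]:
    """Repeatedly remove an edge that has a degree-1 vertex.

    Returns (edge id, hinge vertex) pairs in peeling order, or None if a
    non-empty 2-core remains (the hypergraph is not peelable).
    """
    incident = [[] for _ in range(m)]          # vertex -> ids of incident edges
    for e_id, edge in enumerate(edges):
        for v in edge:
            incident[v].append(e_id)
    degree = [len(ids) for ids in incident]    # number of not-yet-removed incident edges
    removed = [False] * len(edges)
    order = []
    queue = [v for v in range(m) if degree[v] == 1]
    while queue:
        v = queue.pop()
        if degree[v] != 1:                     # its only edge was removed meanwhile
            continue
        e_id = next(i for i in incident[v] if not removed[i])
        removed[e_id] = True
        order.append((e_id, v))                # v is the hinge of this edge
        for u in edges[e_id]:
            degree[u] -= 1
            if degree[u] == 1:
                queue.append(u)
    return order if len(order) == len(edges) else None


class PeelingMPHF:
    """MPHF from a peelable random 3-hypergraph plus a rank step (sketch)."""

    def __init__(self, keys, c: float = 1.25, max_tries: int = 100) -> None:
        keys = list(keys)
        self.part = max(2, math.ceil(c * len(keys) / 3))  # vertices per part
        m = 3 * self.part
        for seed in range(max_tries):
            self.seed = seed
            edges = [self._edge(k) for k in keys]
            order = peel(edges, m)
            if order is not None:
                break
        else:
            raise RuntimeError("no peelable hypergraph found (duplicate keys?)")
        self.g = [0] * m
        used = [False] * m
        # Reverse peeling order: each hinge is assigned exactly once, after the
        # other two vertices of its edge already hold their final values.
        for e_id, hinge in reversed(order):
            edge = edges[e_id]
            others = sum(self.g[u] for u in edge if u != hinge)
            self.g[hinge] = (edge.index(hinge) - others) % 3
            used[hinge] = True
        # rank[v] = number of hinge vertices before v (stand-in for rank/select).
        self.rank = list(accumulate(used, initial=0))

    def _edge(self, key: str) -> Edge:
        digest = hashlib.blake2b(key.encode(), digest_size=24,
                                 salt=self.seed.to_bytes(16, "little")).digest()
        # One vertex per disjoint part, so the three vertices are always distinct.
        return tuple(i * self.part
                     + int.from_bytes(digest[8 * i:8 * i + 8], "little") % self.part
                     for i in range(3))

    def __call__(self, key: str) -> int:
        edge = self._edge(key)
        hinge = edge[sum(self.g[u] for u in edge) % 3]
        return self.rank[hinge]
```

Distinct keys have distinct hinges, so the ranks form exactly {0, ..., n-1}; a key outside S still gets some value, but it is meaningless, as with any MPHF.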

References

[1] Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with \(O(1)\) worst case access time. J. Assoc. Comput. Mach. 31, 538–544 (1984)
[2] Fredman, M.L., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM J. Algebr. Discret. Methods 5, 61–68 (1984)
[3] Fox, E.A., Chen, Q.F., Daoud, A.M., Heath, L.S.: Order-preserving minimal perfect hash functions and information retrieval. ACM Trans. Inf. Syst. 9, 281–308 (1991)
[4] Majewski, B.S., Wormald, N.C., Havas, G., Czech, Z.J.: A family of perfect hashing methods. Comput. J. 39, 547–554 (1996)
[5] Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Monotone minimal perfect hashing: Searching a sorted table with \(O(1)\) accesses. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 785–794. ACM Press, New York (2009)
[6] Boldi, P., Vigna, S.: The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601. ACM Press, Manhattan, USA (2004)
[7] Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Srinivasan, S., Ramamritham, K., Kumar, A., Ravindra, M.P., Bertino, E., Kumar, R. (eds.) Proceedings of the 20th International Conference on World Wide Web, pp. 587–596. ACM (2011)
[8] Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co. Inc., Boston (1999)
[9] Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practise of monotone minimal perfect hashing. In: Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 132–144. SIAM (2009)
[10] Knuth, D.E.: The Art of Computer Programming. Addison-Wesley, Boston (1973)
[11] Mitzenmacher, M., Vadhan, S.: Why simple hash functions work: exploiting the entropy in a data stream. In: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, pp. 746–755. Society for Industrial and Applied Mathematics, Philadelphia (2008)
[12] Jacobson, G.: Space-efficient static trees and graphs. In: 30th Annual Symposium on Foundations of Computer Science (FOCS 1989), pp. 549–554. IEEE Computer Society Press, Research Triangle Park, North Carolina (1989)
[13] Patrascu, M.: Succincter. In: 49th Annual IEEE Symposium on Foundations of Computer Science, pp. 305–313. IEEE Computer Society (2008)
[14] Vigna, S.: Broadword implementation of rank/select queries. In: McGeoch, C.C. (ed.) WEA 2008. LNCS, vol. 5038, pp. 154–168. Springer, Heidelberg (2008)
[15] Gog, S., Petri, M.: Optimized succinct data structures for massive data. Software: Practice and Experience (2014). To appear
[16] Bloom, B.H.: Space-time trade-offs in hash coding with allowable errors. Commun. ACM 13, 422–426 (1970)
[17] Mehlhorn, K.: Data Structures and Algorithms 1: Sorting and Searching. EATCS Monographs on Theoretical Computer Science, vol. 1. Springer, Heidelberg (1984)
[18] Hagerup, T., Tholey, T.: Efficient minimal perfect hashing in nearly minimal space. In: Ferreira, A., Reichel, H. (eds.) STACS 2001. LNCS, vol. 2010, pp. 317–326. Springer, Heidelberg (2001)
[19] Chazelle, B., Kilian, J., Rubinfeld, R., Tal, A.: The Bloomier filter: an efficient data structure for static support lookup tables. In: Munro, J.I. (ed.) Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, pp. 30–39. SIAM (2004)
[20] Belazzougui, D., Botelho, F.C., Dietzfelbinger, M.: Hash, displace, and compress. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 682–693. Springer, Heidelberg (2009)
[21] Molloy, M.: Cores in random hypergraphs and Boolean formulas. Random Struct. Algorithms 27, 124–135 (2005)
[22] Dietzfelbinger, M.: Design strategies for minimal perfect hash functions. In: Hromkovič, J., Královič, R., Nunkesser, M., Widmayer, P. (eds.) SAGA 2007. LNCS, vol. 4665, pp. 2–17. Springer, Heidelberg (2007)
[23] Fredriksson, K., Nikitin, F.: Simple compression code supporting random access and fast string matching. In: Demetrescu, C. (ed.) WEA 2007. LNCS, vol. 4525, pp. 203–216. Springer, Heidelberg (2007)
[24] Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practice of monotone minimal perfect hashing. ACM J. Exp. Algorithmics 16, 3.2:1–3.2:26 (2011)
[25] Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Fast prefix search in little space, with applications. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part I. LNCS, vol. 6346, pp. 427–438. Springer, Heidelberg (2010)


Acknowledgements

I want to thank Sebastiano Vigna for his comments and insightful suggestions. This paper is partially funded by the Google Focused Award “Web Algorithmics for Large-Scale Data Analysis”.

Author information

Authors and Affiliations

Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
Paolo Boldi


Corresponding author

Correspondence to Paolo Boldi.

Editor information

Editors and Affiliations

Università di Roma "Tor Vergata", Rome, Italy
Giuseppe F Italiano
Università degli Studi di Milano, Milan, Italy
Giovanni Pighizzini
University of Edinburgh, Edinburgh, United Kingdom
Donald T. Sannella


About this paper

Cite this paper

Boldi, P. (2015). Minimal and Monotone Minimal Perfect Hash Functions. In: Italiano, G., Pighizzini, G., Sannella, D. (eds) Mathematical Foundations of Computer Science 2015. MFCS 2015. Lecture Notes in Computer Science, vol. 9234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48057-1_1


DOI: https://doi.org/10.1007/978-3-662-48057-1_1
Published: 11 August 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48056-4
Online ISBN: 978-3-662-48057-1
eBook Packages: Computer Science, Computer Science (R0)
