Large-Scale Data Dictionaries Based on Hash Tables

Juhász, Sándor

doi:10.1007/978-3-540-85257-5_27

Part of the book series: Studies in Computational Intelligence ((SCI,volume 162))

Summary

Data dictionaries allow efficient transformation of repeating input values. This paper focuses on the analysis of large lookup tables that store up to a few tens of millions of key-value pairs. Because of their compactness and search efficiency, hash tables provide the best solutions in such cases. The paper addresses the performance issues of such structures, and its main contribution is to take into account the effect of the multi-level memory hierarchies present in all current computers. It enumerates and compares various choices and methods in order to indicate how to choose the structure and the parameters of hash tables for large-scale, in-memory data dictionaries.
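As an illustration of the kind of structure the summary refers to, the following is a minimal sketch of a hash-table-based data dictionary that maps repeating input values to dense integer IDs. It is not the implementation evaluated in the paper: the class name `LinearProbingDictionary`, the method `encode`, the choice of linear probing, the power-of-two capacity, and the default load factor of 0.5 are all assumptions made for this example. Linear probing is used here only because it keeps probed slots adjacent in memory, which is one of the design choices that interacts with the memory hierarchy.

```python
class LinearProbingDictionary:
    """Illustrative open-addressing hash table (hypothetical, not the paper's code).

    Each distinct key receives a dense integer ID in order of first appearance.
    None cannot be used as a key because it marks an empty slot.
    """

    def __init__(self, capacity=8, max_load=0.5):
        # Assumption: the load factor must stay below 1 so that probing
        # always reaches an empty slot and terminates.
        if not 0 < max_load < 1:
            raise ValueError("max_load must be strictly between 0 and 1")
        # Round the capacity up to a power of two so that the slot index
        # can be computed with a bit mask instead of a modulo.
        self._capacity = 1
        while self._capacity < capacity:
            self._capacity *= 2
        self._max_load = max_load
        self._keys = [None] * self._capacity
        self._ids = [0] * self._capacity
        self._size = 0

    def _find_slot(self, key):
        # Scan consecutive slots until the key or an empty slot is found.
        mask = self._capacity - 1
        i = hash(key) & mask
        while self._keys[i] is not None and self._keys[i] != key:
            i = (i + 1) & mask
        return i

    def _grow(self):
        # Double the table and reinsert every stored key.
        old_keys, old_ids = self._keys, self._ids
        self._capacity *= 2
        self._keys = [None] * self._capacity
        self._ids = [0] * self._capacity
        for key, ident in zip(old_keys, old_ids):
            if key is not None:
                j = self._find_slot(key)
                self._keys[j] = key
                self._ids[j] = ident

    def encode(self, key):
        """Return the ID of key, assigning the next free ID on first sight."""
        i = self._find_slot(key)
        if self._keys[i] is None:
            if self._size + 1 > self._max_load * self._capacity:
                self._grow()
                i = self._find_slot(key)
            self._keys[i] = key
            self._ids[i] = self._size
            self._size += 1
        return self._ids[i]

    def __len__(self):
        return self._size


if __name__ == "__main__":
    d = LinearProbingDictionary(capacity=4)
    print([d.encode(w) for w in ["a", "b", "a", "c", "b", "d", "e"]])
```

In this sketch the load factor and the initial capacity stand in for the table parameters whose choice the summary mentions; a lower load factor shortens probe sequences at the cost of more memory.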

Author information

Authors and Affiliations

Department of Automation and Applied Informatics, Budapest University of Technology and Economics, 1111 Budapest, Goldmann György tér 3. IV. em., Hungary
Sándor Juhász

Editor information

Costin Badica, Giuseppe Mangioni, Vincenza Carchiolo, Dumitru Dan Burdescu

About this paper

Cite this paper

Juhász, S. (2008). Large-Scale Data Dictionaries Based on Hash Tables. In: Badica, C., Mangioni, G., Carchiolo, V., Burdescu, D.D. (eds) Intelligent Distributed Computing, Systems and Applications. Studies in Computational Intelligence, vol 162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85257-5_27

DOI: https://doi.org/10.1007/978-3-540-85257-5_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85256-8
Online ISBN: 978-3-540-85257-5
eBook Packages: Engineering, Engineering (R0)
