Summary
Data dictionaries allow efficient transformation of repeating input values. This paper focuses on large lookup tables that store up to a few tens of millions of key-value pairs. Because of their compactness and search efficiency, hash tables turn out to provide the best solutions in such cases. The paper addresses the performance of these structures, and its main contribution is to account for the effect of the multi-level memory hierarchies present in all current computers. It enumerates and compares the available design choices and methods to indicate how to choose the structure and parameters of hash tables for large-scale, in-memory data dictionaries.
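As an illustration of the kind of structure the summary refers to, the following is a minimal sketch of a hash-table-based data dictionary that maps repeating input values to dense integer codes. It is not taken from the paper: the class name `LinearProbingDictionary`, the method `encode`, and the parameters `capacity` and `max_load` are hypothetical, and open addressing with linear probing is only one of the design choices such a comparison might cover. In Python the slot list holds references, so the sketch shows the probing logic rather than the actual cache behaviour of a flat array in a lower-level language.

```python
from typing import List, Optional, Tuple


class LinearProbingDictionary:
    """Maps repeating input values to dense integer codes (illustrative only).

    Uses open addressing with linear probing over a single slot array, so a
    colliding lookup inspects neighbouring slots instead of following a chain.
    """

    def __init__(self, capacity: int = 8, max_load: float = 0.5) -> None:
        # capacity and max_load are assumed example values, not figures
        # recommended by the paper.
        self._slots: List[Optional[Tuple[str, int]]] = [None] * capacity
        self._size = 0
        self._max_load = max_load

    def __len__(self) -> int:
        return self._size

    def _find_slot(self, key: str) -> int:
        """Return the index holding key, or the first empty slot on its probe path."""
        n = len(self._slots)
        i = hash(key) % n
        while self._slots[i] is not None and self._slots[i][0] != key:
            i = (i + 1) % n  # linear probing: step to the next slot, wrapping around
        return i

    def _grow(self) -> None:
        """Double the slot array and reinsert every stored pair."""
        old = self._slots
        self._slots = [None] * (2 * len(old))
        for entry in old:
            if entry is not None:
                self._slots[self._find_slot(entry[0])] = entry

    def encode(self, key: str) -> int:
        """Return the code for key, assigning the next free code on first sight."""
        i = self._find_slot(key)
        entry = self._slots[i]
        if entry is not None:
            return entry[1]
        # Keep the load factor bounded so that an empty slot always exists.
        if self._size + 1 > self._max_load * len(self._slots):
            self._grow()
            i = self._find_slot(key)
        code = self._size
        self._slots[i] = (key, code)
        self._size += 1
        return code


d = LinearProbingDictionary()
codes = [d.encode(v) for v in ["hu", "de", "hu", "fr", "de"]]
# codes == [0, 1, 0, 2, 1]: repeated values receive the code assigned on first sight
```

The load-factor bound and the growth policy are the sort of parameters whose choice affects both memory footprint and probe length, which is why they are exposed here as constructor arguments.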
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Juhász, S. (2008). Large-Scale Data Dictionaries Based on Hash Tables. In: Badica, C., Mangioni, G., Carchiolo, V., Burdescu, D.D. (eds) Intelligent Distributed Computing, Systems and Applications. Studies in Computational Intelligence, vol 162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85257-5_27
DOI: https://doi.org/10.1007/978-3-540-85257-5_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85256-8
Online ISBN: 978-3-540-85257-5
eBook Packages: Engineering, Engineering (R0)