
Large-Scale Data Dictionaries Based on Hash Tables

  • Conference paper
Intelligent Distributed Computing, Systems and Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 162)


Summary

Data dictionaries allow efficient transformation of repeating input values. This paper focuses on the analysis of voluminous lookup tables that store up to a few tens of millions of key-value pairs. Because of their compactness and search efficiency, hash tables turn out to provide the best solutions in such cases. The paper deals with the performance issues of such structures; its main contribution is to take into account the effect of the multi-level memory hierarchies present in all current computers. It enumerates and compares various choices and methods to indicate how to choose the structure and parameters of hash tables for large-scale, in-memory data dictionaries.
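To make the idea of a hash-based data dictionary concrete, the following is a minimal sketch that maps repeating string keys to dense integer codes. It is an illustration under stated assumptions, not the structure evaluated in the paper: the summary does not say which collision-resolution scheme or parameters are used. The class name `LinearProbingDictionary`, the `encode` method, the power-of-two capacity, and the load-factor limit of 0.5 are all hypothetical choices made for this example. Open addressing with linear probing is chosen here only because it keeps slots in contiguous arrays and probes neighbouring slots, which is one way a table can behave well on a multi-level memory hierarchy; in CPython the lists hold object references, so the locality effect is indicative at best.

```python
from typing import List, Optional


class LinearProbingDictionary:
    """Maps repeating string keys to dense integer codes (open addressing)."""

    def __init__(self, capacity: int = 16) -> None:
        # Round the capacity up to a power of two so that the slot index
        # can be computed with a bit mask instead of a modulo.
        self._capacity = 1
        while self._capacity < capacity:
            self._capacity *= 2
        self._keys: List[Optional[str]] = [None] * self._capacity
        self._codes: List[int] = [0] * self._capacity
        self._size = 0

    def _slot(self, key: str) -> int:
        # Start at the hashed position and scan consecutive slots until
        # the key or an empty slot is found. An empty slot always exists
        # because the load factor never exceeds 0.5.
        mask = self._capacity - 1
        i = hash(key) & mask
        while self._keys[i] is not None and self._keys[i] != key:
            i = (i + 1) & mask
        return i

    def _grow(self) -> None:
        # Double the table and re-insert every stored key with its code.
        old_keys, old_codes = self._keys, self._codes
        self._capacity *= 2
        self._keys = [None] * self._capacity
        self._codes = [0] * self._capacity
        for key, code in zip(old_keys, old_codes):
            if key is not None:
                i = self._slot(key)
                self._keys[i] = key
                self._codes[i] = code

    def encode(self, key: str) -> int:
        """Return the code of key, assigning the next free code if it is new."""
        i = self._slot(key)
        if self._keys[i] is None:
            # Assumed limit: keep the load factor at or below 0.5.
            if 2 * (self._size + 1) > self._capacity:
                self._grow()
                i = self._slot(key)
            self._keys[i] = key
            self._codes[i] = self._size
            self._size += 1
        return self._codes[i]

    def __len__(self) -> int:
        return self._size
```

Encoding the sequence `"hu", "de", "hu", "fr", "de", "hu"` with a fresh instance yields the codes `0, 1, 0, 2, 1, 0`: each distinct value receives a code on first sight and every repetition is resolved by a lookup. The load-factor limit and the growth policy are exactly the kind of parameters whose effect would need to be measured rather than assumed.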



Editor information

Costin Badica, Giuseppe Mangioni, Vincenza Carchiolo, Dumitru Dan Burdescu


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Juhász, S. (2008). Large-Scale Data Dictionaries Based on Hash Tables. In: Badica, C., Mangioni, G., Carchiolo, V., Burdescu, D.D. (eds) Intelligent Distributed Computing, Systems and Applications. Studies in Computational Intelligence, vol 162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85257-5_27


  • DOI: https://doi.org/10.1007/978-3-540-85257-5_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85256-8

  • Online ISBN: 978-3-540-85257-5

  • eBook Packages: Engineering, Engineering (R0)
