Online Data Structures in External Memory

  • Conference paper
Algorithms and Data Structures (WADS 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1663)

Abstract

The data sets for many of today’s computer applications are too large to fit within the computer’s internal memory and must instead be stored on external storage devices such as disks. A major performance bottleneck can be the input/output communication (or I/O) between the external and internal memories. In this paper we discuss a variety of on-line data structures for external memory—some very old and some very new—such as hashing (for dictionaries), B-trees (for dictionaries and 1-D range search), buffer trees (for batched dynamic problems), interval trees with weight-balanced B-trees (for stabbing queries), priority search trees (for 3-sided 2-D range search), and R-trees and other spatial structures. We also discuss several open problems along the way.
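To make the I/O bottleneck concrete, the sketch below counts block transfers for one dictionary lookup in a simplified external-memory model. It is an illustration only and is not taken from the paper. The function names, the assumption that a B-tree node has a fan-out equal to the block size, and the sample sizes are all hypothetical simplifications.

```python
def btree_lookup_ios(n_items: int, fanout: int) -> int:
    """Block reads for one lookup in an idealized B-tree.

    Assumption: every node holds `fanout` keys and occupies one block,
    so the cost is the number of levels needed to cover n_items keys.
    """
    levels = 1
    capacity = fanout
    while capacity < n_items:
        capacity *= fanout
        levels += 1
    return levels


def binary_search_ios(n_items: int, block_size: int) -> int:
    """Block reads for binary search over a sorted array stored in blocks.

    Assumption: every probe touches a new block until the remaining
    search range fits inside a single block, which costs one final read.
    """
    ios = 1
    span = n_items
    while span > block_size:
        span = (span + 1) // 2  # halve the remaining range, rounding up
        ios += 1
    return ios


if __name__ == "__main__":
    n, b = 10**9, 1000  # hypothetical: one billion keys, 1000 keys per block
    print("B-tree lookup I/Os:   ", btree_lookup_ios(n, b))
    print("Binary search I/Os:   ", binary_search_ios(n, b))
```

Under these assumptions the B-tree needs a number of reads that grows with the logarithm of the data size to the base of the block size, while binary search on a blocked array grows with the base-2 logarithm, which is why block-aware structures dominate in external memory.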

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vitter, J.S. (1999). Online Data Structures in External Memory. In: Dehne, F., Sack, J.-R., Gupta, A., Tamassia, R. (eds) Algorithms and Data Structures. WADS 1999. Lecture Notes in Computer Science, vol 1663. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48447-7_35

  • DOI: https://doi.org/10.1007/3-540-48447-7_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66279-2

  • Online ISBN: 978-3-540-48447-9

  • eBook Packages: Springer Book Archive
