Online Data Structures in External Memory

  • Jeffrey Scott Vitter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1663)

Abstract

The data sets for many of today’s computer applications are too large to fit within the computer’s internal memory and must instead be stored on external storage devices such as disks. A major performance bottleneck can be the input/output communication (or I/O) between the external and internal memories. In this paper we discuss a variety of on-line data structures for external memory—some very old and some very new—such as hashing (for dictionaries), B-trees (for dictionaries and 1-D range search), buffer trees (for batched dynamic problems), interval trees with weight-balanced B-trees (for stabbing queries), priority search trees (for 3-sided 2-D range search), and R-trees and other spatial structures. We also discuss several open problems along the way.

Keywords

Range Query, Range Search, External Memory, Internal Memory, Query Performance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Jeffrey Scott Vitter (1, 2)
  1. Department of Computer Science, Duke University, Center for Geometric Computing, Durham, USA
  2. I.N.R.I.A. Sophia Antipolis, France
