A survey of DBMS research issues in supporting very large tables

  • C. Mohan
Invited Talk
Part of the Lecture Notes in Computer Science book series (LNCS, volume 730)

Abstract

A number of interesting problems arise in supporting the efficient and flexible storage, maintenance and manipulation of large volumes of data (e.g., >100 gigabytes of data in a single table). Very large tables are becoming common, and high availability is typically an important requirement for such data. The currently popular relational DBMSs have been very slow in providing the needed support, yet this support is crucial if RDBMSs are to be deployed for managing the operational data of many large enterprises and to support complex queries efficiently. We discuss some of the issues involved in improving the availability and efficient accessibility of partitioned tables via parallelism, fine-granularity locking, transient versioning and partition independence. We outline some solutions that have been proposed. These solutions relate to algorithms for index building, utilities for fuzzy backups, incremental recovery and reorganization, buffer management, transient versioning, concurrency control and record management.

Keywords

Global Index; Concurrency Control; Large Data Base; Query Execution; Index Tree
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1993

Authors and Affiliations

  • C. Mohan (1)
  1. Data Base Technology Institute, IBM Almaden Research Center, San Jose, USA
