Advertisement

Springer Nature is making SARS-CoV-2 and COVID-19 research free. View research | View latest news | Sign up for updates

Master-detail clustering using merged indexes

Abstract

Merged indexes are B-trees that contain multiple traditional indexes and interleave their records based on a common sort order. In relational databases, merged indexes implement ‘‘master-detail clustering’’ of related records, e.g., orders and order details. Thus, merged indexes shift de-normalization from the logical level of tables and rows to the physical level of indexes and records, which is a much more appropriate place for it. For object-oriented applications, clustering can reduce the I/O cost for joining rows in related tables to a fraction compared to traditional indexes, with additional beneficial effects on buffer pool requirements.

Prior research has covered merged indexes without providing much guidance for their implementation. Enabling the design proposed here is a strict separation of B-tree and index into two layers of abstraction. In addition, this paper provides algorithms (i) for data layout including bitmap indexes and column stores, (ii) for concurrency control and recovery including locking of individual keys and of complex objects, (iii) for update operations including bulk insertions, bulk deletions, and deferred index maintenance, (iv) for adding and removing individual indexes within a merged index, (v) for enforcement of relational integrity constraints from uniqueness constraints to foreign key constraints, and (vi) for query processing including caching in query execution plans dominated by nested iteration and index navigation.

In the proposed design for merged indexes, the set of tables, views, and indexes can evolve without restriction. The set of clustering columns can also evolve freely. A relational query processor can search and update index records just as in traditional indexes. With these abilities, the proposed design may finally bring general master-detail clustering to traditional databases together with its advantages in performance and cost.

Zusammenfassung

Merged Indexes (eine Art physischer Sammelindex) sind spezielle B-Bäume, die mehrere herkömmliche Indexe vereinigen und ihre Einträge in einer gemeinsamen Sortierreihenfolge speichern. In relationalen Datenbanken implementieren Merged Indexes eine ,,hierarchische Clusterbildung‘‘ (master-detail clustering) zusammengehöriger Sätze, beispielsweise von Bestellungen und Bestellposten. Damit verlagern Merged Indexes die Denormalisierung von der logischen Ebene der Tabellen und Zeilen auf die physische Ebene der Indexe und Sätze, was eine viel besser geeignete Stelle dafür ist. Im Vergleich zum Einsatz herkömmlicher Indexe können bei objektorientierten Anwendungen durch Clusterbildung die E/A-Kosten für den Verbund in Beziehung stehender Tabellen auf einen Bruchteil gesenkt werden, wobei zusätzlich Spareffekte bei der Puffernutzung erzielt werden.

In früheren Publikationen werden Merged Indexes ohne spezielle Hinweise zu ihrer Implementierung erwähnt. Unsere Vorgehensweise bei ihrer Realisierung sieht eine strikte Trennung von B-Baum und Index in zwei Abstraktionsebenen vor. Zusätzlich liefert unser Beitrag Algorithmen (i) für die Datenabbildung einschließlich der Bitlisten-Indexe und der spaltenorientierten Speicherung, (ii) für Mehrbenutzersynchronisation und Recovery einschließlich spezieller Sperren für einzelne Schlüsselwerte und komplexe Objekte, (iii) für Aktualisierungsoperationen einschließlich Einfügungen und Löschungen großer Datenmengen, (iv) für das Hinzufügen und Entfernen einzelner Indexe innerhalb eines Merged Index, (v) zur Kontrolle relationaler Integritätsbedingungen wie Unique- und Fremdschlüsselbedingungen und (vi) für die Anfrageverarbeitung einschließlich Caching-Vorkehrungen bei Anfrageausführungsplänen mit hohem Anteil an geschachtelter Iteration und Index-Navigation.

In unserem Entwurf für Merged Indexes kann die Menge der beteiligten Tabellen, Sichten und Indexe unbeschränkt erweitert werden. Auch die Menge der Attribute (Spalten) mit Clusterbildung unterliegt keinen Beschränkungen. Ein relationaler Anfrageprozessor kann Indexeinträge genauso wie in herkömmlichen Indexen suchen und modifizieren. Diese Eigenschaften unseres Entwurfs ermöglichen es schließlich, das Konzept der allgemeinen hierarchischen Clusterbildung in herkömmliche Datenbanken zu integrieren und dabei ihre Vorteile hinsichtlich Leistung und Kosten zu erhalten.

This is a preview of subscription content, log in to check access.

References

  1. 1.

    Abadi DJ (2007) Column stores for wide and sparse data. CIDR 2007

  2. 2.

    Agarwal S, Blakeley JA, Casey T, Delaney K, Galindo-Legaria CA, Graefe G, Rys M, Zwilling MJ (2001) Microsoft SQL Server. Chapter 27, p 969–1006. In: Silberschatz A, Korth HF, Sudarshan S (eds) Database System Concepts 4th Edn. McGraw-Hill Higher Education

  3. 3.

    Bayer R, Unterauer K (1977) Prefix B-Trees. ACM TODS 2(1):11–26

  4. 4.

    Fernandez PM (1994) Red Brick Warehouse: A Read-Mostly RDBMS for Open SMP Platforms. SIGMOD 1994, p 492

  5. 5.

    Graefe G (2003) Sorting and Indexing with Partitioned B-Trees. CIDR 2003

  6. 6.

    Graefe G (2003) Executing Nested Queries. BTW Conf. 2003, pp 58–77

  7. 7.

    Graefe G (2006) B-tree indexes, interpolation search, and skew. DaMoN 2006, p 5

  8. 8.

    Graefe G (2007) Hierarchical locking in B-tree Indexes. BTW Conf. 2007

  9. 9.

    Graefe G (2007) Algorithms for merged indexes. BTW Conf. 2007

  10. 10.

    Graefe G: Surrogate compression for columnar storage. To appear in ACM SIGMOD Record

  11. 11.

    Graefe G, Bunker R, Cooper S (1998) Hash joins and hash teams in Microsoft SQL Server. VLDB 1998, pp 86–97

  12. 12.

    Gray J, Graefe G (1997) The Five-Minute Rule ten years later, and other computer storage rules of thumb. SIGMOD Record 26(4):63–68

  13. 13.

    Gottemukkala V, Lehman TJ (1992) Locking and Latching in a Memory-Resident Database System. VLDB 1992, pp 533–544

  14. 14.

    Graefe G, Larson P (2001) B-tree indexes and CPU caches. ICDE 2001, pp 349–358

  15. 15.

    Gray J, Lorie RA, Putzolu GR, Traiger IL (1975) Granularity of Locks in a Large Shared Data Base. VLDB 1975, pp 428–451

  16. 16.

    Gassner P, Lohman GM, Schiefer KB, Wang Y (1993) Query Optimization in the IBM DB2 Family. IEEE Data Eng Bull 16(4):4–18

  17. 17.

    Gray J, Reuter A (1993) Transaction Processing: Concepts and Techniques. Morgan-Kaufman, San Mateo, CA

  18. 18.

    Graefe G, Shapiro LD (1991) Data compression and database performance. ACM/IEEE-CS Symp. on Applied Computing

  19. 19.

    Galindo-Legaria CA, Stefani S, Waas F (2004) Query Processing for SQL Updates. SIGMOD 2004:844–849

  20. 20.

    Graefe G, Zwilling MJ (2004) Transaction support for indexed views. SIGMOD 2004:323–334

  21. 21.

    Härder T (1978) Implementing a Generalized Access Path Structure for a Relational Database System. ACM TODS 3(3):285–298

  22. 22.

    Hamilton J (2007) An architecture for modular data centers. CIDR 2007

  23. 23.

    Harizopoulos S, Liang V, Abadi DJ, Madden S (2006) Performance Tradeoffs in Read-Optimized Databases. VLDB 2006, pp 487–498

  24. 24.

    Härder T, Reuter A (1983) Principles of Transaction-Oriented Database Recovery. ACM Comput Surv 15(4):287–317

  25. 25.

    Härder T, Reinert J (1996) Access Path Support for Referential Integrity in SQL2. VLDB J 5(3):196–214

  26. 26.

    Hsu WW, Smith AJ (2004) The performance impact of I/O optimizations and disk improvements. IBM J Res Dev 48(2):255–289

  27. 27.

    Joshi AM (1991) Adaptive Locking Strategies in a Multi-node Data Sharing Environment. VLDB 1991, pp 181–191

  28. 28.

    Korth HF (1983) Locking Primitives in a Database System. J ACM 30(1):55–79

  29. 29.

    Lomet DB (1993) Key Range Locking Strategies for Improved Concurrency. VLDB 1993, pp 655–664

  30. 30.

    Lomet DB (2001) The Evolution of Effective B-tree: Page Organization and Techniques: A Personal Account. SIGMOD Record 30(3):64–69

  31. 31.

    Lehman TJ, Carey MJ (1989) A Concurrency Control Algorithm for Memory-Resident Database Systems. FODO 1989, pp 490–504

  32. 32.

    Leslie H, Jain R, Birdsall D, Yaghmai H (1995) Efficient Search of Multi-Dimensional B-Trees. VLDB 1995, pp 710–719

  33. 33.

    Lomet DB, Tuttle MR (1995) Redo Recovery after System Crashes. VLDB 1995, pp 457–468

  34. 34.

    Mohan C, ARIES/KVL: A Key-Value Locking Method for Concurrency Control of Multiaction Transactions Operating on B-Tree Indexes. VLDB 1990, pp 392–405

  35. 35.

    McJones PR (1997) The 1995 SQL Reunion: People, Project, and Politics, May 29, 1995. Digital System Research Center Report SRC1997-018

  36. 36.

    Mohan C, Levine F (1992) ARIES/IM: An Efficient and High Concurrency Index Management Method Using Write-Ahead Logging. SIGMOD 1992, pp 371–380

  37. 37.

    Mohan C, Narang I (1992) Algorithms for Creating Indexes for Very Large Tables without Quiescing Updates. SIGMOD Conf. 1992, pp 361–370

  38. 38.

    O’Neil PE (1986) The Escrow Transactional Method. ACM TODS 11(4):405–430

  39. 39.

    O’Neil PE, Graefe G (1995) Multi-table joins through bitmapped join indices. SIGMOD Record 24(3):8–11

  40. 40.

    Orenstein JA, Merrett TH (1984) A Class of Data Structures for Associative Searching. PODS 1984, pp 181–190

  41. 41.

    Pöss M, Potapov D (2003) Data Compression in Oracle. VLDB 2003, pp 937–947

  42. 42.

    Ramsak F, Markl V, Fenk R, Zirkel M, Elhardt K, Bayer R (2000) Integrating the UB-Tree into a Database System Kernel. VLDB 2000, pp 263–272

  43. 43.

    Short-stroking. Storage magazine. http://storagemagazine.techtarget.com, July 2005

  44. 44.

    Stonebraker M, Abadi DJ, Batkin A, Chen X, Cherniack M, Ferreira M, Lau E, Lin A, Madden S, O’Neil EJ, O’Neil PE, Rasin A, Tran N, Zdonik SB (2005) C-Store: A Column-oriented DBMS. VLDB 2005, pp 553–564

  45. 45.

    Severance DG, Lohman GM (1976) Differential Files: Their Application to the Maintenance of Large Databases. ACM TODS 1(3):256–267

  46. 46.

    Tsatalos OG, Solomon MH, Ioannidis YE (1996) VLDB J 5(2):101–118

  47. 47.

    Valduriez P (1987) Join Indices. ACM TODS 12(2):218–246

  48. 48.

    Wu K, Otoo EJ, Shoshani A (2006) Optimizing bitmap indices with efficient compression. ACM TODS 31(1):1–38

Download references

Author information

Correspondence to Goetz Graefe.

Additional information

CR subject classification

H2.2; H2.4; H3.2; H3.3

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Graefe, G. Master-detail clustering using merged indexes . Informatik Forsch. Entw. 21, 127–145 (2007). https://doi.org/10.1007/s00450-007-0022-4

Download citation

Keywords

  • Relational database
  • B-tree index
  • Master-detail clustering
  • Concurrency control
  • Index utilities
  • Query processing

Schlagworte

  • Relationale Datenbank
  • B-Baum-Index
  • Hierarchische Clusterbildung
  • Mehrbenutzersynchronisation
  • Anfragebearbeitung