Abstract
Enterprise wide data warehouses are becoming increasingly adopted as the main source and underlying infrastructure for business intelligence (BI) solutions. Note that a data warehouse can be viewed as an integration system, where data sources are duplicated in the same repository. Data warehouses are designed to handle the queries required to discover trends and critical factors are called Online Analytical Processing (OLAP) systems. Examples of an OLAP query are: Amazon (www.amazon.com) company analyzes purchases by its customers to come up with an individual screen with products of likely interest to the customer. Analysts at Wal-Mart (www.walmart.com) look for items with increasing sales in some city. Star schemes or their variants are usually used to model warehouse applications. They are composed of thousand of dimension tables and multiple fact tables [15, 18]. Figure 2.1 shows an example of star schema of the widely-known data warehouse benchmark APB-1 release II [21]. Here, the fact table Sales is joint to the following four dimension tables: Product, Customer, Time, Channel. Star queries are typically executed against the warehouse. Queries running on such applications contain a large number of costly joins, selections and aggregations. They are called mega queries [24]. To optimize these queries, the use of advanced optimization techniques is necessary.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
These queries are described in Appendix Section at: http://www.lisi.ensma.fr/ftp/pub/documents/papers/2009/2009-DOLAP-Bellatreche.pdf
- 2.
For example, we can partition a table using three fragmentation attributes.
- 3.
The Oracle cost-based optimizer recognizes star queries.
References
Bellatreche, L., Boukhalfa, K., Abdalla, H.I.: SAGA: A Combination of Genetic and Simulated Annealing Algorithms for Physical Data Warehouse Design. In: Proceedings of BNCOD’06, pp. 212–219 (2006)
Bellatreche, L., Boukhalfa, K., Richard, P.: Data Partitioning in Data Warehouses: Hardness Study, Heuristics and ORACLE Validation. In: Proceedings of DaWaK’2008, pp. 87–96 (2008)
Bellatreche, L., Boukhalfa, K., Richard, P., Woameno, K.Y.: Referential Horizontal Partitioning Selection Problem in Data Warehouses: Hardness Study and Selection Algorithms. In IJDWM. 5(4), 1–23 (2009)
Bellatreche, L., Karlapalem, K., Simonet. A.: Algorithms and Support for Horizontal Class Partitioning in Object-Oriented Databases. In the Distributed and Parallel Databases Journal, 8(2), 155–179 (2000)
Bellatreche, L., Woameno, K.Y.: Dimension Table Driven Approach to Referential Partition Relational Data Warehouses. In: ACM 12th International Workshop on Data Warehousing and OLAP (DOLAP), pp. 9–16 (2009)
Boukhalfa, K.: De la Conception Physique aux Outils d’Administration et de Tuning des Entrepts de Donnes. Poitiers University, France, PhD. Thesis. (2009)
Ceri, S., Negri, M., Pelagatti, G.: Horizontal Data Partitioning in Database Design. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. SIGPLAN Notices, pp. 128–136 (1982)
Cho, W.S., Park, C.M., Whang, K.Y., So, S.H.: A New Method for estimating the number of objects satisfying an object-oriented query involving partial participation of classes. Inf. Syst. 21(3), 253–267 (1996)
Eadon, G., Chong, E.I., Shankar, S., Raghavan, A., Srinivasan, J., Das, S.: Supporting Table Partitioning By Reference in Oracle. In: Proceedings of SIGMOD’08, pp. 1111–1122 (2008)
Furtado, P.: Experimental evidence on partitioning in parallel data warehouses. In: Proceedings Of DOLAP, pp. 23–30 (2004)
Gibbons, P.B., Matias, Y., Poosala, V.: Fast incremental maintenance of approximate histograms. ACM Trans. Database Syst. 27(3), 261–298 (2002)
Gray, J., Slutz, D.: Data Mining the SDSS SkyServer Database. Microsoft Research, Technical Report MSR-TR-2002-01 (2002)
Karlapalem, K., Navathe, S.B., Ammar, M.: Optimal Redesign Policies to Support Dynamic Processing of Applications on a Distributed Database System. Information Systems, 21(4), 353–367 (1996)
Lei, H., Ross, K.A.: Faster Joins, Self-Joins and Multi-Way Joins Using Join Indices. In Data and Knowledge Engineering, 28(3), 277–298 (1998)
Legler, T., Lehner, W., Ross, A.: Query Optimization For Data Warehouse System With Different Data Distribution Strategies, In BTW, pp. 502–513 (2007)
Mahboubi, H., Darmont, J.: Data mining-based fragmentation of XML data warehouses. In: Proceedings DOLAP’08, pp. 9–16 (2008)
Munneke, D., Wahlstrom, K., Mohania, M.K.: Fragmentation of Multidimensional Databases. In: Proceedings of ADC’99 pp. 153–164 (1999)
Neumann, T.: Query simplification: graceful degradation for join-order optimization. In: Proceedings of SIGMOD’09, pp. 403–414 (2009)
Noaman, A.Y., Barker, K.: A Horizontal Fragmentation Algorithm for the Fact Relation in a Distributed Data Warehouse. In: Proceedings of CIKM’99, pp. 154–161 (1999)
Oracle Data Sheet: Oracle Partitioning (2007) White Paper: http://www.oracle.com/technology/products/bi/db/11g
OLAP Council: APB-1 OLAP Benchmark, Release II. http://www.olapcouncil.org/research/bmarkly.htm (1998)
Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems, Second Ed. Prentice Hall (1999)
Sanjay, A., Narasayya, V.R., Yang, B.: Integrating Vertical and Horizontal Partitioning Into Automated Physical Database Design. In: Proceedings of SIGMOD’04, pp. 359–370 (2004)
Simon, E.: Reality check: a case study of an EII research prototype encountering customer needs. In Proceedings of EDBT’08, pp. 1 (2008)
Stöhr, T., Märtens, H., Rahm, E.: Multi-Dimensional Database Allocation for Parallel Data Warehouses. In: Proceedings of VLDB2000, pp. 273–284 (2000)
Steinbrunn, M., Moerkotte, G., Kemper, A.: Heuristic and Randomized Optimization for the Join Ordering Problem. In VLDB Journal. 6(3), 191–208 (1997)
Swami, A.N., Schiefer, K.B.: On the Estimation of Join Result Sizes. In: Proceedings of EDBT’04, pp. 287–300 (1994)
Sybase: Sybase Adaptive Server Enterprise 15 Data Partitioning. White paper (2005)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer Vienna
About this chapter
Cite this chapter
Bellatreche, L. (2012). Dimension Table Selection Strategies to Referential Partition a Fact Table of Relational Data Warehouses. In: Özyer, T., Kianmehr, K., Tan, M. (eds) Recent Trends in Information Reuse and Integration. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0738-6_2
Download citation
DOI: https://doi.org/10.1007/978-3-7091-0738-6_2
Published:
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-0737-9
Online ISBN: 978-3-7091-0738-6
eBook Packages: Computer ScienceComputer Science (R0)