OLAP cube partitioning based on association rules method
Abstract
Partitioning is an optimization technique used in business intelligence (BI) systems to improve query and processing performance. For this reason, most BI vendors integrate partitioning functionality into their solutions. However, they do not provide partitioning strategies, so choosing one remains a serious challenge for BI administrators. Several works in the literature have proposed algorithms and strategies for data warehouse partitioning. Nevertheless, most of them focus on partitioning the relational data warehouse and ignore OLAP cubes, although cubes are the structures most directly targeted by users' multidimensional queries. To address this gap, we propose in this paper a dynamic partitioning strategy for OLAP cubes based on the association rules algorithm. The first step of the proposal consists of analyzing the user queries over a specific period in order to find the frequent predicate itemsets. We then apply our proposed algorithm, based on the association rules method, to partition the data cube according to the frequent predicate itemsets obtained in the first step. Finally, we present a case study and experimental results to evaluate and validate our approach.
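The first step described above, finding frequent predicate itemsets in a query workload, can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: it uses brute-force support counting rather than Apriori-style pruning, and the function name `frequent_predicate_itemsets`, the `min_support` and `max_size` parameters, and the sample query log are all hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_predicate_itemsets(query_log, min_support, max_size=3):
    """Return {itemset: support} for predicate sets that occur in at least
    a min_support fraction of the logged queries.

    Hypothetical helper: brute-force counting, not the paper's algorithm.
    """
    n = len(query_log)
    counts = Counter()
    for predicates in query_log:
        unique = sorted(set(predicates))
        # Count every subset of this query's predicates up to max_size.
        for size in range(1, min(max_size, len(unique)) + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Assumed workload: each query is reduced to the set of predicates it filters on.
query_log = [
    {"year=2017", "region=EU"},
    {"year=2017", "region=EU", "category=Books"},
    {"year=2017", "region=US"},
    {"year=2016", "region=EU"},
]

itemsets = frequent_predicate_itemsets(query_log, min_support=0.5)

# Rank candidate partition definitions: larger itemsets first, then by support.
candidates = sorted(itemsets, key=lambda s: (-len(s), -itemsets[s]))
```

In this sketch, each frequent itemset is treated as a candidate partition definition. How the cube is actually split and how partitions are maintained over time is specific to the paper's algorithm and is not reproduced here.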
Keywords
Data warehouse, Partition, OLAP cube, Association rules, Cube maintenance, Cube performance