Abstract
Horizontal Partitioning (HP) is an optimization technique widely used to improve the physical design of data warehouses. However, selecting a partitioning schema is an NP-complete problem, and many approaches have therefore been proposed to solve it. Nonetheless, the overwhelming majority of these works do not take into account the size of the workload, which can be very large. A huge workload increases the running time of HP selection algorithms and may degrade the quality of the final solution. In this paper, we propose a new approach based on classification and election to select an HP schema for large-sized workloads. We conducted an experimental study on the APB-1 benchmark to test the effectiveness and scalability of our approach.
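The abstract does not say how queries are classified or how representatives are elected. Purely as an illustration of the general idea (shrinking a large workload before running an HP selection algorithm), the sketch below assumes that each query is encoded as a binary vector marking which selection predicates it uses, that queries are grouped with k-means (Lloyd iterations with deterministic farthest-first seeding), and that each cluster "elects" the query nearest its centroid, weighted by the cluster's total access frequency. The encoding, the election rule, and the names `kmeans`, `elect_representatives`, and `workload` are hypothetical assumptions, not the authors' method.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[int, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style k-means with deterministic farthest-first seeding.

    Returns (cluster label of each point, final centroids)."""
    # Seeding: start from the first point, then repeatedly add the point
    # farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def elect_representatives(queries: Dict[str, Tuple[Vector, int]], k: int) -> Dict[str, int]:
    """Hypothetical election rule: per cluster, keep the query nearest the
    centroid (ties: higher frequency, then smaller id) and give it the
    cluster's total access frequency as its weight."""
    ids = sorted(queries)
    labels, centroids = kmeans([queries[q][0] for q in ids], k)
    reps: Dict[str, int] = {}
    for c in range(k):
        members = [q for q, lab in zip(ids, labels) if lab == c]
        if not members:
            continue
        rep = min(members, key=lambda q: (sq_dist(queries[q][0], centroids[c]), -queries[q][1], q))
        reps[rep] = sum(queries[q][1] for q in members)
    return reps


# Made-up toy workload: query id -> (predicates used among 4 candidates, access frequency).
workload = {
    "Q1": ((1, 1, 0, 0), 10),
    "Q2": ((1, 1, 0, 0), 5),
    "Q3": ((1, 0, 0, 0), 2),
    "Q4": ((0, 0, 1, 1), 7),
    "Q5": ((0, 0, 1, 1), 3),
    "Q6": ((0, 0, 0, 1), 1),
}
print(elect_representatives(workload, k=2))  # {'Q1': 17, 'Q4': 11}
```

On this made-up six-query workload, an HP selection algorithm would receive two weighted queries instead of six. In a scheme like this, k is the main knob: fewer clusters cut selection time further but summarize the original workload more coarsely.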
Copyright information
© 2013 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Amina, G., Boukhalfa, K. (2013). Very Large Workloads Based Approach to Efficiently Partition Data Warehouses. In: Amine, A., Otmane, A., Bellatreche, L. (eds) Modeling Approaches and Algorithms for Advanced Computer Applications. Studies in Computational Intelligence, vol 488. Springer, Cham. https://doi.org/10.1007/978-3-319-00560-7_32
DOI: https://doi.org/10.1007/978-3-319-00560-7_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00559-1
Online ISBN: 978-3-319-00560-7
eBook Packages: Engineering, Engineering (R0)