Very Large Workloads Based Approach to Efficiently Partition Data Warehouses

  • Chapter
Modeling Approaches and Algorithms for Advanced Computer Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 488)

Abstract

Horizontal Partitioning (HP) is an optimization technique widely used to improve the physical design of data warehouses. However, selecting a partitioning schema is an NP-complete problem, and many approaches have been proposed to solve it. Nonetheless, the overwhelming majority of these works do not take into account the size of the workload, which can be very large. A huge workload increases the running time of HP selection algorithms and may degrade the quality of the final solution. In this paper, we propose a new approach based on classification and election to select an HP schema for large-sized workloads. We conducted an experimental study on the APB-1 benchmark to test the effectiveness and scalability of our approach.
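
The abstract does not detail the classification and election steps, so the following is only an illustrative sketch of the general idea of shrinking a workload before running an HP selection algorithm. It assumes, hypothetically, that each query is encoded as a binary vector over selection predicates, that classification is done with a plain k-means, and that election keeps the query closest to each cluster mean. The names `cluster_queries`, `elect_representatives`, and the toy `workload` are illustrative and do not come from the chapter.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_queries(vectors: List[Sequence[float]], k: int,
                    iterations: int = 20) -> List[int]:
    """Plain k-means over query vectors (assumed classification step).

    Centroids are initialised with the first k vectors so that the
    result is deterministic. Returns one cluster label per query.
    """
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of queries")
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each query goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def elect_representatives(vectors: List[Sequence[float]],
                          labels: List[int]) -> Dict[int, int]:
    """Assumed election step: per cluster, keep the index of the query
    closest to the cluster mean (ties go to the lowest index)."""
    elected: Dict[int, int] = {}
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mean = [sum(col) / len(idx) for col in zip(*(vectors[i] for i in idx))]
        elected[c] = min(idx, key=lambda i: squared_distance(vectors[i], mean))
    return elected


# Toy workload: rows are queries, columns are four hypothetical predicates;
# a 1 means the query uses that predicate.
workload = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
labels = cluster_queries(workload, k=2)
elected = elect_representatives(workload, labels)
reduced_workload = [workload[i] for i in elected.values()]
```

Under these assumptions, an HP selection algorithm would then be run on `reduced_workload` (two queries here) rather than on the full workload, which is the kind of reduction that motivates handling very large workloads.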

Author information

Corresponding author

Correspondence to Gacem Amina.

Copyright information

© 2013 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Amina, G., Boukhalfa, K. (2013). Very Large Workloads Based Approach to Efficiently Partition Data Warehouses. In: Amine, A., Otmane, A., Bellatreche, L. (eds) Modeling Approaches and Algorithms for Advanced Computer Applications. Studies in Computational Intelligence, vol 488. Springer, Cham. https://doi.org/10.1007/978-3-319-00560-7_32

  • DOI: https://doi.org/10.1007/978-3-319-00560-7_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-00559-1

  • Online ISBN: 978-3-319-00560-7

  • eBook Packages: Engineering, Engineering (R0)
