Designing Parallel Relational Data Warehouses: A Global, Comprehensive Approach

Benkrid, Soumia; Bellatreche, Ladjel; Cuzzocrea, Alfredo

doi:10.1007/978-3-319-01863-8_16

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 241)

Abstract

The process of designing a parallel data warehouse has two main steps: (1) fragmentation and (2) allocation of the resulting fragments to the nodes. Usually, we split the data warehouse horizontally, allocate the fragments over the nodes, and finally balance the load over the nodes of the parallel machine. The main drawback of this design approach is its high communication cost. Data Replication (DR) has therefore become a requirement, both for availability and for minimizing the communication cost. In this paper, we present a redundant allocation algorithm for designing shared-nothing parallel relational data warehouses, which is based on the well-known fuzzy k-means clustering algorithm.
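
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each fragment is described by a hypothetical numeric feature vector, treats each cluster as one node, runs a textbook fuzzy c-means (fuzzy k-means) update, and replicates a fragment on every node whose membership degree reaches a hypothetical `threshold`. The function names, the feature vectors, and the threshold rule are all illustrative assumptions.

```python
import random


def fuzzy_c_means(points, k, m=2.0, iters=100, seed=0):
    """Textbook fuzzy c-means; returns (centers, memberships).

    memberships[i][j] is the degree to which point i belongs to cluster j;
    each row sums to 1. `m` > 1 is the usual fuzzifier.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Memberships are proportional to distance ** (-2 / (m - 1)).
        for i in range(n):
            dist = [
                sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim)) ** 0.5
                for j in range(k)
            ]
            zeros = [j for j, x in enumerate(dist) if x == 0.0]
            if zeros:
                # Point coincides with one or more centers.
                u[i] = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(k)]
            else:
                inv = [x ** (-2.0 / (m - 1.0)) for x in dist]
                total = sum(inv)
                u[i] = [x / total for x in inv]
    return centers, u


def redundant_allocation(memberships, threshold=0.3):
    """Hypothetical rule: place each fragment on its best node and on every
    other node whose membership degree is at least `threshold`."""
    placement = []
    for row in memberships:
        best = max(range(len(row)), key=row.__getitem__)
        nodes = {j for j, v in enumerate(row) if v >= threshold}
        nodes.add(best)
        placement.append(sorted(nodes))
    return placement


# Five illustrative fragments (made-up feature vectors) and two nodes.
fragments = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [5.0, 5.5]]
centers, u = fuzzy_c_means(fragments, k=2)
placement = redundant_allocation(u, threshold=0.3)
for idx, nodes in enumerate(placement):
    print(f"fragment {idx} -> nodes {nodes}")
```

In this toy input the last fragment lies midway between the two groups, so its membership is split between both clusters and it is the natural candidate for replication. A hard k-means assignment would instead force it onto a single node.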

Author information

Authors and Affiliations

LIAS/ISAE-ENSMA, Poitiers, France
Soumia Benkrid & Ladjel Bellatreche
National High School for Computer Science (ESI), Algiers, Algeria
Soumia Benkrid
ICAR-CNR and University of Calabria, Rende, Italy
Alfredo Cuzzocrea

Corresponding author

Correspondence to Soumia Benkrid.

Editor information

Editors and Affiliations

Dipartimento di Informatica, Bioingegneria, Robotica e, Università di Genova, Genova, Italy
Barbara Catania
Dipartimento di Automatica e Informatica, Politecnico di Torino, Torino, Italy
Tania Cerquitelli
Dipartimento di Automatica e Informatica, Politecnico di Torino, Torino, Italy
Silvia Chiusano
Dipartimento di Informatica, Bioingegneria, Robotica e, Università di Genova, Genova, Italy
Giovanna Guerrini
Cloudera, Inc., California, USA
Mirko Kämpf
Faculty of Informatics, Technische Universität München, Garching, Germany
Alfons Kemper
Dept. of Analytical Information Systems, Saint Petersburg University, Saint Petersburg, Russia
Boris Novikov
Dipartimento di Ingegneria e Scienza dell’Informazione, Università di Trento, Povo, TN, Italy
Themis Palpanas
Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Praha, Czech Republic
Jaroslav Pokorný
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Athena Vakali

About this paper

Cite this paper

Benkrid, S., Bellatreche, L., Cuzzocrea, A. (2014). Designing Parallel Relational Data Warehouses: A Global, Comprehensive Approach. In: Catania, B., et al. New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 241. Springer, Cham. https://doi.org/10.1007/978-3-319-01863-8_16

DOI: https://doi.org/10.1007/978-3-319-01863-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01862-1
Online ISBN: 978-3-319-01863-8
eBook Packages: Engineering, Engineering (R0)
