Model and procedure for performance and availability-wise parallel warehouses

  • Pedro Furtado


Consider data warehouses as large data repositories queried for analysis and data mining in a variety of application contexts. A query over such data may take a long time to process on a regular PC. Consider partitioning the data across a set of PCs (nodes), with either a parallel database server or any database server at each node, coordinated by an engine-independent middleware. The nodes and the network may not even be fully dedicated to the data warehouse. In such a scenario, care must be taken to handle processing heterogeneity and availability, and we study and propose efficient solutions for this. We concentrate on three main contributions: a performance-wise index that measures relative performance; a replication degree; and a flexible chunk-wise organization with on-demand processing. These contributions extend previous work on de-clustering and replication. They are generic in the sense that they can be applied in very different contexts and with different data partitioning approaches. We evaluate their merits with a prototype implementation of the system.
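The abstract does not describe the algorithms themselves. The following is therefore only a minimal, hypothetical sketch of how the three ideas might interact:

* Each node carries an assumed relative speed index, standing in for the performance-wise index.
* Each chunk is stored on `replication_degree` consecutive nodes, which is an assumed placement.
* Idle nodes pull the next unprocessed chunk for which they hold a replica, which illustrates on-demand processing.

`Node`, `place_chunks` and `process_on_demand` are illustrative names, not the paper's API. The uniform per-chunk cost is a simplifying assumption.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Hypothetical relative performance index: 1.0 = reference node,
    # 2.0 = processes a chunk in half the time.
    speed: float
    chunks_done: list = field(default_factory=list)


def place_chunks(num_chunks, node_names, replication_degree):
    """Store each chunk on `replication_degree` consecutive nodes (assumed placement)."""
    n = len(node_names)
    return {
        c: {node_names[(c + k) % n] for k in range(replication_degree)}
        for c in range(num_chunks)
    }


def process_on_demand(nodes, placement, chunk_cost=1.0):
    """Simulate idle nodes pulling the lowest-numbered pending chunk they hold.

    Returns the makespan (time at which the last chunk finishes).
    """
    pending = set(placement)
    # Event queue of (time the node becomes free, node index).
    free_at = [(0.0, i) for i in range(len(nodes))]
    heapq.heapify(free_at)
    makespan = 0.0
    while pending:
        t, i = heapq.heappop(free_at)
        node = nodes[i]
        local = sorted(c for c in pending if node.name in placement[c])
        if not local:
            # No replica of any remaining chunk: this node stays idle for good,
            # since the pending set only shrinks.
            continue
        chunk = local[0]
        pending.remove(chunk)
        finish = t + chunk_cost / node.speed
        node.chunks_done.append(chunk)
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, i))
    return makespan
```

Consider two nodes with speed indices 1.0 and 2.0, six unit-cost chunks, and full replication (`replication_degree=2`). In that case the faster node processes four chunks and the makespan is 2.0 time units. With `replication_degree=1`, the slow node must process its three local chunks alone, and the makespan grows to 3.0. This is one way to see why pull-based assignment only balances load when replicas give faster nodes something to pull.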


Keywords: De-clustering, Replication, Parallel processing, Load-balancing


Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. University of Coimbra, Coimbra, Portugal