Optimizer and Scheduling for the Community Data Warehouse Architecture

Methods and Supporting Technologies for Data Analysis

Part of the book series: Studies in Computational Intelligence (SCI, volume 225)

Abstract

In today’s internet-connected, data-driven world, the demand for high-performance data management systems is growing steadily. The data warehouse (DW) concept has evolved from a centralized local repository into a broader one that encompasses a community service with unique storage and processing capabilities. This growing popularity has led to the appearance of new DW architectures and optimizations. In this chapter we propose two key interrelated enabling technologies for this vision: a parallel query optimizer that can optimize queries in any parallel DW independently of the underlying database management system (DBMS), and a scheduling approach for Grid DWs that decides at which Grid site a query should be executed. We show experimentally that these approaches allow the community data warehouse to work efficiently.



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

de Carvalho Costa, R.L., Antunes, R., Furtado, P. (2009). Optimizer and Scheduling for the Community Data Warehouse Architecture. In: Zakrzewska, D., Menasalvas, E., Byczkowska-Lipinska, L. (eds) Methods and Supporting Technologies for Data Analysis. Studies in Computational Intelligence, vol 225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02196-1_2

  • DOI: https://doi.org/10.1007/978-3-642-02196-1_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02195-4

  • Online ISBN: 978-3-642-02196-1

  • eBook Packages: Engineering, Engineering (R0)
