Abstract

Analytic queries can exhaust the resources of a DBMS. Since the nature of such queries can be foreseen, a database administrator can prepare the DBMS to serve them efficiently. Materialization of partial results (aggregates) is perhaps the most important method for reducing the resource consumption of such queries. The number of possible aggregates of a fact table is exponential in the number of its dimensions, so the administrator has to choose a reasonable subset of them to materialize. A materialized aggregate may produce benefits during query execution, but it also incurs a cost during data maintenance (not to mention the space it occupies). Thus, the administrator faces an optimisation problem: given the workload (i.e. the queries and updates to be performed), which subset of all aggregates gives the maximal net benefit? In this paper we present a cost model that defines the framework of this optimisation problem. We then compare two methods for computing the optimal subset of aggregates: a complete search and a genetic algorithm. We tested both methods on a fact table with 30 dimensions. The results are promising: the genetic algorithm runs significantly faster while yielding solutions within a 10% margin of the optimal solution found by the complete search.
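The optimisation problem described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's cost model, so everything below is an assumption made for illustration only: the three dimensions and their cardinalities, the workload, the `UPDATE_WEIGHT` maintenance factor, and the rule that a query reads the smallest materialized aggregate containing all of its group-by dimensions. The genetic operators (truncation selection, one-point crossover, bit-flip mutation) are likewise generic textbook choices, not necessarily the ones the authors used.

```python
import random
from itertools import product

# Hypothetical toy setting: 3 dimensions, hence 2**3 = 8 candidate aggregates.
N_DIMS = 3
CARD = [50, 20, 10]      # assumed cardinality of each dimension
FACT_ROWS = 100_000      # assumed number of rows in the fact table
UPDATE_WEIGHT = 2.0      # assumed maintenance cost per materialized row

# Each candidate aggregate is a bitmask of the dimensions it groups by.
CANDIDATES = list(range(2 ** N_DIMS))

# Assumed workload: (group-by bitmask of the query, frequency).
WORKLOAD = [(0b001, 30), (0b011, 10), (0b100, 25), (0b110, 5)]


def rows(agg):
    """Assumed aggregate size: product of grouped cardinalities, capped."""
    size = 1
    for d in range(N_DIMS):
        if (agg >> d) & 1:
            size *= CARD[d]
    return min(size, FACT_ROWS)


def net_benefit(chosen):
    """Query savings minus maintenance cost for a tuple of 0/1 flags."""
    materialized = [a for a, bit in zip(CANDIDATES, chosen) if bit]
    benefit = 0.0
    for q, freq in WORKLOAD:
        # An aggregate is usable if it keeps every dimension the query needs.
        usable = [rows(a) for a in materialized if (a & q) == q]
        benefit += freq * (FACT_ROWS - min(usable, default=FACT_ROWS))
    maintenance = UPDATE_WEIGHT * sum(rows(a) for a in materialized)
    return benefit - maintenance


def complete_search():
    """Evaluate every subset of candidates; feasible only for tiny inputs."""
    return max(product((0, 1), repeat=len(CANDIDATES)), key=net_benefit)


def genetic_search(pop_size=30, generations=60, mutation_rate=0.05, seed=0):
    """Generic genetic algorithm over bit strings (one bit per candidate)."""
    rng = random.Random(seed)
    n = len(CANDIDATES)
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=net_benefit, reverse=True)
        elite = pop[: pop_size // 2]            # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        pop = elite + children
    return max(pop, key=net_benefit)


if __name__ == "__main__":
    best = complete_search()
    approx = genetic_search()
    print("complete search:", best, net_benefit(best))
    print("genetic algorithm:", approx, net_benefit(approx))
```

With 8 candidates the complete search inspects only 256 subsets. The search space doubles with every additional candidate, and the number of candidates itself doubles with every additional dimension, which is why a heuristic such as the genetic algorithm becomes attractive as the fact table grows.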



Author information

Corresponding author

Correspondence to Krzysztof Stencel.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Szulc, I., Stencel, K., Wiśniewski, P. (2017). Using Genetic Algorithms to Optimize Redundant Data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation. BDAS 2017. Communications in Computer and Information Science, vol 716. Springer, Cham. https://doi.org/10.1007/978-3-319-58274-0_14

  • DOI: https://doi.org/10.1007/978-3-319-58274-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58273-3

  • Online ISBN: 978-3-319-58274-0

  • eBook Packages: Computer Science, Computer Science (R0)
