Abstract
Analytic queries can exhaust resources of the DBMS at hand. Since the nature of such queries can be foreseen, a database administrator can prepare the DBMS so that it serves such queries efficiently. Materialization of partial results (aggregates) is perhaps the most important method to reduce the resource consumption of such queries. The number of possible aggregates of a fact table is exponential in the number of its dimensions. The administrator has to choose a reasonable subset of all possible materialized aggregates. If an aggregate is materialized, it may produce benefits during a query execution but also instigate a cost during data maintenance (not to mention the space needed). Thus, the administrator faces an optimisation problem: knowing the workload (i.e. the queries and updates to be performed), what is the subset of all aggregates that gives the maximal net benefit? In this paper we present a cost model that defines the framework of this optimisation problem. Then, we compare two methods to compute the optimal subset of aggregates: a complete search and a genetic algorithm. We tested these meta-heuristics on a fact table with 30 dimensions. The results are promising. The genetic algorithm runs significantly faster while yielding solutions within 10% margin of the optimal solution found by the complete search.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB 2(2), 1648–1653 (2009). http://www.vldb.org/pvldb/2/vldb09-10years.pdf
Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of data, Santa Barbara, CA, USA, 21–24 May 2001, pp. 211–222 (2001). http://doi.acm.org/10.1145/375663.375686
Chaudhuri, S., Narasayya, V.R.: Self-tuning database systems: a decade of progress. In: Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, 23–27 September 2007, pp. 3–14 (2007). http://www.vldb.org/conf/2007/papers/special/p3-chaudhuri.pdf
Deshpande, A., Ives, Z.G., Raman, V.: Adaptive query processing. Found. Trends Databases 1(1), 1–140 (2007). http://dx.doi.org/10.1561/1900000001
Flexviews: Incrementally refreshable materialized views for MySQL, January 2012. http://code.google.com/p/flexviews/
Gawarkiewicz, M., Wiśniewski, P.: Partial aggregation using hibernate. In: Kim, T., Adeli, H., Slezak, D., Sandnes, F.E., Song, X., Chung, K., Arnett, K.P. (eds.) FGIT 2011. LNCS, vol. 7105, pp. 90–99. Springer, Heidelberg (2011). doi:10.1007/978-3-642-27142-7_11
Gawarkiewicz, M., Wiśniewski, P., Stencel, K.: Granular indices for HQL analytic queries. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2014. CCIS, vol. 424, pp. 30–39. Springer, Cham (2014). doi:10.1007/978-3-319-06932-6_4
Hindshaw, F., Metzger, J., Zane, B.: Optimized Database Appliance, Patent No. U.S. 7,010,521 B2, Assignee: Netezza Corporation, Framingham, MA, issued 7 March 2006
Ioannidis, Y.E.: The history of histograms (abridged). In: VLDB, pp. 19–30 (2003). http://www.vldb.org/conf/2003/papers/S02P01.pdf
Ivanova, M., Kersten, M.L., Nes, N.J., Goncalves, R.: An architecture for recycling intermediates in a column-store. ACM Trans. Database Syst. 35(4), 24 (2010). http://dx.doi.org/10.1145/1862919.1862921
Ives, Z.G., Halevy, A.Y., Weld, D.S.: Adapting to source properties in processing data integration queries. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, 13–18 June 2004, pp. 395–406 (2004). http://doi.acm.org/10.1145/1007568.1007613
Kabra, N., DeWitt, D.J.: Efficient mid-query re-optimization of sub-optimal query execution plans. In: SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, 2–4 June 1998, Seattle, Washington, USA, pp. 106–117 (1998). http://doi.acm.org/10.1145/276304.276315
Kalyvianaki, E., Wiesemann, W., Vu, Q.H., Kuhn, D., Pietzuch, P.: SQPR: stream query planning with reuse. In: Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, 11–16 April 2011, Hannover, Germany, pp. 840–851 (2011). http://dx.doi.org/10.1109/ICDE.2011.5767851
Markl, V., Raman, V., Simmen, D.E., Lohman, G.M., Pirahesh, H.: Robust query processing through progressive optimization. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, 13–18 June 2004, pp. 659–670 (2004). http://doi.acm.org/10.1145/1007568.1007642
Mumick, I.S., Quass, D., Mumick, B.S.: Maintenance of data cubes and summary tables in a warehouse. In: SIGMOD Conference, pp. 100–111 (1997)
Salem, K., Beyer, K., Lindsay, B., Cochrane, R.: How to roll a join: asynchronous incremental view maintenance. SIGMOD Rec. 29(2), 129–140 (2000). http://doi.acm.org/10.1145/335191.335393
Slezak, D., Synak, P., Borkowski, J., Wroblewski, J., Toppin, G.: A rough-columnar RDBMS engine - a case study of correlated subqueries. IEEE Data Eng. Bull. 35(1), 34–39 (2012). http://sites.computer.org/debull/A12mar/infobright1.pdf
Slezak, D., Synak, P., Wojna, A., Wroblewski, J.: Two database related interpretations of rough approximations: data organization and query execution. Fundam. Inform. 127(1–4), 445–459 (2013). http://dx.doi.org/10.3233/FI-2013-920
Slezak, D., Wroblewski, J., Eastwood, V., Synak, P.: Brighthouse: an analytic data warehouse for ad-hoc queries. PVLDB 1(2), 1337–1345 (2008). http://www.vldb.org/pvldb/1/1454174.pdf
Wisniewski, P., Stencel, K.: Query rewriting based on meta-granular aggregation. In: CS&P, pp. 457–468 (2013)
Wisniewski, P., Stencel, K.: Query rewriting based on meta-granular aggregation. Fundam. Inform. 135(4), 537–551 (2014). http://dx.doi.org/10.3233/FI-2014-1139
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Szulc, I., Stencel, K., Wiśniewski, P. (2017). Using Genetic Algorithms to Optimize Redundant Data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation. BDAS 2017. Communications in Computer and Information Science, vol 716. Springer, Cham. https://doi.org/10.1007/978-3-319-58274-0_14
Download citation
DOI: https://doi.org/10.1007/978-3-319-58274-0_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58273-3
Online ISBN: 978-3-319-58274-0
eBook Packages: Computer ScienceComputer Science (R0)