Using Genetic Algorithms to Optimize Redundant Data

Szulc, Iwona; Stencel, Krzysztof; Wiśniewski, Piotr

doi:10.1007/978-3-319-58274-0_14

Iwona Szulc¹⁵,
Krzysztof Stencel¹⁵ &
Piotr Wiśniewski¹⁵

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 716))

Included in the following conference series:

International Conference: Beyond Databases, Architectures and Structures

1258 Accesses

Abstract

Analytic queries can exhaust resources of the DBMS at hand. Since the nature of such queries can be foreseen, a database administrator can prepare the DBMS so that it serves such queries efficiently. Materialization of partial results (aggregates) is perhaps the most important method to reduce the resource consumption of such queries. The number of possible aggregates of a fact table is exponential in the number of its dimensions. The administrator has to choose a reasonable subset of all possible materialized aggregates. If an aggregate is materialized, it may produce benefits during a query execution but also instigate a cost during data maintenance (not to mention the space needed). Thus, the administrator faces an optimisation problem: knowing the workload (i.e. the queries and updates to be performed), what is the subset of all aggregates that gives the maximal net benefit? In this paper we present a cost model that defines the framework of this optimisation problem. Then, we compare two methods to compute the optimal subset of aggregates: a complete search and a genetic algorithm. We tested these meta-heuristics on a fact table with 30 dimensions. The results are promising. The genetic algorithm runs significantly faster while yielding solutions within 10% margin of the optimal solution found by the complete search.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB 2(2), 1648–1653 (2009). http://www.vldb.org/pvldb/2/vldb09-10years.pdf
Google Scholar
Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of data, Santa Barbara, CA, USA, 21–24 May 2001, pp. 211–222 (2001). http://doi.acm.org/10.1145/375663.375686
Chaudhuri, S., Narasayya, V.R.: Self-tuning database systems: a decade of progress. In: Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, 23–27 September 2007, pp. 3–14 (2007). http://www.vldb.org/conf/2007/papers/special/p3-chaudhuri.pdf
Deshpande, A., Ives, Z.G., Raman, V.: Adaptive query processing. Found. Trends Databases 1(1), 1–140 (2007). http://dx.doi.org/10.1561/1900000001
Article MATH Google Scholar
Flexviews: Incrementally refreshable materialized views for MySQL, January 2012. http://code.google.com/p/flexviews/
Gawarkiewicz, M., Wiśniewski, P.: Partial aggregation using hibernate. In: Kim, T., Adeli, H., Slezak, D., Sandnes, F.E., Song, X., Chung, K., Arnett, K.P. (eds.) FGIT 2011. LNCS, vol. 7105, pp. 90–99. Springer, Heidelberg (2011). doi:10.1007/978-3-642-27142-7_11
Chapter Google Scholar
Gawarkiewicz, M., Wiśniewski, P., Stencel, K.: Granular indices for HQL analytic queries. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2014. CCIS, vol. 424, pp. 30–39. Springer, Cham (2014). doi:10.1007/978-3-319-06932-6_4
Chapter Google Scholar
Hindshaw, F., Metzger, J., Zane, B.: Optimized Database Appliance, Patent No. U.S. 7,010,521 B2, Assignee: Netezza Corporation, Framingham, MA, issued 7 March 2006
Google Scholar
Ioannidis, Y.E.: The history of histograms (abridged). In: VLDB, pp. 19–30 (2003). http://www.vldb.org/conf/2003/papers/S02P01.pdf
Ivanova, M., Kersten, M.L., Nes, N.J., Goncalves, R.: An architecture for recycling intermediates in a column-store. ACM Trans. Database Syst. 35(4), 24 (2010). http://dx.doi.org/10.1145/1862919.1862921
Article Google Scholar
Ives, Z.G., Halevy, A.Y., Weld, D.S.: Adapting to source properties in processing data integration queries. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, 13–18 June 2004, pp. 395–406 (2004). http://doi.acm.org/10.1145/1007568.1007613
Kabra, N., DeWitt, D.J.: Efficient mid-query re-optimization of sub-optimal query execution plans. In: SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, 2–4 June 1998, Seattle, Washington, USA, pp. 106–117 (1998). http://doi.acm.org/10.1145/276304.276315
Kalyvianaki, E., Wiesemann, W., Vu, Q.H., Kuhn, D., Pietzuch, P.: SQPR: stream query planning with reuse. In: Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, 11–16 April 2011, Hannover, Germany, pp. 840–851 (2011). http://dx.doi.org/10.1109/ICDE.2011.5767851
Markl, V., Raman, V., Simmen, D.E., Lohman, G.M., Pirahesh, H.: Robust query processing through progressive optimization. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, 13–18 June 2004, pp. 659–670 (2004). http://doi.acm.org/10.1145/1007568.1007642
Mumick, I.S., Quass, D., Mumick, B.S.: Maintenance of data cubes and summary tables in a warehouse. In: SIGMOD Conference, pp. 100–111 (1997)
Google Scholar
Salem, K., Beyer, K., Lindsay, B., Cochrane, R.: How to roll a join: asynchronous incremental view maintenance. SIGMOD Rec. 29(2), 129–140 (2000). http://doi.acm.org/10.1145/335191.335393
Article Google Scholar
Slezak, D., Synak, P., Borkowski, J., Wroblewski, J., Toppin, G.: A rough-columnar RDBMS engine - a case study of correlated subqueries. IEEE Data Eng. Bull. 35(1), 34–39 (2012). http://sites.computer.org/debull/A12mar/infobright1.pdf
Google Scholar
Slezak, D., Synak, P., Wojna, A., Wroblewski, J.: Two database related interpretations of rough approximations: data organization and query execution. Fundam. Inform. 127(1–4), 445–459 (2013). http://dx.doi.org/10.3233/FI-2013-920
Google Scholar
Slezak, D., Wroblewski, J., Eastwood, V., Synak, P.: Brighthouse: an analytic data warehouse for ad-hoc queries. PVLDB 1(2), 1337–1345 (2008). http://www.vldb.org/pvldb/1/1454174.pdf
Google Scholar
Wisniewski, P., Stencel, K.: Query rewriting based on meta-granular aggregation. In: CS&P, pp. 457–468 (2013)
Google Scholar
Wisniewski, P., Stencel, K.: Query rewriting based on meta-granular aggregation. Fundam. Inform. 135(4), 537–551 (2014). http://dx.doi.org/10.3233/FI-2014-1139
Google Scholar

Download references

Author information

Authors and Affiliations

Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
Iwona Szulc, Krzysztof Stencel & Piotr Wiśniewski

Authors

Iwona Szulc
View author publications
You can also search for this author in PubMed Google Scholar
Krzysztof Stencel
View author publications
You can also search for this author in PubMed Google Scholar
Piotr Wiśniewski
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Krzysztof Stencel .

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Dariusz Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Paweł Kasprowski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Bożena Małysiak-Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Daniel Kostrzewa

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Szulc, I., Stencel, K., Wiśniewski, P. (2017). Using Genetic Algorithms to Optimize Redundant Data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Towards Efficient Solutions for Data Analysis and Knowledge Representation. BDAS 2017. Communications in Computer and Information Science, vol 716. Springer, Cham. https://doi.org/10.1007/978-3-319-58274-0_14

Download citation

DOI: https://doi.org/10.1007/978-3-319-58274-0_14
Published: 27 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58273-3
Online ISBN: 978-3-319-58274-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics