Cost effective storage space for data cubes

Łatuszko, Marek

doi:10.1007/s10844-016-0408-5

Cost effective storage space for data cubes

Published: 16 April 2016

Volume 48, pages 243–261, (2017)
Cite this article

Journal of Intelligent Information Systems Aims and scope Submit manuscript

Marek Łatuszko¹

452 Accesses
1 Citation
Explore all metrics

Abstract

View materialization is one of the most important techniques applied in multidimensional databases. The problem of selecting a set of views for materialization that minimizes queries response time under storage space constraint received significant attention over last twenty years. Many researchers concentrate on designing better view selection methods with respect to the running time or the cost of the solution. This paper summarizes our research on the problem of how much space should be allocated for views materialization to ensure good queries performance. In order to comprehensively investigate the problem and minimize the influence of untypical cases, the experiments described in this paper were done on the large data set, including large data cubes, rarely considered in previous papers. In particular, the relation between the number of data cube views and the space limit expressed as a percentage of the fully materialized data cube size and a multiple of the base view size is analysed. According to our experimental results, the allocation of large space for views materialization is not cost effective.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Building Smart Cubes for Reliable and Faster Access to Data

A Workload-Driven Approach for View Selection in Large Dimensional Datasets

Article 14 March 2020

Leandro Ordonez-Ante, Gregory Van Seghbroeck, … Filip De Turck

Query Prioritization for View Selection

Notes

The term materialized view originate from relational databases, however, in this paper we treat the view as a set of data cube cells, independent from database technology.
In (Asgharzadeh Talebi et al. 2006) and (Asgharzadeh Talebi et al. 2007) the space limit was related to queries workload, however, also special cases when queries ask for all views from data cube were considered.
The paper also described how to change the restriction on the fixed number of views to the restriction on space available for materialization.
According to (Kimball and Strehlo 1995) in 1995 nearly all successful decision support systems used star-shaped design.

References

Agrawal, S., Chaudhuri, S., & Narasayya, V.R. (2000). Automated selection of materialized views and indexes in SQL databases. In Proceedings of the 26th international conference on very large data bases VLDB ’00 (pp. 496–505). San Francisco: Morgan Kaufmann Publishers Inc.
Agrawal, S., Chaudhuri, S., Kollar, L., Marathe, A., Narasayya, V., & Syamala, M. (2004). Database tuning advisor for microsoft SQL server 2005. In VLDB. Very Large Data Bases Endowment Inc.
Agrawal, V., Sundararaghavan, P.S., Ahmed, M.U., & Nandkeolyar, U. (2007). View materialization in a data cube: optimization models and heuristics. Journal of Database Management, 18(3), 1–20.
Article Google Scholar
Aouiche, K., Jouve, P.E., & Darmont, J. (2006). Clustering-based materialized view selection in data warehouses. In Proceedings of the 10th East European conference on advances in databases and information systems, ADBIS’06. (pp. 81–95). Berlin, Heidelberg: Springer. doi:10.1007/11827252_9
Aouiche, K., & Darmont, J. (2009). Data mining-based materialized view and index selection in data warehouses. Journal of Intelligent Information System, 33(1), 65–93. doi:10.1007/s10844-009-0080-0.
Article Google Scholar
Asgharzadeh Talebi, Z., Chirkova, R., & Fathi, Y. (2005). Experimental study of an IP model for the view selection problem. Tech. Rep. TR-2005-34, Raleigh: NC State University.
Asgharzadeh Talebi, Z., Chirkova, R., & Fathi, Y. (2006). A study of a formal model for view selection for aggregate queries. Tech. Rep. TR-2006-2, Raleigh: NC State University.
Google Scholar
Asgharzadeh Talebi, Z., Chirkova, R., & Fathi, Y. (2007). Exact and inexact methods for solving the problem of view selection for aggregate queries. Tech. Rep. TR-2007-27, Raleigh: NC State University.
Google Scholar
Asgharzadeh Talebi, Z., Chirkova, R., & Fathi, Y. (2013). An integer programming approach for the view and index selection problem. Data & Knowledge Engineering, 83 (0), 111–125. doi:10.1016/j.datak.2012.11.001.
Article Google Scholar
Belknap, P., Dageville, B., Dias, K., & Yagoub, K. (2009). Self-tuning for {SQL} performance in oracle database 11g. In Proceedings of the 25th international conference on data engineering, ICDE 2009. doi:10.1109/ICDE.2009.165(pp. 1694–1700). Shanghai.
Bello, R.G., Dias, K., Downing, A., Feenan Jr. J.J., Finnerty, J.L., Norcott, W.D., Sun, H., Witkowski, A., & Ziauddin, M. (1998). Materialized views in oracle. In Proceedings of the 24th international conference on very large data bases, VLDB ’98 (pp. 659–664). San Francisco: Morgan Kaufmann Publishers Inc.
Bunker, C.J., Colby, L.S., Cole, R.L., McKenna, W.J., Mulagund, G., & Wilhite, D. (2001). Aggregate maintenance for data warehousing in informix red brick vista. In Proceedings of the 27th international conference on very large data bases, VLDB ’01 (pp. 659–662). San Francisco: Morgan Kaufmann Publishers Inc.
Chaudhuri, S., & Narasayya, V.R. (1997). An efficient cost-driven index selection tool for microsoft SQL server. In Proceedings of the 23rd international conference on very large data bases, VLDB ’97 (pp. 146–155). San Francisco: Morgan Kaufmann Publishers Inc.
Chaudhuri, S., & Weikum, G. (2000). Rethinking database system architecture: towards a self-tuning RISC-style database system. In Proceedings of the 26th international conference on very large data bases, VLDB ’00 (pp. 1–10). San Francisco: Morgan Kaufmann Publishers Inc.
Chaudhuri, S., & Narasayya, V. (2007). Self-tuning database systems: a decade of progress. In Proceedings of the 33rd international conference on very large data bases, VLDB ’07 (pp. 3–14). Vienna: VLDB Endowment.
Colby, L.S., Cole, R.L., Haslam, E., Jazayeri, N., Johnson, G., McKenna, W.J., Schumacher, L., & Wilhite, D. (1998). Red brick vista ^TM: aggregate computation and management. In Proceedings of the 14th international conference on data engineering, 1998 (pp. 174–177). doi:10.1109/ICDE.1998.655773.
Cuzzocrea, A., Furfaro, F., & Saccà, D. (2008). Enabling OLAP in mobile environments via intelligent data cube compression techniques. Journal of Intelligent Information Systems, 33(2), 95– 143. doi:10.1007/s10844-008-0065-4.
Article Google Scholar
Dageville, B., Das, D., Dias, K., Yagoub, K., Zait, M., & Ziauddin, M. (2004). Automatic SQL tuning in oracle 10G. In Proceedings of the thirtieth international conference on very large data bases - Volume 30 (pp. 1098–1109). Toronto: VLDB Endowment.
Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., & Pirahesh, H. (1997). Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Journal of Data Mining and Knowledge Discovery, 1(1), 29–53. doi:10.1023/A:1009726021843.
Article Google Scholar
Gupta, H. (1997). Selection of views to materialize in a data warehouse. In F. Afrati, & P. Kolaitis (Eds.) , Database theory - ICDT ’97, lecture notes in computer science (Vol. 1186, pp. 98–112). Berlin: Springer. doi:10.1007/3-540-62222-5-39.
Gupta, H., & Mumick, I.S. (2005). Selection of views to materialize in a data warehouse. IEEE Transactions on Knowledge and Data Engineering, 17(1), 24–43. doi:10.1109/TKDE.2005.16.
Article Google Scholar
Halevy, A.Y. (2001). Answering queries using views: a survey. The VLDB Journal, 10(4), 270–294. doi:10.1007/s007780100054.
Article MATH Google Scholar
Harinarayan, V., Rajaraman, A., & Ullman, J.D. (1996). Implementing data cubes efficiently. In Proceedings of the 1996 ACM SIGMOD international conference on management of data, SIGMOD ’96 (pp. 205–216). New York: ACM. doi:10.1145/233269.233333.
Huang, R., Chirkova, R., & Fathi, Y. (2012). Deterministic view selection for data-analysis queries: properties and algorithms. In T. Morzy, T. Härder, & R. Wrembel (Eds.), Advances in databases and information systems, lecture notes in computer science (Vol. 7503, pp. 195–208). doi:10.1007/978-3-642-33074-2_15. Berlin: Springer.
IBM (2013). IBM ILOG CPLEX optimizer. http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/index.html.
Kalnis, P., Mamoulis, N., & Papadias, D. (2002). View selection using randomized search. Data & Knowledge Engineering, 42(1), 89–111. doi:10.1016/S0169-023X(02)00045-9.
Article MATH Google Scholar
Karloff, H., & Mihail, M. (1999). On the complexity of the view-selection problem. In Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS ’99 (pp. 167–173). New York: ACM. 10.1145/303976.303993.
Kejser, T., Lee, D., Tkachuk, R., Anand, T., Dumitru, M., Galloway, G., Harinath, S., Melomed, E., Mirchandani, A., Rabeler, C., Vitt, E., Yogurtcuoglu, S., Zorner, A., Nayyar, S., Galloway, G., Piasevoli, T., Webb, C., & Russo, M. (2011). Microsoft SQL server 2008 R2 analysis services performance guide. In B. Inghram. (Ed.), White paper. http://download.microsoft.com/download/6/5/6/6567c845-fc8d-4d62-920f-c027a349c889/ssasperfguide2008r2.pdf.
Kimball, R., & Strehlo, K. (1995). Why decision support fails and how to fix it. SIGMOD Rec, 24(3), 92–97. doi:10.1145/211990.212023.
Article Google Scholar
Kimball, R., & Ross, M. (2002). The data warehouse toolkit: the complete guide to dimensional modeling, 2nd edn. New York: Wiley Inc.
Łatuszko, M. (2009). Query processing time optimization in data cubes. In T. Burczyńsk, W. Cholewa, & W. Moczulski (Eds.) ,Recent development in artificial intelligence methods (pp. 189–200). Gliwice: AI-Meth Series.
Łatuszko, M., & Pytlak, R. (2015). Methods for solving the mean query execution time minimization problem. European Journal of Operational Research, 246(2), 582–596. doi:10.1016/j.ejor.2015.04.041.
Article MathSciNet MATH Google Scholar
Lee, M., & Hammer, J. (1999). Speeding up warehouse physical design using a randomized algorithm. In Proceedings of the international workshop on design and management of data warehouses (DMDW ’99).
Li, J., Asgharzadeh Talebi, Z., Chirkova, R., & Fathi, Y. (2005). A formal model for the problem of view selection for aggregate queries. In Proceedings of the 9th East European conference on advances in databases and information systems, ADBIS05 (pp. 125–138). Berlin: Springer. doi:10.1007/11547686-10.
Lin, W.-Y., & Kuo, I.-C. (2004). A genetic selection algorithm for OLAP data cubes. Knowledge and Information Systems, 6(1), 83–102. doi:10.1007/s10115-003-0093-x.
Article MathSciNet Google Scholar
Mami, I., & Bellahsene, Z. (2012). A survey of view selection methods. SIGMOD Rec, 41(1), 20–29. doi:10.1145/2206869.2206874.
Article Google Scholar
Mami, I., Bellahsene, Z., & Coletta, R. (2012). View selection under multiple resource constraints in a distributed context. In S.W. Liddle, K.-D. Schewe, A.M. Tjoa, & X. Zhou (Eds.) , Proceedings of the 23rd international conference on database and expert systems applications, Vienna, Austria, September 3-6, 2012, DEXA 2012. Part II (pp. 281–296). Berlin, Heidelberg: Springer.
Müller, S., & Plattner, H. (2015). Aggregates caching in columnar in-memory databases. In A. Jagatheesan, J. Levandoski, T. Neumann, & A. Pavlo (Eds.), In memory data management and analysis: first and second international workshops, IMDM 2013, Riva del Garda, Italy, August 26, 2013, IMDM 2014, Hongzhou, China, September 1, 2014, Revised Selected Papers (pp. 69–81). Cham: Springer International Publishing. doi:10.1007/978-3-319-13960-9_6.
Oracle Corporation. (2006). Performance tuning using the SQL access advisor. Tech. rep., http://www.oracle.com/technetwork/database/manageability/twp-manage-tuning-using.pdf.
Paraboschi, S., Sindoni, G., Baralis, E., & Teniente, E. (2003). Chapter VIII: materialized views in multidimensional databases. In M. Rafanelli (Ed.) , Multidimensional databases: problems and solutions. Idea Group.
Pourabbas, E., & Rafanelli, M. (2003). Chapter IV: hierarchies. In M. Rafanelli (Ed.) ,Multidimensional databases: problems and solutions. Idea Group. http://dblp.uni-trier.de.
Rao, J., Zhang, C., Megiddo, N., & Lohman, G. (2002). Automating physical database design in a parallel database. In Proceedings of the 2002 ACM SIGMOD international conference on management of data, SIGMOD ’02. (pp. 558–569). New York: ACM. doi:10.1145/564691.564757.
Shukla, A., Deshpande, P., & Naughton, J.F. (1998). Materialized view selection for multidimensional datasets. In Proceedings of the 24rd international conference on very large data bases, VLDB ’98 (pp. 488–499). San Francisco: Morgan Kaufmann Publishers Inc.
Theodoratos, D., & Bouzeghoub, M. (2000). A general framework for the view selection problem for data warehouse design and evolution. In Proceedings of the 3rd ACM international workshop on data warehousing and OLAP, DOLAP ’00. (pp. 1–8). New York: ACM. doi:10.1145/355068.355309.
Theodoratos, D., Ligoudistianos, S., & Sellis, T. (2001). View selection for designing the global data warehouse. Data & Knowledge Engineering, 39(3), 219–240. Data warehousing. doi:10.1016/S0169-023X(01)00041-6.
Article MATH Google Scholar
Transaction Performance Processing Council (2010). TPC BENCHMARK ^TM H (Decision Support), Revision 2.11.0. http://www.tpc.org/tpch/default.asp.
Valentin, G., Zuliani, M., Zilio, D.C., Lohman, G., & Skelley, A. (2000). DB2 advisor: an optimizer smart enough to recommend its own indexes. In Proceedings of the 16th international conference on data engineering. Washington, DC: IEEE Computer Society.
Vitt, E, Anand, T.K., Berger, S., Dumitru, M., Jacobsen, E., Melomed, E., Mirchandani, A., Pasumansky, M., Petculescu, C., Rabeler, C., Robertson, W., Tkachuk, R., Wickert, D., & Wyatt, L. (2007). Microsoft SQL server 2005 analysis services performance guide. White paper. www.microsoft.com/technet/prodtechnol/sql/2005/ssas2005perfguide.mspx.
Weikum, G., Moenkeberg, A., Hasse, C., & Zabback, P. (2002). Self-tuning database technology and information services: from wishful thinking to viable engineering. In Proceedings of the 28th international conference on very large data bases, VLDB ’02 (pp. 20–31). Hong Kong: VLDB Endowment.
Yang, J., Karlapalem, K., & Li, Q. (1997). Algorithms for materialized view design in data warehousing environment. In Proceedings of the 23rd international conference on very large data bases, VLDB ’97 (pp. 136–145). San Francisco: Morgan Kaufmann Publishers Inc.
Zilio, D.C., Rao, J., Lightstone, S., Lohman, G., Storm, A., Arellano, C.G., & Fadden, S. (2004a). DB2 design advisor: Integrated automatic physical database design. In VLDB ’04: Proceedings of the thirtieth international conference on very large data bases, VLDB ’04 (pp. 1087–1097). Toronto: VLDB Endowment.
Zilio, D.C., Zuzarte, C., Lightstone, S., Ma, W., Lohman, G.M., Cochrane, R.J., Pirahesh, H., Colby, L., Gryz, J., Alton, E., Liang, D., & Valentin, G. (2004b). Recommending materialized views and indexes with IBM DB2 design advisor. International Conference on Autonomic Computing, 0, 180–188. doi:10.1109/ICAC.2004.47.

Download references

Author information

Authors and Affiliations

Institute of Automatic Control and Robotics, Warsaw University of Technology, Św. Andrzeja Boboli 8, 02-525, Warsaw, Poland
Marek Łatuszko

Authors

Marek Łatuszko
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Marek Łatuszko.

Appendix: Storage space needed to ensure expected benefit

Table 2 A rounded off percentage of the fully materialized data cube size needed to obtain 67 % of the maximum benefit

Full size table

Table 3 A rounded off percentage of the fully materialized data cube size needed to obtain 90 % of the maximum benefit

Full size table

Table 4 A rounded off percentage of the fully materialized data cube size needed to obtain 99 % of the maximum benefit

Full size table

Table 5 A rounded off multiple of the base view size needed to obtain 67 % of the maximum benefit

Full size table

Table 6 A rounded off multiple of the base view size needed to obtain 90 % of the maximum benefit

Full size table

Table 7 A rounded off multiple of the base view size needed to obtain 99 % of the maximum benefit

Full size table

Rights and permissions

Reprints and permissions

About this article

Cite this article

Łatuszko, M. Cost effective storage space for data cubes. J Intell Inf Syst 48, 243–261 (2017). https://doi.org/10.1007/s10844-016-0408-5

Download citation

Received: 07 October 2015
Revised: 30 March 2016
Accepted: 01 April 2016
Published: 16 April 2016
Issue Date: April 2017
DOI: https://doi.org/10.1007/s10844-016-0408-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Cost effective storage space for data cubes

Abstract

Access this article

Similar content being viewed by others

Building Smart Cubes for Reliable and Faster Access to Data

A Workload-Driven Approach for View Selection in Large Dimensional Datasets

Query Prioritization for View Selection

Notes

References

Author information

Authors and Affiliations

Corresponding author

Appendix: Storage space needed to ensure expected benefit

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Cost effective storage space for data cubes

Abstract

Access this article

Similar content being viewed by others

Building Smart Cubes for Reliable and Faster Access to Data

A Workload-Driven Approach for View Selection in Large Dimensional Datasets

Query Prioritization for View Selection

Notes

References

Author information

Authors and Affiliations

Corresponding author

Appendix: Storage space needed to ensure expected benefit

Appendix: Storage space needed to ensure expected benefit

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation