Skip to main content
Log in

Cost effective storage space for data cubes

  • Published:
Journal of Intelligent Information Systems Aims and scope Submit manuscript

Abstract

View materialization is one of the most important techniques applied in multidimensional databases. The problem of selecting a set of views for materialization that minimizes queries response time under storage space constraint received significant attention over last twenty years. Many researchers concentrate on designing better view selection methods with respect to the running time or the cost of the solution. This paper summarizes our research on the problem of how much space should be allocated for views materialization to ensure good queries performance. In order to comprehensively investigate the problem and minimize the influence of untypical cases, the experiments described in this paper were done on the large data set, including large data cubes, rarely considered in previous papers. In particular, the relation between the number of data cube views and the space limit expressed as a percentage of the fully materialized data cube size and a multiple of the base view size is analysed. According to our experimental results, the allocation of large space for views materialization is not cost effective.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Notes

  1. The term materialized view originate from relational databases, however, in this paper we treat the view as a set of data cube cells, independent from database technology.

  2. In (Asgharzadeh Talebi et al. 2006) and (Asgharzadeh Talebi et al. 2007) the space limit was related to queries workload, however, also special cases when queries ask for all views from data cube were considered.

  3. The paper also described how to change the restriction on the fixed number of views to the restriction on space available for materialization.

  4. According to (Kimball and Strehlo 1995) in 1995 nearly all successful decision support systems used star-shaped design.

References

  • Agrawal, S., Chaudhuri, S., & Narasayya, V.R. (2000). Automated selection of materialized views and indexes in SQL databases. In Proceedings of the 26th international conference on very large data bases VLDB ’00 (pp. 496–505). San Francisco: Morgan Kaufmann Publishers Inc.

  • Agrawal, S., Chaudhuri, S., Kollar, L., Marathe, A., Narasayya, V., & Syamala, M. (2004). Database tuning advisor for microsoft SQL server 2005. In VLDB. Very Large Data Bases Endowment Inc.

  • Agrawal, V., Sundararaghavan, P.S., Ahmed, M.U., & Nandkeolyar, U. (2007). View materialization in a data cube: optimization models and heuristics. Journal of Database Management, 18(3), 1–20.

    Article  Google Scholar 

  • Aouiche, K., Jouve, P.E., & Darmont, J. (2006). Clustering-based materialized view selection in data warehouses. In Proceedings of the 10th East European conference on advances in databases and information systems, ADBIS’06. (pp. 81–95). Berlin, Heidelberg: Springer. doi:10.1007/11827252_9

  • Aouiche, K., & Darmont, J. (2009). Data mining-based materialized view and index selection in data warehouses. Journal of Intelligent Information System, 33(1), 65–93. doi:10.1007/s10844-009-0080-0.

    Article  Google Scholar 

  • Asgharzadeh Talebi, Z., Chirkova, R., & Fathi, Y. (2005). Experimental study of an IP model for the view selection problem. Tech. Rep. TR-2005-34, Raleigh: NC State University.

  • Asgharzadeh Talebi, Z., Chirkova, R., & Fathi, Y. (2006). A study of a formal model for view selection for aggregate queries. Tech. Rep. TR-2006-2, Raleigh: NC State University.

    Google Scholar 

  • Asgharzadeh Talebi, Z., Chirkova, R., & Fathi, Y. (2007). Exact and inexact methods for solving the problem of view selection for aggregate queries. Tech. Rep. TR-2007-27, Raleigh: NC State University.

    Google Scholar 

  • Asgharzadeh Talebi, Z., Chirkova, R., & Fathi, Y. (2013). An integer programming approach for the view and index selection problem. Data & Knowledge Engineering, 83 (0), 111–125. doi:10.1016/j.datak.2012.11.001.

    Article  Google Scholar 

  • Belknap, P., Dageville, B., Dias, K., & Yagoub, K. (2009). Self-tuning for {SQL} performance in oracle database 11g. In Proceedings of the 25th international conference on data engineering, ICDE 2009. doi:10.1109/ICDE.2009.165(pp. 1694–1700). Shanghai.

  • Bello, R.G., Dias, K., Downing, A., Feenan Jr. J.J., Finnerty, J.L., Norcott, W.D., Sun, H., Witkowski, A., & Ziauddin, M. (1998). Materialized views in oracle. In Proceedings of the 24th international conference on very large data bases, VLDB ’98 (pp. 659–664). San Francisco: Morgan Kaufmann Publishers Inc.

  • Bunker, C.J., Colby, L.S., Cole, R.L., McKenna, W.J., Mulagund, G., & Wilhite, D. (2001). Aggregate maintenance for data warehousing in informix red brick vista. In Proceedings of the 27th international conference on very large data bases, VLDB ’01 (pp. 659–662). San Francisco: Morgan Kaufmann Publishers Inc.

  • Chaudhuri, S., & Narasayya, V.R. (1997). An efficient cost-driven index selection tool for microsoft SQL server. In Proceedings of the 23rd international conference on very large data bases, VLDB ’97 (pp. 146–155). San Francisco: Morgan Kaufmann Publishers Inc.

  • Chaudhuri, S., & Weikum, G. (2000). Rethinking database system architecture: towards a self-tuning RISC-style database system. In Proceedings of the 26th international conference on very large data bases, VLDB ’00 (pp. 1–10). San Francisco: Morgan Kaufmann Publishers Inc.

  • Chaudhuri, S., & Narasayya, V. (2007). Self-tuning database systems: a decade of progress. In Proceedings of the 33rd international conference on very large data bases, VLDB ’07 (pp. 3–14). Vienna: VLDB Endowment.

  • Colby, L.S., Cole, R.L., Haslam, E., Jazayeri, N., Johnson, G., McKenna, W.J., Schumacher, L., & Wilhite, D. (1998). Red brick vista TM: aggregate computation and management. In Proceedings of the 14th international conference on data engineering, 1998 (pp. 174–177). doi:10.1109/ICDE.1998.655773.

  • Cuzzocrea, A., Furfaro, F., & Saccà, D. (2008). Enabling OLAP in mobile environments via intelligent data cube compression techniques. Journal of Intelligent Information Systems, 33(2), 95– 143. doi:10.1007/s10844-008-0065-4.

    Article  Google Scholar 

  • Dageville, B., Das, D., Dias, K., Yagoub, K., Zait, M., & Ziauddin, M. (2004). Automatic SQL tuning in oracle 10G. In Proceedings of the thirtieth international conference on very large data bases - Volume 30 (pp. 1098–1109). Toronto: VLDB Endowment.

  • Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., & Pirahesh, H. (1997). Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Journal of Data Mining and Knowledge Discovery, 1(1), 29–53. doi:10.1023/A:1009726021843.

    Article  Google Scholar 

  • Gupta, H. (1997). Selection of views to materialize in a data warehouse. In F. Afrati, & P. Kolaitis (Eds.) , Database theory - ICDT ’97, lecture notes in computer science (Vol. 1186, pp. 98–112). Berlin: Springer. doi:10.1007/3-540-62222-5-39.

  • Gupta, H., & Mumick, I.S. (2005). Selection of views to materialize in a data warehouse. IEEE Transactions on Knowledge and Data Engineering, 17(1), 24–43. doi:10.1109/TKDE.2005.16.

    Article  Google Scholar 

  • Halevy, A.Y. (2001). Answering queries using views: a survey. The VLDB Journal, 10(4), 270–294. doi:10.1007/s007780100054.

    Article  MATH  Google Scholar 

  • Harinarayan, V., Rajaraman, A., & Ullman, J.D. (1996). Implementing data cubes efficiently. In Proceedings of the 1996 ACM SIGMOD international conference on management of data, SIGMOD ’96 (pp. 205–216). New York: ACM. doi:10.1145/233269.233333.

  • Huang, R., Chirkova, R., & Fathi, Y. (2012). Deterministic view selection for data-analysis queries: properties and algorithms. In T. Morzy, T. Härder, & R. Wrembel (Eds.), Advances in databases and information systems, lecture notes in computer science (Vol. 7503, pp. 195–208). doi:10.1007/978-3-642-33074-2_15. Berlin: Springer.

  • IBM (2013). IBM ILOG CPLEX optimizer. http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/index.html.

  • Kalnis, P., Mamoulis, N., & Papadias, D. (2002). View selection using randomized search. Data & Knowledge Engineering, 42(1), 89–111. doi:10.1016/S0169-023X(02)00045-9.

    Article  MATH  Google Scholar 

  • Karloff, H., & Mihail, M. (1999). On the complexity of the view-selection problem. In Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS ’99 (pp. 167–173). New York: ACM. 10.1145/303976.303993.

  • Kejser, T., Lee, D., Tkachuk, R., Anand, T., Dumitru, M., Galloway, G., Harinath, S., Melomed, E., Mirchandani, A., Rabeler, C., Vitt, E., Yogurtcuoglu, S., Zorner, A., Nayyar, S., Galloway, G., Piasevoli, T., Webb, C., & Russo, M. (2011). Microsoft SQL server 2008 R2 analysis services performance guide. In B. Inghram. (Ed.), White paper. http://download.microsoft.com/download/6/5/6/6567c845-fc8d-4d62-920f-c027a349c889/ssasperfguide2008r2.pdf.

  • Kimball, R., & Strehlo, K. (1995). Why decision support fails and how to fix it. SIGMOD Rec, 24(3), 92–97. doi:10.1145/211990.212023.

    Article  Google Scholar 

  • Kimball, R., & Ross, M. (2002). The data warehouse toolkit: the complete guide to dimensional modeling, 2nd edn. New York: Wiley Inc.

  • Łatuszko, M. (2009). Query processing time optimization in data cubes. In T. Burczyńsk, W. Cholewa, & W. Moczulski (Eds.) ,Recent development in artificial intelligence methods (pp. 189–200). Gliwice: AI-Meth Series.

  • Łatuszko, M., & Pytlak, R. (2015). Methods for solving the mean query execution time minimization problem. European Journal of Operational Research, 246(2), 582–596. doi:10.1016/j.ejor.2015.04.041.

    Article  MathSciNet  MATH  Google Scholar 

  • Lee, M., & Hammer, J. (1999). Speeding up warehouse physical design using a randomized algorithm. In Proceedings of the international workshop on design and management of data warehouses (DMDW ’99).

  • Li, J., Asgharzadeh Talebi, Z., Chirkova, R., & Fathi, Y. (2005). A formal model for the problem of view selection for aggregate queries. In Proceedings of the 9th East European conference on advances in databases and information systems, ADBIS05 (pp. 125–138). Berlin: Springer. doi:10.1007/11547686-10.

  • Lin, W.-Y., & Kuo, I.-C. (2004). A genetic selection algorithm for OLAP data cubes. Knowledge and Information Systems, 6(1), 83–102. doi:10.1007/s10115-003-0093-x.

    Article  MathSciNet  Google Scholar 

  • Mami, I., & Bellahsene, Z. (2012). A survey of view selection methods. SIGMOD Rec, 41(1), 20–29. doi:10.1145/2206869.2206874.

    Article  Google Scholar 

  • Mami, I., Bellahsene, Z., & Coletta, R. (2012). View selection under multiple resource constraints in a distributed context. In S.W. Liddle, K.-D. Schewe, A.M. Tjoa, & X. Zhou (Eds.) , Proceedings of the 23rd international conference on database and expert systems applications, Vienna, Austria, September 3-6, 2012, DEXA 2012. Part II (pp. 281–296). Berlin, Heidelberg: Springer.

  • Müller, S., & Plattner, H. (2015). Aggregates caching in columnar in-memory databases. In A. Jagatheesan, J. Levandoski, T. Neumann, & A. Pavlo (Eds.), In memory data management and analysis: first and second international workshops, IMDM 2013, Riva del Garda, Italy, August 26, 2013, IMDM 2014, Hongzhou, China, September 1, 2014, Revised Selected Papers (pp. 69–81). Cham: Springer International Publishing. doi:10.1007/978-3-319-13960-9_6.

  • Oracle Corporation. (2006). Performance tuning using the SQL access advisor. Tech. rep., http://www.oracle.com/technetwork/database/manageability/twp-manage-tuning-using.pdf.

  • Paraboschi, S., Sindoni, G., Baralis, E., & Teniente, E. (2003). Chapter VIII: materialized views in multidimensional databases. In M. Rafanelli (Ed.) , Multidimensional databases: problems and solutions. Idea Group.

  • Pourabbas, E., & Rafanelli, M. (2003). Chapter IV: hierarchies. In M. Rafanelli (Ed.) ,Multidimensional databases: problems and solutions. Idea Group. http://dblp.uni-trier.de.

  • Rao, J., Zhang, C., Megiddo, N., & Lohman, G. (2002). Automating physical database design in a parallel database. In Proceedings of the 2002 ACM SIGMOD international conference on management of data, SIGMOD ’02. (pp. 558–569). New York: ACM. doi:10.1145/564691.564757.

  • Shukla, A., Deshpande, P., & Naughton, J.F. (1998). Materialized view selection for multidimensional datasets. In Proceedings of the 24rd international conference on very large data bases, VLDB ’98 (pp. 488–499). San Francisco: Morgan Kaufmann Publishers Inc.

  • Theodoratos, D., & Bouzeghoub, M. (2000). A general framework for the view selection problem for data warehouse design and evolution. In Proceedings of the 3rd ACM international workshop on data warehousing and OLAP, DOLAP ’00. (pp. 1–8). New York: ACM. doi:10.1145/355068.355309.

  • Theodoratos, D., Ligoudistianos, S., & Sellis, T. (2001). View selection for designing the global data warehouse. Data & Knowledge Engineering, 39(3), 219–240. Data warehousing. doi:10.1016/S0169-023X(01)00041-6.

    Article  MATH  Google Scholar 

  • Transaction Performance Processing Council (2010). TPC BENCHMARK TM H (Decision Support), Revision 2.11.0. http://www.tpc.org/tpch/default.asp.

  • Valentin, G., Zuliani, M., Zilio, D.C., Lohman, G., & Skelley, A. (2000). DB2 advisor: an optimizer smart enough to recommend its own indexes. In Proceedings of the 16th international conference on data engineering. Washington, DC: IEEE Computer Society.

  • Vitt, E, Anand, T.K., Berger, S., Dumitru, M., Jacobsen, E., Melomed, E., Mirchandani, A., Pasumansky, M., Petculescu, C., Rabeler, C., Robertson, W., Tkachuk, R., Wickert, D., & Wyatt, L. (2007). Microsoft SQL server 2005 analysis services performance guide. White paper. www.microsoft.com/technet/prodtechnol/sql/2005/ssas2005perfguide.mspx.

  • Weikum, G., Moenkeberg, A., Hasse, C., & Zabback, P. (2002). Self-tuning database technology and information services: from wishful thinking to viable engineering. In Proceedings of the 28th international conference on very large data bases, VLDB ’02 (pp. 20–31). Hong Kong: VLDB Endowment.

  • Yang, J., Karlapalem, K., & Li, Q. (1997). Algorithms for materialized view design in data warehousing environment. In Proceedings of the 23rd international conference on very large data bases, VLDB ’97 (pp. 136–145). San Francisco: Morgan Kaufmann Publishers Inc.

  • Zilio, D.C., Rao, J., Lightstone, S., Lohman, G., Storm, A., Arellano, C.G., & Fadden, S. (2004a). DB2 design advisor: Integrated automatic physical database design. In VLDB ’04: Proceedings of the thirtieth international conference on very large data bases, VLDB ’04 (pp. 1087–1097). Toronto: VLDB Endowment.

  • Zilio, D.C., Zuzarte, C., Lightstone, S., Ma, W., Lohman, G.M., Cochrane, R.J., Pirahesh, H., Colby, L., Gryz, J., Alton, E., Liang, D., & Valentin, G. (2004b). Recommending materialized views and indexes with IBM DB2 design advisor. International Conference on Autonomic Computing, 0, 180–188. doi:10.1109/ICAC.2004.47.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Marek Łatuszko.

Appendix: Storage space needed to ensure expected benefit

Appendix: Storage space needed to ensure expected benefit

Table 2 A rounded off percentage of the fully materialized data cube size needed to obtain 67 % of the maximum benefit
Table 3 A rounded off percentage of the fully materialized data cube size needed to obtain 90 % of the maximum benefit
Table 4 A rounded off percentage of the fully materialized data cube size needed to obtain 99 % of the maximum benefit
Table 5 A rounded off multiple of the base view size needed to obtain 67 % of the maximum benefit
Table 6 A rounded off multiple of the base view size needed to obtain 90 % of the maximum benefit
Table 7 A rounded off multiple of the base view size needed to obtain 99 % of the maximum benefit

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Łatuszko, M. Cost effective storage space for data cubes. J Intell Inf Syst 48, 243–261 (2017). https://doi.org/10.1007/s10844-016-0408-5

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10844-016-0408-5

Keywords

Navigation