Journal of Intelligent Information Systems, Volume 34, Issue 3, pp 305–343

A top-down approach for compressing data cubes under the simultaneous evaluation of multiple hierarchical range queries

  • Alfredo Cuzzocrea

Abstract

This paper introduces and experimentally assesses a novel top-down compression technique for data cubes. The technique addresses the previously unrecognized case in which multiple Hierarchical Range Queries (HRQ), a very useful class of OLAP queries, must be evaluated against the target data cube simultaneously. Traditional data cube compression techniques are ineffective in this scenario because, unlike our approach, they take only one constraint into consideration (e.g., a given storage space bound). Our study introduces an innovative multiple-objective OLAP computational paradigm and a hierarchical multidimensional histogram, whose main benefit is an intermediate compression of the input data cube that can simultaneously accommodate even a large family of HRQ that differ in nature. A complementary contribution is a wide experimental evaluation of the performance of our technique on both benchmark and real-life data cubes, including a comparison with state-of-the-art histogram-based compression techniques.

Keywords

Multi-objective compression of data cubes; Compressing data cubes under simultaneous multiple OLAP queries; Multiple-query data cube compression techniques; Advanced OLAP

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. ICAR-CNR and University of Calabria, Cosenza, Italy
