Data Streams, pp 103–125

Multi-Dimensional Analysis of Data Streams Using Stream Cubes

Part of the book series: Advances in Database Systems (ADBS, volume 31)

Abstract

Large volumes of dynamic stream data pose great challenges for analysis. Besides its dynamic and transient behavior, stream data has another important characteristic: multi-dimensionality. Much stream data resides in a multi-dimensional space at a rather low level of abstraction, whereas most analysts are interested in relatively high-level dynamic changes in some combination of dimensions. To discover high-level dynamic and evolving characteristics, one may need to perform multi-level, multi-dimensional on-line analytical processing (OLAP) of stream data. This need calls for the investigation of new architectures that facilitate on-line analytical processing of multi-dimensional stream data.

In this chapter, we introduce a stream cube architecture that effectively performs on-line partial aggregation of multi-dimensional stream data, captures the essential dynamic and evolving characteristics of data streams, and facilitates fast OLAP on stream data. Three important techniques are proposed for the design and implementation of stream cubes. First, a tilted time frame model is proposed to register time-related data at multiple resolutions: more recent data are registered at finer resolution, whereas more distant data are registered at coarser resolution. This design reduces the overall storage requirements of time-related data and adapts nicely to the data analysis tasks commonly encountered in practice. Second, instead of materializing cuboids at all levels, two critical layers, the observation layer and the minimal interesting layer, are maintained to support routine as well as flexible analysis with minimal computation cost. Third, an efficient stream data cubing algorithm is developed that computes only the layers (cuboids) along a popular path and leaves the other cuboids for on-line, query-driven computation. Based on this design methodology, a stream data cube can be constructed and maintained incrementally with reasonable memory space, computation cost, and query response time. This is verified by our substantial performance study.
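To make the tilted time frame idea concrete, the following is a minimal sketch, not the chapter's implementation. It assumes a simple additive measure (a count or sum) and a hypothetical quarter/hour/day granularity; the class name `TiltedTimeFrame`, its methods, and the slot capacities are illustrative choices that the abstract does not specify. When a finer level fills up, its slots are merged into a single slot at the next coarser level, so older data occupy fewer slots.

```python
from collections import deque


class TiltedTimeFrame:
    """Illustrative multi-resolution register: recent data fine, older data coarse."""

    def __init__(self, levels):
        # levels: list of (name, capacity) pairs ordered from finest to coarsest.
        # The names and capacities are assumptions made for this sketch.
        self.levels = levels
        self.slots = [deque() for _ in levels]

    def add(self, value):
        """Register the aggregate of one finest-granularity time unit."""
        self._push(0, value)

    def _push(self, i, value):
        self.slots[i].append(value)
        _, capacity = self.levels[i]
        is_coarsest = i + 1 == len(self.levels)
        if not is_coarsest and len(self.slots[i]) == capacity:
            # The level is full: collapse it into one slot of the next level.
            merged = sum(self.slots[i])
            self.slots[i].clear()
            self._push(i + 1, merged)
        elif is_coarsest and len(self.slots[i]) > capacity:
            # The coarsest level simply forgets its oldest slot.
            self.slots[i].popleft()

    def snapshot(self):
        """Return the stored slots per level, oldest first within each level."""
        return {name: list(s) for (name, _), s in zip(self.levels, self.slots)}

    def total(self):
        """Aggregate over everything that is still retained."""
        return sum(sum(s) for s in self.slots)


frame = TiltedTimeFrame([("quarter", 4), ("hour", 24), ("day", 31)])
for _ in range(10):  # ten quarter-hour units, one event in each
    frame.add(1)
print(frame.snapshot())  # {'quarter': [1, 1], 'hour': [4, 4], 'day': []}
```

With these assumed capacities, ten quarter-hour measurements occupy four slots rather than ten, and the saving grows as data age into the day level. A non-additive measure would need a different merge step than `sum`.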

The stream cube architecture facilitates on-line analytical processing of stream data. It also forms a preliminary structure for on-line stream mining. The impact of the design and implementation of the stream cube in the context of stream mining is also discussed in the chapter.




© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Han, J. et al. (2007). Multi-Dimensional Analysis of Data Streams Using Stream Cubes. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_6

  • DOI: https://doi.org/10.1007/978-0-387-47534-9_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-28759-1

  • Online ISBN: 978-0-387-47534-9

  • eBook Packages: Computer Science, Computer Science (R0)
