Multi-Dimensional Analysis of Data Streams Using Stream Cubes

Han, Jiawei; Cai, Y. Dora; Chen, Yixin; Dong, Guozhu; Pei, Jian; Wah, Benjamin W.; Wang, Jianyong

doi:10.1007/978-0-387-47534-9_6


Chapter


Part of the book series: Advances in Database Systems (ADBS, volume 31)

Abstract

Large volumes of dynamic stream data pose great challenges to analysis. Besides its dynamic and transient behavior, stream data has another important characteristic: multi-dimensionality. Much stream data resides in a multi-dimensional space and at a rather low level of abstraction, whereas most analysts are interested in relatively high-level dynamic changes in some combination of dimensions. To discover high-level dynamic and evolving characteristics, one may need to perform multi-level, multi-dimensional on-line analytical processing (OLAP) of stream data. This need calls for the investigation of new architectures that facilitate on-line analytical processing of multi-dimensional stream data.

In this chapter, we introduce a stream cube architecture that effectively performs on-line partial aggregation of multi-dimensional stream data, captures the essential dynamic and evolving characteristics of data streams, and facilitates fast OLAP on stream data. Three important techniques are proposed for the design and implementation of stream cubes. First, a tilted time frame model is proposed to register time-related data at multiple resolutions: the more recent data are registered at finer resolution, whereas the more distant data are registered at coarser resolution. This design reduces the overall storage requirements of time-related data and adapts nicely to the data analysis tasks commonly encountered in practice. Second, instead of materializing cuboids at all levels, two critical layers, the observation layer and the minimal interesting layer, are maintained to support routine as well as flexible analysis with minimal computation cost. Third, an efficient stream data cubing algorithm is developed that computes only the layers (cuboids) along a popular path and leaves the other cuboids for on-line, query-driven computation. Based on this design methodology, a stream data cube can be constructed and maintained incrementally with reasonable memory space, computation cost, and query response time. This is verified by our substantial performance study.
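The multi-resolution idea behind the tilted time frame can be illustrated with a small sketch. This is a hypothetical illustration, not the chapter's implementation: it assumes a logarithmic layout in which level i holds at most `cap` slots, each covering 2^i base time units, and that the aggregated measure is a simple sum. The names `Slot`, `LogTiltedFrame`, `insert`, `slots`, and `cap` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    start: int    # first base time unit covered (inclusive)
    width: int    # number of base time units covered
    total: float  # aggregated measure for the slot (assumed: a sum)


class LogTiltedFrame:
    """Hypothetical logarithmic tilted time frame for one cube cell.

    levels[i] holds slots of width 2**i, newest first, at most `cap` per level.
    Recent data stay at fine resolution; older data are merged into coarser slots.
    """

    def __init__(self, cap: int = 2):
        self.cap = cap
        self.levels: list[list[Slot]] = []
        self.t = 0  # index of the next base time unit

    def insert(self, value: float) -> None:
        """Register the measure for the next base time unit."""
        carry = Slot(self.t, 1, value)
        self.t += 1
        i = 0
        while carry is not None:
            if i == len(self.levels):
                self.levels.append([])
            level = self.levels[i]
            level.insert(0, carry)  # keep newest first
            carry = None
            if len(level) > self.cap:
                # Merge the two oldest slots of this level into one coarser
                # slot and push it up to the next level.
                older = level.pop()
                newer = level.pop()
                carry = Slot(older.start, older.width + newer.width,
                             older.total + newer.total)
            i += 1

    def slots(self) -> list[Slot]:
        """All slots, from most recent (finest) to most distant (coarsest)."""
        return [s for level in self.levels for s in level]


frame = LogTiltedFrame()
for _ in range(10):
    frame.insert(1.0)
print([s.width for s in frame.slots()])  # → [1, 1, 2, 2, 4]
```

With the default `cap` of 2, n base time units are summarized in at most 2 × (bit length of n) slots; for a year of minute-level data (525,600 minutes) that is at most 40 slots, while the most recent minutes remain individually addressable. Raising `cap` keeps more slots at each resolution, trading storage for finer granularity.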

The stream cube architecture facilitates on-line analytical processing of stream data. It also forms a preliminary structure for on-line stream mining. The impact of the design and implementation of the stream cube in the context of stream mining is also discussed in this chapter.



Author information

Authors and Affiliations

University of Illinois, Urbana, Illinois
Jiawei Han, Y. Dora Cai & Benjamin W. Wah
Washington University, St. Louis, Missouri
Yixin Chen
Wright State University, Dayton, Ohio
Guozhu Dong
Simon Fraser University, British Columbia, Canada
Jian Pei
Tsinghua University, Beijing, China
Jianyong Wang


Editor information

Editors and Affiliations

IBM, Thomas J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY, 10532
Charu C. Aggarwal


About this chapter

Cite this chapter

Han, J. et al. (2007). Multi-Dimensional Analysis of Data Streams Using Stream Cubes. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_6


DOI: https://doi.org/10.1007/978-0-387-47534-9_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-28759-1
Online ISBN: 978-0-387-47534-9
eBook Packages: Computer Science, Computer Science (R0)
