A Survey of Change Diagnosis Algorithms in Evolving Data Streams

  • Chapter
Data Streams

Part of the book series: Advances in Database Systems ((ADBS,volume 31))

Abstract

An important problem in data stream analysis is change detection and monitoring. In many cases, the data stream changes over time, and these changes can be used to understand the nature of several applications. We discuss velocity density estimation, a technique for understanding, visualizing, and determining trends in the evolution of fast data streams. We show how to use velocity density estimation to create both temporal velocity profiles and spatial velocity profiles at periodic instants in time. These profiles are then used to predict three kinds of data evolution. Methods are proposed to visualize the changing data trends in a single online scan of the data stream, with a computational requirement that is linear in the number of data points. In addition, batch processing techniques are proposed to identify the combinations of dimensions that show the greatest amount of global evolution. We also discuss change detection in the context of graph data, and illustrate that it is often useful to determine communities of evolution in graph environments.
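As a rough illustration of the idea, the sketch below estimates a velocity density at one spatial location by comparing two kernel density estimates over the same time window: one that weights recent points more heavily and one that weights older points more heavily. This is a minimal sketch under stated assumptions, not the chapter's exact formulation: the one-dimensional data, the Gaussian spatial kernel, the linear temporal weights, and the names `gaussian_kernel`, `velocity_density`, `h_s`, and `h_t` are all illustrative choices that do not appear in the abstract.

```python
import math


def gaussian_kernel(u, h):
    """Gaussian kernel with bandwidth h (assumed spatial smoothing choice)."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def velocity_density(points, x, t, h_s, h_t):
    """Estimate the rate of change of density at location x and time t.

    points: iterable of (value, timestamp) pairs, 1-D values (assumption).
    h_s:    spatial bandwidth (hypothetical parameter name).
    h_t:    length of the temporal window (hypothetical parameter name).

    A positive result suggests density is growing at x; a negative
    result suggests it is shrinking. One pass over the points, so the
    cost is linear in the number of points.
    """
    forward = reverse = 0.0      # recency-weighted and age-weighted densities
    w_forward = w_reverse = 0.0  # normalizers for the two temporal weightings
    for value, ts in points:
        age = t - ts
        if not (0.0 <= age <= h_t):
            continue  # point lies outside the window (t - h_t, t)
        k = gaussian_kernel(x - value, h_s)
        recent = 1.0 - age / h_t  # 1 at time t, 0 at time t - h_t
        old = age / h_t           # 0 at time t, 1 at time t - h_t
        forward += recent * k
        reverse += old * k
        w_forward += recent
        w_reverse += old
    if w_forward == 0.0 or w_reverse == 0.0:
        return 0.0  # not enough data in the window to compare
    return (forward / w_forward - reverse / w_reverse) / h_t


# Example: early points cluster near 0, later points cluster near 5.
stream = [(0.0, float(ts)) for ts in range(0, 5)] + \
         [(5.0, float(ts)) for ts in range(6, 11)]
gaining = velocity_density(stream, x=5.0, t=10.0, h_s=1.0, h_t=10.0)
losing = velocity_density(stream, x=0.0, t=10.0, h_s=1.0, h_t=10.0)
```

In this toy stream, `gaining` comes out positive and `losing` negative, which matches the intuition that data is moving from the region near 0 to the region near 5. Evaluating the same function over a grid of locations at a fixed time would give something like a spatial profile; evaluating it at one location over successive times would give a temporal one.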

Evolution in a data stream may also change the data distribution to the extent that the underlying data mining models need to be modified to account for it. We discuss a number of micro-clustering methods that are used to study the effect of evolution on problems such as clustering and classification.
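The abstract does not say which statistics a micro-cluster stores. The sketch below assumes the additive summary commonly used for stream clustering: a point count, per-dimension linear and squared sums, and a sum of timestamps. The class name `MicroCluster` and its methods are hypothetical, chosen only to show why such a summary suits evolving streams: it can be updated per point and merged without revisiting old data.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Additive summary of a group of stream points (illustrative only)."""
    dim: int
    n: int = 0
    linear_sum: list = field(default_factory=list)
    squared_sum: list = field(default_factory=list)
    time_sum: float = 0.0

    def __post_init__(self):
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim
        if not self.squared_sum:
            self.squared_sum = [0.0] * self.dim

    def add(self, point, timestamp):
        """Absorb one point in O(dim) time without storing it."""
        self.n += 1
        self.time_sum += timestamp
        for i, v in enumerate(point):
            self.linear_sum[i] += v
            self.squared_sum[i] += v * v

    def merge(self, other):
        """Combine two summaries; every statistic simply adds."""
        self.n += other.n
        self.time_sum += other.time_sum
        for i in range(self.dim):
            self.linear_sum[i] += other.linear_sum[i]
            self.squared_sum[i] += other.squared_sum[i]

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        """Root-mean-square deviation of the points from the centroid."""
        var = 0.0
        for s, q in zip(self.linear_sum, self.squared_sum):
            mean = s / self.n
            var += max(q / self.n - mean * mean, 0.0)  # clamp rounding noise
        return math.sqrt(var)

    def mean_time(self):
        """Average arrival time, a crude indicator of how recent the cluster is."""
        return self.time_sum / self.n


mc = MicroCluster(dim=2)
mc.add([0.0, 0.0], timestamp=1.0)
mc.add([2.0, 0.0], timestamp=3.0)
```

Because the statistics are additive, summaries taken at different times can be combined or compared, which is one plausible way to study how clusters drift as the stream evolves.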



Copyright information

© 2007 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Aggarwal, C.C. (2007). A Survey of Change Diagnosis Algorithms in Evolving Data Streams. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_5

  • DOI: https://doi.org/10.1007/978-0-387-47534-9_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-28759-1

  • Online ISBN: 978-0-387-47534-9

  • eBook Packages: Computer Science, Computer Science (R0)