Abstract
Change detection and monitoring is an important problem in data stream analysis. A data stream often changes over time, and these changes can be used to understand the nature of the underlying application. We discuss velocity density estimation, a technique for understanding, visualizing, and determining trends in the evolution of fast data streams. We show how to use velocity density estimation to create both temporal and spatial velocity profiles at periodic instants in time. These profiles are then used to predict three kinds of data evolution. Methods are proposed to visualize the changing data trends in a single online scan of the data stream, with a computational requirement that is linear in the number of data points. In addition, batch processing techniques are proposed to identify the combinations of dimensions that show the greatest amount of global evolution. We also discuss change detection in the context of graph data, and illustrate that it is often useful to determine communities of evolution in graph environments.
Evolution in a data stream may also change the data to the extent that the underlying data mining models need to be modified to account for the change in data distribution. We discuss a number of micro-clustering methods that are used to study the effect of evolution on problems such as clustering and classification.
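As a rough illustration of the idea behind velocity density, the sketch below estimates how quickly the density at a point is changing by comparing kernel density estimates over two consecutive time windows. This is a simplified stand-in, not the chapter's exact formulation: the Gaussian kernel, the hard window boundaries, and the names `gaussian_kde` and `velocity_density` are all assumptions made for the example.

```python
import math


def gaussian_kde(points, x, bandwidth):
    """Gaussian kernel density estimate at x from a list of 1-D points."""
    if not points:
        return 0.0
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)


def velocity_density(stream, x, t, window, bandwidth):
    """Approximate rate of change of density at x at time t.

    stream is a list of (timestamp, value) pairs. The estimate is the
    density over (t - window, t] minus the density over
    (t - 2*window, t - window], divided by the window length.
    (Assumption: hard windows instead of temporally weighted kernels.)
    """
    older = [v for ts, v in stream if t - 2 * window < ts <= t - window]
    recent = [v for ts, v in stream if t - window < ts <= t]
    change = gaussian_kde(recent, x, bandwidth) - gaussian_kde(older, x, bandwidth)
    return change / window


# Hypothetical stream whose values move from 0.0 to 5.0 halfway through.
stream = [(i, 0.0) for i in range(10)] + [(i, 5.0) for i in range(10, 20)]
gain = velocity_density(stream, 5.0, t=19, window=10, bandwidth=1.0)
loss = velocity_density(stream, 0.0, t=19, window=10, bandwidth=1.0)
print(gain > 0, loss < 0)
```

A positive value marks a region that is gaining density and a negative value a region that is losing it. Each point is visited a constant number of times per query location, in keeping with the linear cost mentioned above.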
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Aggarwal, C.C. (2007). A Survey of Change Diagnosis Algorithms in Evolving Data Streams. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_5
DOI: https://doi.org/10.1007/978-0-387-47534-9_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-28759-1
Online ISBN: 978-0-387-47534-9
eBook Packages: Computer Science (R0)