Abstract
Stream mining is a challenging problem that has attracted considerable attention in the last decade. As a result, there are numerous algorithms for mining data streams, ranging from summarization and analysis to change and anomaly detection. However, most research focuses on proposing, adapting, or improving algorithms and studying their computational performance. Practitioners of stream mining have very little guidance on choosing a technology suited to a particular task or application.
In this paper, we address the practical aspect of choosing a suitable algorithm by drawing on the statistical properties of power and robustness. For the purpose of illustration, we focus on change detection algorithms (CDAs). We define an objective performance measure, streaming power, and use it to explore the robustness of three different algorithms. The measure is comparable across disparate algorithms and provides a common framework for comparing and evaluating change detection algorithms on any data set in a meaningful fashion. We demonstrate the approach on real-world applications and on synthetic data.
In addition, we present a repository of data streams for the community to test change detection algorithms for streaming data.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dasu, T., Krishnan, S., Pomann, G.M. (2011). Robustness of Change Detection Algorithms. In: Gama, J., Bradley, E., Hollmén, J. (eds) Advances in Intelligent Data Analysis X. IDA 2011. Lecture Notes in Computer Science, vol 7014. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24800-9_14
DOI: https://doi.org/10.1007/978-3-642-24800-9_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24799-6
Online ISBN: 978-3-642-24800-9
eBook Packages: Computer Science (R0)