An Overview on Mining Data Streams

Gama, João; Rodrigues, Pedro Pereira

doi:10.1007/978-3-642-01091-0_2

Part of the book series: Studies in Computational Intelligence (SCI, volume 206)

Summary

The most challenging applications of knowledge discovery involve dynamic environments where data flow continuously at high speed and exhibit non-stationary properties. In this chapter we discuss the main challenges of learning from data streams and the most relevant issues in knowledge discovery from them: incremental learning, cost-performance management, change detection, and novelty detection. We present illustrative algorithms for these learning tasks, and a real-world application illustrating the advantages of stream processing. The chapter ends with some open issues that emerge from this new research area.

Author information

Authors and Affiliations

Faculty of Economics, LIAAD, INESC-Porto, University of Porto, Rua de Ceuta, 118, 6, 4050-190, Porto, Portugal
João Gama
Faculty of Medicine, LIAAD, INESC-Porto, University of Porto, Rua de Ceuta, 118, 6, 4050-190, Porto, Portugal
Pedro Pereira Rodrigues

Editor information

Editors and Affiliations

Machine Intelligence Research Labs, (MIR Labs), Scientific Network for Innovation and Research Excellence, Auburn, P.O. Box 2259, 98071-2259, Washington, USA
Ajith Abraham
College of Business Administration, Quantitative and Information System Department, Kuwait University, P.O. Box 5486, 13055, Safat, Kuwait
Aboul-Ella Hassanien
Department of Computer Science, University of São Paulo, Caixa Postal 668, 13560-970, Sao Carlos, SP, Brazil
André Ponce de Leon F. de Carvalho
Dept. Computer Science, Technical University Ostrava, Tr. 17. Listopadu 15, 708 33, Ostrava, Czech Republic
Václav Snášel

About this chapter

Cite this chapter

Gama, J., Rodrigues, P.P. (2009). An Overview on Mining Data Streams. In: Abraham, A., Hassanien, AE., de Leon F. de Carvalho, A.P., Snášel, V. (eds) Foundations of Computational Intelligence Volume 6. Studies in Computational Intelligence, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01091-0_2

DOI: https://doi.org/10.1007/978-3-642-01091-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01090-3
Online ISBN: 978-3-642-01091-0
eBook Packages: Engineering, Engineering (R0)