Part of the book series: Studies in Computational Intelligence (SCI, volume 206)

Summary

The most challenging applications of knowledge discovery involve dynamic environments where data flow continuously at high speed and exhibit non-stationary properties. In this chapter we discuss the main challenges in learning from data streams and the most relevant issues in knowledge discovery from them: incremental learning, cost-performance management, change detection, and novelty detection. We present illustrative algorithms for these learning tasks and a real-world application illustrating the advantages of stream processing. The chapter ends with some open issues that emerge from this new research area.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Gama, J., Rodrigues, P.P. (2009). An Overview on Mining Data Streams. In: Abraham, A., Hassanien, AE., de Leon F. de Carvalho, A.P., Snášel, V. (eds) Foundations of Computational Intelligence Volume 6. Studies in Computational Intelligence, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01091-0_2

  • DOI: https://doi.org/10.1007/978-3-642-01091-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01090-3

  • Online ISBN: 978-3-642-01091-0

  • eBook Packages: Engineering, Engineering (R0)