Instance-Based Classification and Regression on Data Streams

Chapter in: Learning in Non-Stationary Environments
Abstract

In order to be useful and effectively applicable in dynamically evolving environments, machine learning methods have to meet several requirements, including the ability to analyze incoming data in an online, incremental manner, to observe tight time and memory constraints, and to respond appropriately to changes in the data characteristics and underlying distributions. This paper advocates an instance-based learning algorithm for that purpose, for both classification and regression problems. This algorithm has a number of desirable properties that are not, at least not as a whole, shared by currently existing alternatives. Notably, our method is very flexible and thus able to adapt to an evolving environment quickly, a point of utmost importance in the data stream context. At the same time, the algorithm is relatively robust and thus applicable to streams with different characteristics.
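The general idea behind instance-based learning on streams can be sketched with a minimal example: a k-nearest-neighbor classifier over a bounded sliding window, so that predictions adapt as old instances are evicted. This is only an illustration of the approach, not the authors' actual algorithm; the window size and k below are arbitrary choices.

```python
from collections import deque
import math

class SlidingWindowKNN:
    """k-nearest-neighbor classifier over a bounded sliding window.

    Old instances are evicted as new ones arrive, so the model
    adapts to drifting distributions without explicit retraining.
    """

    def __init__(self, k=3, window_size=100):
        self.k = k
        self.window = deque(maxlen=window_size)  # (features, label) pairs

    def update(self, x, y):
        """Add a labeled instance; the oldest is dropped when full."""
        self.window.append((x, y))

    def predict(self, x):
        """Majority vote among the k nearest stored instances."""
        if not self.window:
            return None
        neighbors = sorted(
            self.window, key=lambda item: math.dist(x, item[0])
        )[: self.k]
        labels = [y for _, y in neighbors]
        return max(set(labels), key=labels.count)

# Prequential use: predict on each new instance first, then learn
# from its true label once it becomes available.
model = SlidingWindowKNN(k=3, window_size=50)
for x, y in [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b")]:
    model.update(x, y)
print(model.predict((0.05, 0.1)))
```

Because the window is bounded, both memory usage and per-prediction cost stay constant, which is what the time and memory constraints above require.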


Notes

  1. Of course, this maxim disregards other criteria, such as the complexity of the method.

  2. This choice of k_test aims at including in the test environment the similarity environments of all examples in the similarity environment of x_0; of course, it does not guarantee to do so.

  3. Note that if this error, p, is estimated from the last k instances, the variance of the estimate is ≈ p(1 − p)/k. Moreover, the estimate is unbiased, provided the error remained constant during the last k time steps. The value k = 20 provides a good trade-off between bias and precision.

  4. http://lib.stat.cmu.edu/

  5. To make the transformation more robust toward outliers, it makes sense to replace max and min by appropriate percentiles of the empirical distribution.
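The bias/precision trade-off mentioned in note 3 is easy to make concrete: the standard deviation of a windowed error estimate is sqrt(p(1 − p)/k), so shrinking the window reacts faster to drift but yields a noisier estimate. A tiny illustration (the error rate 0.2 is an arbitrary example value):

```python
import math

def error_estimate_std(p, k):
    """Standard deviation of an error rate estimated from the last
    k instances, assuming a (locally) constant true error p."""
    return math.sqrt(p * (1 - p) / k)

# A smaller window tracks drift faster but is noisier;
# k = 20 keeps the standard deviation moderate.
for k in (5, 20, 100):
    print(k, round(error_estimate_std(0.2, k), 3))
```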
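The percentile-based transformation suggested in note 5 can be sketched as follows; the 5th/95th percentiles and the nearest-rank percentile rule are illustrative choices, not prescribed by the chapter.

```python
def robust_minmax(values, lo_pct=5.0, hi_pct=95.0):
    """Scale values to [0, 1] using empirical percentiles instead of
    the min and max, so a few outliers cannot dominate the scaling.
    Values outside the percentile range are clipped to [0, 1]."""
    s = sorted(values)
    n = len(s)
    # nearest-rank percentiles (simple, non-interpolated variant)
    lo = s[int((n - 1) * lo_pct / 100)]
    hi = s[int((n - 1) * hi_pct / 100)]
    if hi == lo:
        return [0.0] * n
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]

# One extreme outlier: with plain min/max scaling it would compress
# all other values near 0; with percentiles it is simply clipped.
data = list(range(1, 20)) + [1000]
scaled = robust_minmax(data)
print(scaled[0], scaled[-1])
```

With plain min–max scaling on this data, the inliers 1–19 would all land below 0.02; the percentile variant keeps them spread across [0, 1] and clips the outlier to 1.0.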



Author information

Correspondence to Eyke Hüllermeier.


Copyright information

© 2012 Springer Science+Business Media New York

About this chapter

Cite this chapter

Shaker, A., Hüllermeier, E. (2012). Instance-Based Classification and Regression on Data Streams. In: Sayed-Mouchaweh, M., Lughofer, E. (eds) Learning in Non-Stationary Environments. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-8020-5_8

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-8019-9

  • Online ISBN: 978-1-4419-8020-5

  • eBook Packages: Engineering (R0)
