Skip to main content

Correcting the Usage of the Hoeffding Inequality in Stream Mining

  • Conference paper
Advances in Intelligent Data Analysis XII (IDA 2013)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8207))

Included in the following conference series:

Abstract

Many stream classification algorithms use the Hoeffding Inequality to identify the best split attribute during tree induction.

We show that the prerequisites of the Inequality are violated by these algorithms, and we propose corrective steps. The new stream classification core, correctedVFDT, satisfies the prerequisites of the Hoeffding Inequality and thus provides the expected performance guarantees.

The goal of our work is not to improve accuracy, but to guarantee a reliable and interpretable error bound. Nonetheless, we show that our solution achieves lower error rates regarding split attributes and sooner split decisions while maintaining a similar level of accuracy.

Part of this work was funded by the German Research Foundation project SP 572/11-1 IMPRINT: Incremental Mining for Perennial Objects.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Domingos, P., Hulten, G.: Mining High Speed Data Streams. In: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2000)

    Google Scholar 

  2. Frank, A., Asuncion, A.: UCI Machine Learning Repository (2010)

    Google Scholar 

  3. Gama, J., Rocha, R., Medas, P.: Accurate decision trees for mining high-speed data streams. In: KDD 2003: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 523–528. ACM, New York (2003)

    Chapter  Google Scholar 

  4. Gama, J., Kosina, P.: Learning Decision Rules from Data Streams. In: Walsh, T. (ed.) IJCAI, pp. 1255–1260. IJCAI/AAAI (2011)

    Google Scholar 

  5. Hastie, T., Tibshirani, R., Friedman, J., Ebooks Corporation: The Elements of Statistical Learning, ch. 9.2.3, pp. 324–329. Springer, Dordrecht (2009)

    Book  Google Scholar 

  6. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13–30 (1963)

    Article  MathSciNet  MATH  Google Scholar 

  7. Hulten, G., Spencer, L., Domingos, P.: Mining Time-Changing Data Streams. In: ACM SIGKDD (2001)

    Google Scholar 

  8. Kosina, P., Gama, J.: Very Fast Decision Rules for multi-class problems. In: Ossowski, S., Lecca, P. (eds.) SAC, pp. 795–800. ACM (2012)

    Google Scholar 

  9. Pfahringer, B., Holmes, G., Kirkby, R.: New Options for Hoeffding Trees. In: Orgun, M.A., Thornton, J. (eds.) AI 2007. LNCS (LNAI), vol. 4830, pp. 90–99. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  10. Rutkowski, L., Pietruczuk, L., Duda, P., Jaworski, M.: Decision Trees for Mining Data Streams Based on the McDiarmid’s Bound. IEEE Trans. on Knowledge and Data Engineering (accepted in 2012)

    Google Scholar 

  11. Xu, W., Qin, Z., Hu, H., Zhao, N.: Mining Uncertain Data Streams Using Clustering Feature Decision Trees. In: Tang, J., King, I., Chen, L., Wang, J. (eds.) ADMA 2011, Part II. LNCS, vol. 7121, pp. 195–208. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Matuszyk, P., Krempl, G., Spiliopoulou, M. (2013). Correcting the Usage of the Hoeffding Inequality in Stream Mining. In: Tucker, A., Höppner, F., Siebes, A., Swift, S. (eds) Advances in Intelligent Data Analysis XII. IDA 2013. Lecture Notes in Computer Science, vol 8207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41398-8_26

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-41398-8_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41397-1

  • Online ISBN: 978-3-642-41398-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics