Classifying Streaming Data

  • Max Bramer
Part of the Undergraduate Topics in Computer Science book series (UTICS)


This chapter is concerned with the classification of streaming data, i.e. data which arrives (generally in large quantities) from some automatic process over a period of days, months, years or potentially forever.

Generating a classification tree for streaming data requires a different approach from the TDIDT algorithm described earlier in this book. The algorithm given here, H-Tree, is a variant of the popular VFDT algorithm which generates a type of decision tree called a Hoeffding Tree. The algorithm is described and explained in detailed with accompanying pseudocode for the benefit of readers who may be interested in developing their own implementations. An example is given to illustrate a way of comparing the rules generated by H-Tree with those from TDIDT.


Root Node Leaf Node Internal Node Information Gain Classification Tree 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. [1]
    Domingos, P., & Hulten, G. (2000). Mining high-speed data streams. In Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 71–80). New York: ACM. CrossRefGoogle Scholar
  2. [2]
    Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58 (301), 13–30. MathSciNetCrossRefMATHGoogle Scholar

Copyright information

© Springer-Verlag London Ltd. 2016

Authors and Affiliations

  • Max Bramer
    • 1
  1. 1.School of ComputingUniversity of PortsmouthPortsmouthUK

Personalised recommendations