
Classifying Streaming Data II: Time-Dependent Data

  • Max Bramer
Chapter
Part of the Undergraduate Topics in Computer Science book series (UTICS)

Abstract

This chapter builds on the description in Chapter 21 of the H-Tree algorithm for classifying streaming data, i.e. data that arrives (generally in large quantities) from some automatic process over a period of days, months or years, or potentially forever. Chapter 21 was concerned with stationary data generated from a fixed causal model; Chapter 22 is concerned with time-dependent data, where the underlying model can change from time to time, perhaps seasonally. This phenomenon is known as concept drift.

The algorithm given here, CDH-Tree, is a variant of the popular CVFDT algorithm, which generates a type of decision tree called a Hoeffding Tree. The algorithm is described and explained in detail, with accompanying pseudocode for the benefit of readers who may be interested in developing their own implementations. A detailed example using synthetic data illustrates how the classification tree evolves as more and more records are processed in the presence of concept drift.
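To make the idea of concept drift concrete, the following minimal Python sketch generates a small synthetic stream whose labelling rule is reversed part-way through, and measures how a classifier fixed to the original rule performs window by window. It is not the CDH-Tree algorithm and not the chapter's synthetic data set; the names `drifting_stream`, `old_rule`, `windowed_accuracy`, the drift point and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import random
from typing import Iterator, List, Tuple


def drifting_stream(n_records: int, drift_at: int,
                    seed: int = 0) -> Iterator[Tuple[float, str]]:
    """Yield (x, label) records; the labelling rule is reversed at index drift_at."""
    rng = random.Random(seed)
    for i in range(n_records):
        x = rng.random()
        if i < drift_at:
            label = "a" if x < 0.5 else "b"  # concept before the drift
        else:
            label = "b" if x < 0.5 else "a"  # concept after the drift
        yield x, label


def old_rule(x: float) -> str:
    """A classifier that learned the original concept and never adapts."""
    return "a" if x < 0.5 else "b"


def windowed_accuracy(records: List[Tuple[float, str]],
                      window: int) -> List[float]:
    """Accuracy of old_rule over consecutive, non-overlapping windows."""
    accuracies = []
    for start in range(0, len(records), window):
        chunk = records[start:start + window]
        correct = sum(1 for x, label in chunk if old_rule(x) == label)
        accuracies.append(correct / len(chunk))
    return accuracies


records = list(drifting_stream(n_records=1000, drift_at=500))
print(windowed_accuracy(records, window=250))  # [1.0, 1.0, 0.0, 0.0]
```

The static rule is perfect on the first two windows and wrong on every record after the drift, which is the kind of degradation a drift-aware stream classifier is meant to detect and recover from by revising its tree.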

References

  1. Domingos, P., & Hulten, G. (2000). Mining high-speed data streams. In Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 71–80). New York: ACM.
  2. Hulten, G., Spencer, L., & Domingos, P. (2001). Mining time-changing data streams. In Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (pp. 97–106). New York: ACM.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2020

Authors and Affiliations

  • Max Bramer (1)
  1. School of Computing, University of Portsmouth, Portsmouth, UK
