Advertisement

Analyzing the Impact of Feature Drifts in Streaming Learning

  • Jean Paul BarddalEmail author
  • Heitor Murilo Gomes
  • Fabrício Enembreck
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9489)

Abstract

Learning from data streams requires efficient algorithms capable of deriving a model accordingly to the arrival of new instances. Data streams are by definition unbounded sequences of data that are possibly non stationary, i.e. they may undergo changes in data distribution, phenomenon named concept drift. Concept drifts force streaming learning algorithms to detect and adapt to such changes in order to present feasible accuracy throughout time. Nonetheless, most of works presented in the literature do not account for a specific kind of drifts: feature drifts. Feature drifts occur whenever the relevance of an arbitrary attribute changes through time, also impacting the concept to be learned. In this paper we (i) verify the occurrence of feature drift in a publicly available dataset, (ii) present a synthetic data stream generator capable of performing feature drifts and (iii) analyze the impact of this type of drift in stream learning algorithms, enlightening that there is room and the need for dynamic feature selection strategies for data streams.

Keywords

Data Stream Feature Selection Algorithm Concept Drift Hoeffding Tree Feature Drift 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. 1.
    Barddal, J.P., Gomes, H.M., Enembreck, F.: SFNclassifier: a scale-free social network method to handle concept drift. In: Proceedings of the 29th Annual ACM Symposium on Applied Computing (SAC), SAC 2014. ACM March 2014Google Scholar
  2. 2.
    Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., Gavaldà, R.: New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 139–148. ACM SIGKDD June 2009Google Scholar
  3. 3.
    Bifet, A., Gavaldà, R.: Learning from time-changing data with adaptive windowing. In: SIAM International Conference on Data Mining (2007)Google Scholar
  4. 4.
    Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: Massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010)Google Scholar
  5. 5.
    Carvalho, V.R., Cohen, W.W.: Single-pass online learning: performance, voting schemes and online feature selection. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, pp. 548–553. ACM, New York (2006)Google Scholar
  6. 6.
    Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2000, pp. 71–80. ACM, New York (2000)Google Scholar
  7. 7.
    Gama, J., Rodrigues, P.: Issues in evaluation of stream learning algorithms. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 329–338. ACM SIGKDD June 2009Google Scholar
  8. 8.
    Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. 46(4), 1–37 (2014)CrossRefzbMATHGoogle Scholar
  9. 9.
    Gama, J.: Knowledge Discovery from Data Streams. Chapman & Hall/CRC, Boca Raton (2010)CrossRefzbMATHGoogle Scholar
  10. 10.
    Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: An update SIGKDD. Explor. Newsl. 11(1), 10–18 (2009)CrossRefGoogle Scholar
  11. 11.
    Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)MathSciNetCrossRefzbMATHGoogle Scholar
  12. 12.
    Katakis, I., Tsoumakas, G., Vlahavas, I.: Dynamic feature space and incremental feature selection for the classification of textual data streams. In: ECML/PKDD-2006 International Workshop on Knowledge Discovery from Data Streams, 2006, pp. 107–116. Springer Verlag, Berlin (2006)Google Scholar
  13. 13.
    Masud, M.M., Gao, J., Khan, L., Han, J., Thuraisingham, B.M.: Classification and novel class detection in concept-drifting data streams under time constraints. IEEE Trans. Knowl. Data Eng. 23(6), 859–874 (2011)CrossRefGoogle Scholar
  14. 14.
    Nguyen, H.-L., Woon, Y.-K., Ng, W.-K., Wan, L.: Heterogeneous ensemble for feature drifts in data streams. In: Tan, P.-N., Chawla, S., Ho, C.K., Bailey, J. (eds.) PAKDD 2012, Part II. LNCS, vol. 7302, pp. 1–12. Springer, Heidelberg (2012) CrossRefGoogle Scholar
  15. 15.
    Silva, J.A., Faria, E.R., Barros, R.C., Hruschka, E.R., de Carvalho, A., Gama, J.: Data stream clustering: a survey. ACM Comput. Surv. 46(1), 1–31 (2013)CrossRefzbMATHGoogle Scholar
  16. 16.
    Street, W.N., Kim, Y.: A streaming ensemble algorithm (sea) for large-classification. In: Proceedings of the seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 377–382. ACM SIGKDD August 2001Google Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Jean Paul Barddal
    • 1
    Email author
  • Heitor Murilo Gomes
    • 1
  • Fabrício Enembreck
    • 1
  1. 1.Graduate Program in Informatics (PPGIa)Pontifícia Universidade Católica Do ParanáCuritibaBrazil

Personalised recommendations