Analyzing the Impact of Feature Drifts in Streaming Learning

Barddal, Jean Paul; Gomes, Heitor Murilo; Enembreck, Fabrício

doi:10.1007/978-3-319-26532-2_3

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9489)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Learning from data streams requires efficient algorithms capable of deriving a model incrementally as new instances arrive. Data streams are, by definition, unbounded sequences of data that are possibly non-stationary, i.e., their data distribution may change over time, a phenomenon known as concept drift. Concept drifts force streaming learning algorithms to detect and adapt to such changes in order to maintain acceptable accuracy over time. Nonetheless, most works in the literature do not account for a specific kind of drift: feature drift. Feature drifts occur whenever the relevance of an arbitrary attribute changes over time, which also affects the concept to be learned. In this paper we (i) verify the occurrence of feature drift in a publicly available dataset, (ii) present a synthetic data stream generator capable of simulating feature drifts, and (iii) analyze the impact of this type of drift on stream learning algorithms, showing that there is both room and need for dynamic feature selection strategies for data streams.
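
To make the notion of feature drift concrete, the following is a minimal illustrative sketch and not the generator presented in the paper. It assumes a toy stream in which the class label depends on one subset of features before a drift point and on a different subset afterwards. The function names (`feature_drift_stream`, `agreement`), the feature subsets, the threshold rule, and the drift position are all hypothetical choices made only for illustration.

```python
import random


def feature_drift_stream(n_instances, n_features=5, drift_at=500, seed=0):
    """Yield (x, y) pairs from a toy stream with a single feature drift.

    Hypothetical setup: the label depends on features (0, 1) before
    `drift_at` and on features (2, 3) afterwards.
    """
    rng = random.Random(seed)
    relevant_before = (0, 1)  # assumed relevant features before the drift
    relevant_after = (2, 3)   # assumed relevant features after the drift
    for t in range(n_instances):
        x = [rng.random() for _ in range(n_features)]
        relevant = relevant_before if t < drift_at else relevant_after
        # Label is 1 when the currently relevant features sum to more than 1.0.
        y = int(sum(x[i] for i in relevant) > 1.0)
        yield x, y


def agreement(instances, features):
    """Fraction of instances whose label matches the rule applied to `features`."""
    hits = sum(int(sum(x[i] for i in features) > 1.0) == y for x, y in instances)
    return hits / len(instances)


data = list(feature_drift_stream(1000))
before, after = data[:500], data[500:]
print("features (0, 1):", agreement(before, (0, 1)), agreement(after, (0, 1)))
print("features (2, 3):", agreement(before, (2, 3)), agreement(after, (2, 3)))
```

In this toy setting, a rule built on features (0, 1) matches every label before the drift and should fall to roughly chance level afterwards, while a rule built on features (2, 3) shows the opposite pattern. A learner that keeps relying on a fixed feature subset would be affected in the same way, which is the kind of impact the abstract refers to.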

Author information

Authors and Affiliations

Graduate Program in Informatics (PPGIa), Pontifícia Universidade Católica Do Paraná, R. Imaculada Conceição, 1155, Curitiba, Brazil
Jean Paul Barddal, Heitor Murilo Gomes & Fabrício Enembreck

Corresponding author

Correspondence to Jean Paul Barddal.

Editor information

Editors and Affiliations

University of Istanbul, Istanbul, Turkey
Sabri Arik
University at Qatar, Doha, Qatar
Tingwen Huang
Tunku Abdul Rahman University College, Kuala Lumpur, Malaysia
Weng Kin Lai
University of Science Technology, Wuhan, China
Qingshan Liu

About this paper

Cite this paper

Barddal, J.P., Gomes, H.M., Enembreck, F. (2015). Analyzing the Impact of Feature Drifts in Streaming Learning. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9489. Springer, Cham. https://doi.org/10.1007/978-3-319-26532-2_3

DOI: https://doi.org/10.1007/978-3-319-26532-2_3
Published: 12 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26531-5
Online ISBN: 978-3-319-26532-2
eBook Packages: Computer Science, Computer Science (R0)
