On-Line Classification of Data Streams with Missing Values Based on Reinforcement Learning

Millán-Giraldo, Mónica; Traver, Vicente Javier; Sánchez, J. Salvador

doi:10.1007/978-3-642-21257-4_44

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6669)

Included in the following conference series:

Iberian Conference on Pattern Recognition and Image Analysis

Abstract

In some applications, data arrive sequentially and are not available in batch form, which makes it difficult to use traditional classification systems. In addition, some attributes may be missing because of real-world conditions. For this problem, a number of decisions have to be made about how to proceed with each incomplete and unlabeled incoming object: how to estimate its missing attribute values, how to classify it, whether to include it in the training set, and when to ask an expert for its class label. Unfortunately, no single decision works well for all data sets. This data dependency motivates our formulation of the problem in terms of elements of reinforcement learning. The application of this learning paradigm to this problem is, to the best of our knowledge, novel. The empirical results are encouraging, since the proposed framework behaves better and more generally than many strategies used in isolation, and makes efficient use of human effort (requests to an expert for the class label) and computer memory (the growth of the training set).
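The abstract does not give the details of the reinforcement learning formulation, so the following is only a minimal sketch of the general idea of learning which handling strategy pays off on a given stream. It assumes a stateless epsilon-greedy action-value learner; the class `StrategySelector`, the helper `mean_impute`, the action names, and the reward scheme are all hypothetical and are not taken from the paper.

```python
import random


class StrategySelector:
    """Epsilon-greedy learner over a fixed set of handling strategies.

    Hypothetical illustration: the action names and rewards are assumptions,
    not the formulation used in the paper.
    """

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {a: 0.0 for a in self.actions}  # running mean reward
        self.count = {a: 0 for a in self.actions}    # times each action was tried

    def choose(self):
        # Explore with probability epsilon, otherwise pick the best-valued action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[a])

    def update(self, action, reward):
        # Incremental sample-average update of the action value.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]


def mean_impute(x, column_means):
    """Replace missing attributes (None) with the given per-attribute means."""
    return [m if v is None else v for v, m in zip(x, column_means)]


if __name__ == "__main__":
    # Assumed actions: classify after imputation, also store the object,
    # or ask the expert for the label (which would carry a cost in the reward).
    selector = StrategySelector(
        ["impute_and_classify", "impute_classify_and_store", "ask_expert"]
    )
    action = selector.choose()
    selector.update(action, reward=1.0)  # e.g. +1 for a correct prediction
    print(action, selector.value[action])
```

In such a sketch, the reward would have to trade off classification accuracy against the cost of expert queries and of growing the training set, which are the two resources the abstract mentions.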

This work has been supported in part by the Spanish Ministry of Education and Science under grants CSD2007–00018 (Consolider Ingenio 2010) and TIN2009–14205, and by Bancaixa under grant P1–1B2009–04.

Author information

Authors and Affiliations

Institute of New Imaging Technologies, Universitat Jaume I, 12071, Castellón, Spain
Mónica Millán-Giraldo, Vicente Javier Traver & J. Salvador Sánchez

Editor information

Editors and Affiliations

Departament de Matemàtica Aplicada i Anàlisi, Universitat de Barcelona, Facultat de Matemàtiques, Gran Via de les Corts Catalanes 585, 08007, Barcelona, Spain
Jordi Vitrià
Instituto de Sistemas e Robótica / Instituto Superior Técnico, Av. Rovisco Pais, 1, 1049-001, Lisbon, Portugal
João Miguel Sanches
Institute for Intelligent Systems and Numerical Applications in Engineering (SIANI), Edificio de Informática y Matemáticas, University of Las Palmas de Gran Canaria, Campus Universitario de Tafira, 35017, Las Palmas, Spain
Mario Hernández

About this paper

Cite this paper

Millán-Giraldo, M., Traver, V.J., Sánchez, J.S. (2011). On-Line Classification of Data Streams with Missing Values Based on Reinforcement Learning. In: Vitrià, J., Sanches, J.M., Hernández, M. (eds) Pattern Recognition and Image Analysis. IbPRIA 2011. Lecture Notes in Computer Science, vol 6669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21257-4_44

DOI: https://doi.org/10.1007/978-3-642-21257-4_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21256-7
Online ISBN: 978-3-642-21257-4
eBook Packages: Computer Science, Computer Science (R0)