Contextual Bandit for Active Learning: Active Thompson Sampling

Bouneffouf, Djallel; Laroche, Romain; Urvoy, Tanguy; Feraud, Raphael; Allesiardo, Robin

doi:10.1007/978-3-319-12637-1_51

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8834)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Labelling training examples is a costly task in supervised classification. Active learning strategies address this problem by selecting the most useful unlabelled examples to train a predictive model. The choice of examples to label can be seen as a dilemma between exploration and exploitation over the data space representation. In this paper, a novel active learning strategy manages this trade-off by modelling the active learning problem as a contextual bandit problem. We propose a sequential algorithm named Active Thompson Sampling (ATS), which, in each round, assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for the label of the sampled point. Experimental comparison with previously proposed active learning algorithms shows superior performance on a real application dataset.
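
The round structure described in the abstract (assign a sampling distribution on the pool, sample one point, query the oracle) can be illustrated with a minimal Python sketch. This is not the ATS algorithm itself: the abstract does not specify how the distribution is computed, so the softmax over a distance-based `novelty_score` is an assumption made purely for illustration, as are the function names, the one-dimensional pool, and the threshold oracle.

```python
import math
import random


def sampling_distribution(scores):
    """Softmax: map real-valued scores to probabilities that sum to 1."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def novelty_score(x, labelled_points):
    """Hypothetical score (assumption): distance to the nearest labelled point."""
    if not labelled_points:
        return 0.0  # nothing labelled yet, so every point scores the same
    return min(abs(x - z) for z in labelled_points)


def active_sampling_loop(pool, oracle, rounds, rng):
    """Each round: build a distribution on the unlabelled pool,
    sample one point from it, and query the oracle for its label."""
    unlabelled = list(range(len(pool)))
    labels = {}  # pool index -> label returned by the oracle
    for _ in range(min(rounds, len(pool))):
        seen = [pool[i] for i in labels]
        scores = [novelty_score(pool[i], seen) for i in unlabelled]
        probs = sampling_distribution(scores)
        pick = rng.choices(unlabelled, weights=probs, k=1)[0]
        labels[pick] = oracle(pool[pick])  # one label query per round
        unlabelled.remove(pick)
    return labels


# Toy usage: a 1-D pool and a threshold oracle (both illustrative assumptions).
pool = [0.1, 0.4, 0.45, 0.9, 1.5, 2.0]
labels = active_sampling_loop(
    pool, oracle=lambda x: int(x > 0.5), rounds=3, rng=random.Random(0)
)
print(labels)
```

Sampling from a distribution, rather than always taking the highest-scoring point, is what leaves room for exploration; a score that also rewards points near the current decision boundary would add the exploitation side of the trade-off.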

Author information

Authors and Affiliations

Orange Labs, 2, Avenue Pierre Marzin, 22307, Lannion, France
Djallel Bouneffouf, Romain Laroche, Tanguy Urvoy, Raphael Feraud & Robin Allesiardo

Editor information

Editors and Affiliations

Department of Artificial Intelligence, Faculty of Computer Science and Information Technology Building, University of Malaya, 50603, Kuala Lumpur, Malaysia
Chu Kiong Loo
Department of Electronics and Communication Engineering, College of Engineering, Jalan IKRAM-UNITEN, Universiti Tenaga Nasional, 43009, Kajang, Selangor, Malaysia
Keem Siah Yap
School of Engineering and Information Technology, Murdoch University, South St, 6150, Murdoch, Western Australia, Australia
Kok Wai Wong
Department of Electrical and Electronics Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, 120-749, Seoul, South Korea
Andrew Teoh
Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Ren’ai Road 111, SIP 215123, Suzhou, Jiangsu Province, China
Kaizhu Huang

About this paper

Cite this paper

Bouneffouf, D., Laroche, R., Urvoy, T., Feraud, R., Allesiardo, R. (2014). Contextual Bandit for Active Learning: Active Thompson Sampling. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8834. Springer, Cham. https://doi.org/10.1007/978-3-319-12637-1_51

DOI: https://doi.org/10.1007/978-3-319-12637-1_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12636-4
Online ISBN: 978-3-319-12637-1
eBook Packages: Computer Science, Computer Science (R0)