Contextual Bandit for Active Learning: Active Thompson Sampling

  • Conference paper
Neural Information Processing (ICONIP 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8834)

Abstract

Labelling training examples is a costly task in supervised classification. Active learning strategies address this problem by selecting the most useful unlabelled examples for training a predictive model. The choice of examples to label can be seen as a dilemma between exploration and exploitation over the data space representation. In this paper, a novel active learning strategy manages this trade-off by modelling active learning as a contextual bandit problem. We propose a sequential algorithm named Active Thompson Sampling (ATS), which, in each round, assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for the label of the sampled point. Experimental comparison with previously proposed active learning algorithms shows superior performance on a real application dataset.
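The three per-round steps named in the abstract (assign a sampling distribution on the pool, sample one point, query the oracle) can be illustrated with a minimal sketch. This is not the authors' ATS algorithm, whose details are not given here. The 1-D pool, the threshold model, the scoring rule, and all function names (`estimate_threshold`, `sampling_distribution`, `active_learning_loop`) are assumptions made purely for illustration.

```python
import math
import random


def estimate_threshold(pool, labelled):
    """Toy model: midpoint between the largest labelled negative
    and the smallest labelled positive (falls back to pool bounds)."""
    neg = [pool[i] for i, y in labelled.items() if y == 0]
    pos = [pool[i] for i, y in labelled.items() if y == 1]
    lo = max(neg) if neg else min(pool)
    hi = min(pos) if pos else max(pool)
    return (lo + hi) / 2.0


def sampling_distribution(pool, labelled, explore=0.1):
    """Assign a probability to every unlabelled pool index.

    Hypothetical scoring rule (not from the paper): points near the
    current threshold get more mass (exploitation), mixed with a
    uniform floor over the unlabelled pool (exploration).
    """
    unlabelled = [i for i in range(len(pool)) if i not in labelled]
    threshold = estimate_threshold(pool, labelled)
    scores = [math.exp(-abs(pool[i] - threshold)) for i in unlabelled]
    total = sum(scores)
    uniform = 1.0 / len(unlabelled)
    probs = [(1 - explore) * s / total + explore * uniform for s in scores]
    return unlabelled, probs


def active_learning_loop(pool, oracle, rounds, seed=0):
    """Run the assign / sample / query cycle for a fixed number of rounds."""
    rng = random.Random(seed)
    labelled = {}  # pool index -> label returned by the oracle
    for _ in range(rounds):
        if len(labelled) == len(pool):
            break  # nothing left to query
        candidates, probs = sampling_distribution(pool, labelled)
        idx = rng.choices(candidates, weights=probs, k=1)[0]
        labelled[idx] = oracle(pool[idx])  # query the oracle for one label
    return labelled, estimate_threshold(pool, labelled)
```

For example, calling `active_learning_loop([i / 10 for i in range(101)], lambda x: int(x >= 4.2), rounds=20)` returns 20 labelled indices and a threshold estimate. The uniform floor controlled by `explore` is one simple way to keep every unlabelled point reachable; a bandit-based strategy such as the one the paper proposes would instead learn how to balance the two terms.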



Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Bouneffouf, D., Laroche, R., Urvoy, T., Feraud, R., Allesiardo, R. (2014). Contextual Bandit for Active Learning: Active Thompson Sampling. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8834. Springer, Cham. https://doi.org/10.1007/978-3-319-12637-1_51

  • DOI: https://doi.org/10.1007/978-3-319-12637-1_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12636-4

  • Online ISBN: 978-3-319-12637-1

  • eBook Packages: Computer Science, Computer Science (R0)
