EGAL: Exploration Guided Active Learning for TCBR

  • Conference paper
Case-Based Reasoning. Research and Development (ICCBR 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6176)

Abstract

The task of building labelled case bases can be approached using active learning (AL), a process which facilitates the labelling of large collections of examples with minimal manual labelling effort. The main challenge in designing AL systems is the development of a selection strategy that chooses the most informative examples for manual labelling. Typical selection strategies use exploitation techniques, which attempt to refine uncertain areas of the decision space based on the output of a classifier. Other approaches balance exploitation with exploration, selecting examples from dense and interesting regions of the domain space. In this paper we present a simple but effective exploration-only selection strategy for AL in the textual domain. Our approach is inherently case-based, using only nearest-neighbour-based density and diversity measures. We show that its performance is comparable to that of more computationally expensive exploitation-based approaches and that it offers the opportunity to be classifier independent.
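
To make the idea of an exploration-only strategy concrete, the following is a minimal sketch and not the algorithm from the paper: the abstract gives no formulae, so every definition here is an assumption chosen for illustration. Density is taken as the sum of cosine similarities to pool neighbours at or above a hypothetical threshold `alpha`, diversity as the cosine distance to the nearest labelled example, and `beta` is a hypothetical diversity cut-off. Candidates that are sufficiently far from the labelled set are ranked by density. No classifier output is consulted at any point.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def density(i, pool, alpha):
    """Assumed density: sum of similarities to pool neighbours with similarity >= alpha."""
    total = 0.0
    for j, other in enumerate(pool):
        if j == i:
            continue
        sim = cosine(pool[i], other)
        if sim >= alpha:
            total += sim
    return total


def diversity(x, labelled):
    """Assumed diversity: cosine distance to the nearest labelled example."""
    if not labelled:
        return 1.0
    return 1.0 - max(cosine(x, l) for l in labelled)


def select_batch(pool, labelled, alpha=0.5, beta=0.3, batch_size=1):
    """Return indices of pool examples to label next, using density and diversity only."""
    candidates = [i for i, x in enumerate(pool) if diversity(x, labelled) >= beta]
    if not candidates:
        # Nothing is diverse enough, so fall back to ranking the whole pool.
        candidates = list(range(len(pool)))
    ranked = sorted(candidates, key=lambda i: density(i, pool, alpha), reverse=True)
    return ranked[:batch_size]


# Toy term-weight vectors: three similar documents and one outlier.
pool = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
first_pick = select_batch(pool, labelled=[])
next_pick = select_batch(pool, labelled=[[1.0, 0.0]])
```

With no labelled examples, the sketch picks the example at the centre of the dense group. Once a member of that group is labelled, the diversity filter steers selection towards the unexplored outlier.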

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, R., Delany, S.J., Mac Namee, B. (2010). EGAL: Exploration Guided Active Learning for TCBR. In: Bichindaritz, I., Montani, S. (eds) Case-Based Reasoning. Research and Development. ICCBR 2010. Lecture Notes in Computer Science, vol 6176. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14274-1_13

  • DOI: https://doi.org/10.1007/978-3-642-14274-1_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14273-4

  • Online ISBN: 978-3-642-14274-1

  • eBook Packages: Computer Science, Computer Science (R0)
