EGAL: Exploration Guided Active Learning for TCBR

Hu, Rong; Delany, Sarah Jane; Mac Namee, Brian

doi:10.1007/978-3-642-14274-1_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6176)

Included in the following conference series:

International Conference on Case-Based Reasoning

Abstract

The task of building labelled case bases can be approached using active learning (AL), a process that facilitates the labelling of large collections of examples with minimal manual labelling effort. The main challenge in designing AL systems is developing a selection strategy that chooses the most informative examples to label manually. Typical selection strategies use exploitation techniques, which attempt to refine uncertain areas of the decision space based on the output of a classifier. Other approaches balance exploitation with exploration, selecting examples from dense and interesting regions of the domain space. In this paper we present a simple but effective exploration-only selection strategy for AL in the textual domain. Our approach is inherently case-based, using only nearest-neighbour-based density and diversity measures. We show that its performance is comparable to that of the more computationally expensive exploitation-based approaches and that it offers the opportunity to be classifier independent.
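
As a rough illustration of what an exploration-only selection step built from nearest-neighbour density and diversity measures might look like, the following Python sketch filters unlabelled examples by diversity and then ranks them by density. It is not taken from the paper: the cosine similarity measure, the function names (`cosine`, `density`, `select_batch`), the thresholds `alpha` and `beta`, and the toy data are all assumptions made for this example. Density is taken here to be the sum of similarities to neighbours that are at least `alpha`-similar, and an example counts as diverse if its closest labelled example is at most `beta`-similar.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(i, vectors, alpha):
    """Assumed density: sum of similarities from example i to every other
    example whose similarity to i is at least alpha."""
    total = 0.0
    for j, other in enumerate(vectors):
        if j == i:
            continue
        sim = cosine(vectors[i], other)
        if sim >= alpha:
            total += sim
    return total


def select_batch(vectors, labelled, alpha=0.5, beta=0.9, batch_size=1):
    """Return indices of unlabelled examples to send for manual labelling.

    Step 1 (diversity): keep unlabelled examples whose nearest labelled
    example has similarity at most beta.
    Step 2 (density): rank the survivors by density, highest first.
    No classifier output is consulted at any point.
    """
    candidates = []
    for i in range(len(vectors)):
        if i in labelled:
            continue
        nearest = max(
            (cosine(vectors[i], vectors[j]) for j in labelled), default=0.0
        )
        if nearest <= beta:
            candidates.append(i)
    candidates.sort(key=lambda i: density(i, vectors, alpha), reverse=True)
    return candidates[:batch_size]


# Toy term-weight vectors (made-up data, for illustration only).
toy_vectors = [
    [1.0, 0.0, 0.0],  # 0: already labelled
    [1.0, 0.1, 0.0],  # 1: almost identical to the labelled example
    [0.0, 1.0, 0.0],  # 2: part of a dense unlabelled group
    [0.0, 1.0, 0.1],  # 3: part of the same group
    [0.0, 0.9, 0.2],  # 4: part of the same group
    [0.0, 0.0, 1.0],  # 5: isolated example
]
chosen = select_batch(toy_vectors, labelled={0})
```

In this toy run, example 1 is filtered out for being too close to the labelled example, and the remaining candidates are ordered so that members of the dense group come before the isolated example. Because the step relies only on pairwise similarities, it could in principle sit in front of any classifier.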

Author information

Authors and Affiliations

Dublin Institute of Technology, Dublin, Ireland
Rong Hu, Sarah Jane Delany & Brian Mac Namee

Editor information

Editors and Affiliations

Institute of Technology, University of Washington, Tacoma, 1900 Commerce Street, Box 358426, 98402, Tacoma, WA, USA
Isabelle Bichindaritz
Dipartimento di Informatica, Università del Piemonte Orientale, P.O. Box, Alessandria, Italy
Stefania Montani

About this paper

Cite this paper

Hu, R., Delany, S.J., Mac Namee, B. (2010). EGAL: Exploration Guided Active Learning for TCBR. In: Bichindaritz, I., Montani, S. (eds) Case-Based Reasoning. Research and Development. ICCBR 2010. Lecture Notes in Computer Science (LNAI), vol 6176. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14274-1_13

DOI: https://doi.org/10.1007/978-3-642-14274-1_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14273-4
Online ISBN: 978-3-642-14274-1
eBook Packages: Computer Science, Computer Science (R0)
