Active Learning to Remove Source Instances for Domain Adaptation for Word Sense Disambiguation

Shinnou, Hiroyuki; Onodera, Yoshiyuki; Sasaki, Minoru; Komiya, Kanako

doi:10.1007/978-981-10-0515-2_7

Conference paper
First Online: 20 February 2016

Part of the book series: Communications in Computer and Information Science (CCIS, volume 593)

Abstract

This paper presents an active learning method for domain adaptation in word sense disambiguation. In general, active learning selects the data with a high learning effect from an unlabeled data set, has it labeled manually, and adds it to the training data. In domain adaptation, however, some data in the source domain (misleading data) can degrade classification precision, and these errors carry over into the adapted classifier. Each time data labeled through active learning is added to the training data, the method therefore attempts to detect misleading data in the source domain and delete it from the training data. As a result, classification precision improves compared with standard learning.
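
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which classifier, selection criterion, or detection rule the paper uses, so the sketch substitutes a nearest-centroid classifier, margin-based uncertainty sampling, and a simple removal heuristic (drop a source instance when a classifier trained only on the labeled target data contradicts its label). All function names and the `oracle` callback, which stands in for the human annotator, are hypothetical.

```python
import math


def fit_centroids(examples):
    """Fit a nearest-centroid classifier: one mean vector per sense label."""
    sums, counts = {}, {}
    for x, label in examples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: math.dist(model[label], x))


def margin(model, x):
    """Distance gap between the two nearest centroids; small means uncertain."""
    d = sorted(math.dist(c, x) for c in model.values())
    return d[1] - d[0] if len(d) > 1 else float("inf")


def active_learning_with_removal(source, pool, oracle, rounds):
    """Run `rounds` queries; return (kept source data, labeled target data).

    source: list of (feature_tuple, label) pairs from the source domain
    pool:   list of unlabeled feature tuples from the target domain
    oracle: hypothetical stand-in for manual labeling
    """
    source, pool, target = list(source), list(pool), []
    for _ in range(rounds):
        if not pool:
            break
        model = fit_centroids(source + target)
        # Active learning step: query the instance the model is least sure of.
        query = min(pool, key=lambda x: margin(model, x))
        pool.remove(query)
        target.append((query, oracle(query)))
        # Removal step (assumed heuristic): once the target data covers at
        # least two senses, drop source instances it contradicts.
        target_model = fit_centroids(target)
        if len(target_model) > 1:
            source = [
                (x, y) for x, y in source
                if y not in target_model or predict(target_model, x) == y
            ]
    return source, target
```

Calling `active_learning_with_removal(source, pool, oracle, rounds=k)` returns the source instances that survived removal together with the newly labeled target instances; their union is the training data for the final classifier. Removing only instances whose label the target-only model already knows is a deliberately conservative choice, so that source data for senses not yet seen in the target domain is left alone.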

Notes

1. The word "Hairu" has three senses in a dictionary. However, it has four senses in the OC and PB domains. The fourth sense is new. In the SemEval-2 Japanese WSD task, tagging the new sense was attempted.
2. http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Author information

Authors and Affiliations

Department of Computer and Information Sciences, Ibaraki University, 4-12-1 Nakanarusawa, Hitachi, Ibaraki, Japan
Hiroyuki Shinnou, Yoshiyuki Onodera, Minoru Sasaki & Kanako Komiya

Corresponding author

Correspondence to Hiroyuki Shinnou.

Editor information

Editors and Affiliations

Graduate School of Information Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
Kôiti Hasida
School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, Indonesia
Ayu Purwarianti

About this paper

Cite this paper

Shinnou, H., Onodera, Y., Sasaki, M., Komiya, K. (2016). Active Learning to Remove Source Instances for Domain Adaptation for Word Sense Disambiguation. In: Hasida, K., Purwarianti, A. (eds) Computational Linguistics. PACLING 2015. Communications in Computer and Information Science, vol 593. Springer, Singapore. https://doi.org/10.1007/978-981-10-0515-2_7

DOI: https://doi.org/10.1007/978-981-10-0515-2_7
Published: 20 February 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0514-5
Online ISBN: 978-981-10-0515-2
eBook Packages: Computer Science, Computer Science (R0)
