Abstract
This paper presents an active learning method for domain adaptation in word sense disambiguation. In general, active learning selects the data with the highest learning effect from an unlabeled data set, has it labeled manually, and adds it to the training data. However, some data in the source domain can degrade classification precision (misleading data), which carries errors over into domain adaptation. When data labeled by active learning is added to the training data, the method attempts to detect misleading data in the source domain and delete it from the training data. In this way, classification precision is improved compared to standard learning.
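The loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifier, query strategy, or detection criterion the authors use, so everything below is an assumption made for illustration: a nearest-centroid classifier, margin-based uncertainty sampling, and a heuristic that flags a source instance as misleading when a model trained only on the newly labeled target data disagrees with its label. All function names and the toy data are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def centroid_fit(data):
    """Fit a nearest-centroid model. data: list of (vector, label)."""
    sums, counts = {}, {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda y: dist(model[y], x))

def margin(model, x):
    """Gap between the two closest centroids; a small gap means uncertain."""
    d = sorted(dist(c, x) for c in model.values())
    return d[1] - d[0] if len(d) > 1 else float("inf")

def active_learning_with_removal(source, pool, oracle, rounds):
    """Assumed loop: query, label, add, then drop suspected misleading source data."""
    source, pool, target = list(source), list(pool), []
    for _ in range(rounds):
        if not pool:
            break
        model = centroid_fit(source + target)
        # Uncertainty sampling: query the pool item with the smallest margin.
        query = min(pool, key=lambda x: margin(model, x))
        pool.remove(query)
        target.append((query, oracle(query)))  # manual labeling step
        # Assumed heuristic: once the target data covers at least two senses,
        # drop source instances whose label a target-only model contradicts.
        target_model = centroid_fit(target)
        if len(target_model) > 1:
            source = [(x, y) for x, y in source
                      if y not in target_model or predict(target_model, x) == y]
    return centroid_fit(source + target), source, target

# Toy data: the last source instance is labeled "A" but lies in the "B" region.
source = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"), ((4.0, 0.0), "B"),
          ((5.0, 0.0), "B"), ((4.5, 0.0), "A")]
pool = [(0.5, 0.0), (2.2, 0.0), (3.0, 0.0), (4.8, 0.0)]
oracle = lambda x: "A" if x[0] < 2.5 else "B"  # stands in for the human annotator

model, kept_source, labeled_target = active_learning_with_removal(
    source, pool, oracle, rounds=2)
```

In this toy run, two queries are enough for the target-only model to contradict the mislabeled source instance, which is then dropped before the final model is fit. A real system would need a more robust detection criterion than this single-disagreement rule.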
Notes
- 1.
The word “Hairu” has three senses in the dictionary. However, it has four senses in the OC and PB domains; the fourth sense is new. The SemEval-2 Japanese WSD task included an attempt to tag such new senses.
Copyright information
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
Shinnou, H., Onodera, Y., Sasaki, M., Komiya, K. (2016). Active Learning to Remove Source Instances for Domain Adaptation for Word Sense Disambiguation. In: Hasida, K., Purwarianti, A. (eds) Computational Linguistics. PACLING 2015. Communications in Computer and Information Science, vol 593. Springer, Singapore. https://doi.org/10.1007/978-981-10-0515-2_7
DOI: https://doi.org/10.1007/978-981-10-0515-2_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0514-5
Online ISBN: 978-981-10-0515-2
eBook Packages: Computer Science, Computer Science (R0)