Abstract
This paper describes a similarity-based technique that estimates the part-of-speech tags and morpho-syntactic relations of Chinese compound words before the words are fed into a tagger. The technique relies on features derived from Chinese morphemes, together with collocation markers that hint at the syntactic categories of the compound words. It is trained on a compound-word database of more than 53,500 disyllabic words. Experimental results show that a tagger using the technique outperforms its counterpart without it.
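The abstract does not specify the features or the similarity measure, so what follows is only a minimal sketch of the general idea under stated assumptions. It is not the authors' method. Each disyllabic compound is represented by positional morpheme features (its first and second character), and similarity is the Jaccard overlap of these feature sets. An unseen compound receives the (tag, relation) pair with the highest similarity-weighted vote among its k most similar training compounds. The toy training list, the tag and relation labels, and the function names `features`, `similarity` and `predict` are all hypothetical. The collocation markers mentioned in the abstract are omitted.

```python
from collections import defaultdict

# Hypothetical toy training data: (compound, POS tag, morpho-syntactic relation).
# Labels are illustrative only; they are not taken from the paper's database.
TRAIN = [
    ("电脑", "N", "modifier-head"),
    ("电视", "N", "modifier-head"),
    ("电话", "N", "modifier-head"),
    ("读书", "V", "verb-object"),
    ("读报", "V", "verb-object"),
    ("看书", "V", "verb-object"),
]


def features(word):
    """Assumed positional morpheme features of a disyllabic compound."""
    return {("M1", word[0]), ("M2", word[1])}


def similarity(a, b):
    """Jaccard overlap between the feature sets of two compounds."""
    fa, fb = features(a), features(b)
    return len(fa & fb) / len(fa | fb)


def predict(word, train=TRAIN, k=3):
    """Similarity-weighted vote over the k most similar known compounds.

    Returns a (tag, relation) pair, or None if no known compound shares
    a morpheme in the same position.
    """
    scored = sorted(
        ((similarity(word, w), tag, rel) for w, tag, rel in train),
        key=lambda t: t[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, tag, rel in scored:
        if sim > 0:  # ignore neighbours with no shared morpheme
            votes[(tag, rel)] += sim
    if not votes:
        return None
    return max(votes, key=votes.get)
```

For example, `predict("电影")` returns `("N", "modifier-head")` because its first morpheme 电 matches three modifier-head nouns in the toy data. `predict("山水")` returns `None` because it shares no positional morpheme with any training compound. A real system would replace the raw characters with richer morpheme features and add the collocation evidence described above.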
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chan, S.W.K., Chong, M.M.C. (2013). Predicting Part-of-Speech Tags and Morpho-Syntactic Relations Using Similarity-Based Technique. In: Dediu, AH., Martín-Vide, C., Mitkov, R., Truthe, B. (eds) Statistical Language and Speech Processing. SLSP 2013. Lecture Notes in Computer Science, vol 7978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39593-2_6
DOI: https://doi.org/10.1007/978-3-642-39593-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39592-5
Online ISBN: 978-3-642-39593-2
eBook Packages: Computer Science, Computer Science (R0)