
Predicting Part-of-Speech Tags and Morpho-Syntactic Relations Using Similarity-Based Technique

  • Conference paper
Statistical Language and Speech Processing (SLSP 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7978)


Abstract

This paper describes a similarity-based technique that estimates the part-of-speech tags and morpho-syntactic relations of Chinese compound words before the words are passed to a tagger. The technique relies on a set of features drawn from Chinese morphemes, together with a set of collocation markers that provide hints about the syntactic categories of the compound words. It is trained on a compound-word database of more than 53,500 disyllabic words. Experimental results show that the tagger with the technique outperforms its counterpart without it.
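As a rough illustration of the general idea, the sketch below predicts a tag for an unseen disyllabic word by similarity-weighted voting among its nearest neighbours in a small lexicon. It is a minimal sketch under stated assumptions, not the authors' method: the toy lexicon, the tag labels, the collocation markers, the Jaccard similarity measure, and the names `features`, `jaccard`, and `predict_pos` are all hypothetical, and the prediction of morpho-syntactic relations is left out.

```python
from collections import Counter

# Toy stand-in for a compound-word database: word -> (POS tag, collocation markers).
# Every entry, tag, and marker here is illustrative and not taken from the paper.
LEXICON = {
    "火車": ("N", {"一輛"}),
    "汽車": ("N", {"一輛"}),
    "電車": ("N", {"一輛"}),
    "開車": ("V", {"正在"}),
    "停車": ("V", {"正在"}),
}


def features(word, markers):
    """Build a feature set from the two morphemes (by position) and the markers."""
    first, second = word[0], word[1]
    return {f"m1={first}", f"m2={second}"} | {f"c={m}" for m in markers}


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed choice of similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_pos(word, markers, k=3):
    """Return the tag with the highest similarity-weighted vote among k neighbours."""
    target = features(word, markers)
    scored = sorted(
        (
            (jaccard(target, features(w, m)), tag)
            for w, (tag, m) in LEXICON.items()
            if w != word
        ),
        reverse=True,
    )
    votes = Counter()
    for sim, tag in scored[:k]:
        votes[tag] += sim
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    print(predict_pos("馬車", {"一輛"}))
    print(predict_pos("倒車", {"正在"}))
```

In this toy setting, a word that shares its second morpheme and a collocation marker with known nouns is pulled toward the noun tag, while the same morpheme paired with a verbal marker is pulled toward the verb tag. That is the kind of hint the abstract attributes to collocation markers.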




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chan, S.W.K., Chong, M.M.C. (2013). Predicting Part-of-Speech Tags and Morpho-Syntactic Relations Using Similarity-Based Technique. In: Dediu, A.-H., Martín-Vide, C., Mitkov, R., Truthe, B. (eds) Statistical Language and Speech Processing. SLSP 2013. Lecture Notes in Computer Science, vol 7978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39593-2_6


  • DOI: https://doi.org/10.1007/978-3-642-39593-2_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39592-5

  • Online ISBN: 978-3-642-39593-2

  • eBook Packages: Computer Science, Computer Science (R0)
