Incorporating Word Clustering into Complex Noun Phrase Identification

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (CCL 2015, NLP-NABD 2015)

Abstract

Professional technical literature contains large numbers of complex noun phrases, so identifying these phrases has important practical value for tasks such as machine translation. Based on an analysis of such phrases in Chinese-English bilingual sentence pairs from aircraft technical publications, we present an annotation specification, built on an existing specification, for labeling these phrases, together with a method for complex noun phrase identification. In addition to basic features (the word and its part of speech), we incorporate into the machine learning model word clustering features trained on large unlabeled data with the Brown clustering model and the Word Vector Class (WVC) model. Experimental results indicate that combining different word clustering features with the basic features improves system performance, raising the F-score by 1.83% compared with the method that uses only the basic features.
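
As a rough illustration of how cluster features can sit alongside word and part-of-speech features in a sequence labeler, the following minimal Python sketch builds one feature dictionary per token. Everything in it is an assumption made for illustration: the lookup tables `BROWN_PATHS` and `WVC_CLASSES`, the function names, the bit-string prefix lengths, and the neighbouring-POS context features are hypothetical and are not taken from the paper, whose actual feature templates are listed in its Table 6.

```python
from typing import Dict, List, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical lookup tables. In practice they would be induced from a large
# unlabeled corpus: a Brown clustering bit-string path per word, and a class
# ID per word from a word-vector clustering step.
BROWN_PATHS: Dict[str, str] = {
    "the": "1100", "fuel": "0101", "pump": "0100", "engine": "0111",
}
WVC_CLASSES: Dict[str, int] = {"the": 3, "fuel": 7, "pump": 7, "engine": 12}


def token_features(
    sentence: Sequence[Token],
    i: int,
    brown: Dict[str, str],
    wvc: Dict[str, int],
    prefix_lengths: Sequence[int] = (2, 4),  # assumed prefix lengths
) -> Dict[str, str]:
    """Build basic and cluster features for the token at position i."""
    word, pos = sentence[i]
    feats = {"word": word, "pos": pos}  # basic features

    # Brown cluster features: prefixes of the bit-string path give
    # coarser or finer cluster granularity.
    path = brown.get(word)
    if path is not None:
        for n in prefix_lengths:
            feats[f"brown_{n}"] = path[:n]

    # Word-vector class feature: a single class ID per word.
    cls = wvc.get(word)
    if cls is not None:
        feats["wvc"] = str(cls)

    # Assumed context features: POS tags of the neighbouring tokens.
    feats["prev_pos"] = sentence[i - 1][1] if i > 0 else "<BOS>"
    feats["next_pos"] = sentence[i + 1][1] if i + 1 < len(sentence) else "<EOS>"
    return feats


def sentence_features(
    sentence: Sequence[Token],
    brown: Dict[str, str] = BROWN_PATHS,
    wvc: Dict[str, int] = WVC_CLASSES,
) -> List[Dict[str, str]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(sentence, i, brown, wvc) for i in range(len(sentence))]


if __name__ == "__main__":
    example = [("the", "DT"), ("fuel", "NN"), ("pump", "NN"), ("gasket", "NN")]
    for feats in sentence_features(example):
        print(feats)
```

Feature dictionaries of this shape can be passed to most sequence-labeling toolkits. Words missing from a cluster table simply receive no cluster feature, so the labeler falls back on the basic features for them.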

This work is supported by the Humanities and Social Sciences Foundation for Youth Scholars of the Ministry of Education of China (No. 14YJC740126) and the National Natural Science Foundation of China (No. 61402299).

Author information

Corresponding author

Correspondence to Lihua Xue.

A Appendix

See Table 6.

Table 6. Feature selection

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Xue, L., Zhang, G., Zhou, Q., Ye, N. (2015). Incorporating Word Clustering into Complex Noun Phrase Identification. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2015, NLP-NABD 2015. Lecture Notes in Computer Science, vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_3

  • DOI: https://doi.org/10.1007/978-3-319-25816-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25815-7

  • Online ISBN: 978-3-319-25816-4

  • eBook Packages: Computer Science, Computer Science (R0)
