Incorporating Word Clustering into Complex Noun Phrase Identification

Xue, Lihua; Zhang, Guiping; Zhou, Qiaoli; Ye, Na

doi:10.1007/978-3-319-25816-4_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9427)

Abstract

Because professional technical literature contains large numbers of complex noun phrases, identifying those phrases has important practical value for tasks such as machine translation. Through an analysis of such phrases in Chinese-English bilingual sentence pairs from aircraft technical publications, we present an annotation specification, based on the existing specification, for labeling those phrases, together with a method for complex noun phrase identification. In addition to basic features, including the word and its part of speech, we incorporate word clustering features, trained with the Brown clustering model and the Word Vector Class (WVC) model on large unlabeled data, into the machine learning model. Experimental results indicate that combining different word clustering features with the basic features improves system performance, raising the F-score by 1.83 % compared with the method that uses only the basic features.
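The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The lookup tables, the feature names, the Brown prefix lengths and the context window are all assumptions made for illustration. The sketch only shows how cluster-derived features might sit alongside the word and part-of-speech features of each token before they are passed to a sequence labeling model.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup tables. In practice they would be learned from large
# unlabeled text; every entry below is made up for illustration.
BROWN_PATHS: Dict[str, str] = {  # word -> bit-string path in the Brown hierarchy
    "landing": "110100",
    "gear": "110111",
    "door": "110110",
    "the": "0010",
}
WVC_CLASSES: Dict[str, int] = {  # word -> class id from clustered word vectors
    "landing": 17,
    "gear": 17,
    "door": 42,
    "the": 3,
}

PREFIX_LENGTHS = (2, 4)  # assumed prefix lengths for the Brown bit strings


def token_features(sent: List[Tuple[str, str]], i: int) -> Dict[str, str]:
    """Build a feature dict for token i of a sentence of (word, POS) pairs."""
    word, pos = sent[i]
    key = word.lower()

    # Basic features: the word itself and its part of speech.
    feats = {"w": key, "pos": pos}

    # Context features from the immediately neighbouring tokens.
    if i > 0:
        feats["w-1"] = sent[i - 1][0].lower()
        feats["pos-1"] = sent[i - 1][1]
    else:
        feats["BOS"] = "1"
    if i < len(sent) - 1:
        feats["w+1"] = sent[i + 1][0].lower()
        feats["pos+1"] = sent[i + 1][1]
    else:
        feats["EOS"] = "1"

    # Brown clustering features: prefixes of the bit-string path give
    # coarser or finer cluster granularity.
    path = BROWN_PATHS.get(key)
    if path is not None:
        for n in PREFIX_LENGTHS:
            feats[f"brown{n}"] = path[:n]
        feats["brown_full"] = path

    # Word-vector class feature: a flat class id for the word.
    cls = WVC_CLASSES.get(key)
    if cls is not None:
        feats["wvc"] = str(cls)

    return feats


sentence = [("the", "DT"), ("landing", "NN"), ("gear", "NN"), ("door", "NN")]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

Words missing from either table simply receive no cluster feature, so the basic features still apply to them. Feature dicts of this shape can be passed to most sequence labeling toolkits that accept per-token feature mappings.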

This work is supported by the Humanities and Social Sciences Foundation for Youth Scholars of the Ministry of Education of China (No. 14YJC740126) and the National Natural Science Foundation of China (No. 61402299).

Author information

Authors and Affiliations

Knowledge Engineering Research Center, Shenyang Aerospace University, Shenyang, 110136, China
Lihua Xue, Guiping Zhang, Qiaoli Zhou & Na Ye

Corresponding author

Correspondence to Lihua Xue.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Tsinghua University, Beijing, China
Zhiyuan Liu
Soochow University, Suzhou, Jiangsu, China
Min Zhang
Tsinghua University, Beijing, China
Yang Liu

A Appendix

Table 6. Feature selection

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this paper

Cite this paper

Xue, L., Zhang, G., Zhou, Q., Ye, N. (2015). Incorporating Word Clustering into Complex Noun Phrase Identification. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2015, NLP-NABD 2015. Lecture Notes in Computer Science, vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_3

DOI: https://doi.org/10.1007/978-3-319-25816-4_3
Published: 08 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25815-7
Online ISBN: 978-3-319-25816-4
eBook Packages: Computer Science, Computer Science (R0)
