A Combination System for Identifying Base Noun Phrase Correspondences

  • Hieu Chi Nguyen
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 457)

Abstract

Bilingual Base Noun Phrase extraction is one of the key tasks of Natural Language Processing (NLP). The task is more challenging for the English-Vietnamese pair because Vietnamese lacks language resources such as robust NLP tools and annotated training data. This paper presents a bilingual dictionary-, bilingual corpus- and knowledge-based method to identify Base Noun Phrase correspondences in a pair of English-Vietnamese bilingual sentences. The method identifies anchor points of each Base Noun Phrase in the English sentence and then performs alignment based on these anchor points. It not only overcomes the lack of Vietnamese resources but also improves on existing methods' handling of misalignment, null alignment, overlap, and projection conflicts. The proposed technique can be easily applied to other language pairs. Experiments on 35,000 sentence pairs from an English-Vietnamese bilingual corpus show that the proposed method achieves an accuracy of 78.5%.
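
The abstract does not spell out how anchor points are found or how the alignment step uses them, so the following is only a minimal sketch of the general idea of dictionary-driven anchor projection, not the paper's algorithm. The function name project_base_np, the toy lexicon, the whitespace tokenization of Vietnamese, and the rule "take the smallest span covering all anchors" are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def project_base_np(
    en_tokens: List[str],
    vi_tokens: List[str],
    en_np: Span,
    lexicon: Dict[str, Set[str]],
) -> Optional[Span]:
    """Project one English BaseNP span onto the Vietnamese sentence.

    Hypothetical helper: every Vietnamese token that the (assumed)
    bilingual lexicon lists as a translation of a word inside the
    English BaseNP is treated as an anchor point. The projected span
    is the smallest window covering all anchors.
    """
    start, end = en_np
    anchors: List[int] = []
    for en_word in en_tokens[start:end + 1]:
        translations = lexicon.get(en_word.lower(), set())
        for j, vi_word in enumerate(vi_tokens):
            if vi_word.lower() in translations:
                anchors.append(j)
    if not anchors:
        return None  # null alignment: no anchor point was found
    return (min(anchors), max(anchors))


# Toy data (illustrative only, not taken from the paper's corpus).
EN = ["I", "bought", "a", "new", "book"]
VI = ["Tôi", "đã", "mua", "một", "quyển", "sách", "mới"]
LEXICON = {"a": {"một"}, "new": {"mới"}, "book": {"sách"}}

span = project_base_np(EN, VI, (2, 4), LEXICON)
if span is not None:
    print(" ".join(VI[span[0]:span[1] + 1]))
```

On this toy pair the English phrase "a new book" projects to the Vietnamese tokens at positions 3 to 6. The classifier "quyển" has no dictionary entry but is still covered because it lies between two anchors. A real system would also need to resolve overlapping or conflicting projections between neighbouring phrases, which this sketch does not attempt.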

Keywords

Base Noun Phrase, anchor points, BaseNP pairs

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Ho Chi Minh University of Industry, Ho Chi Minh City, Vietnam