Phrasal Equivalence Classes for Generalized Corpus-Based Machine Translation

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2011)

Abstract

Generalizing sentence-pairs in Example-based Machine Translation (EBMT) has been shown to increase coverage and translation quality. These template-based approaches (G-EBMT) find common patterns in the bilingual corpus and use them to generate generalized templates. Past work found such patterns in only a few of the following ways: finding similar or dissimilar portions of text in groups of sentence-pairs, finding semantically similar words, or using dictionaries and parsers to find syntactic correspondences. This paper combines all three aspects to generate templates. The boundaries for aligning and extracting the members (phrase-pairs) to be clustered are found using chunkers (hence, syntactic information) trained independently on the two languages under consideration. Semantically related phrase-pairs are then grouped based on the contexts in which they appear, and templates are constructed by replacing these clustered phrase-pairs with their class labels. We also perform a filtration step that simulates human labelers, retaining only those phrase-pairs whose source and target phrases correspond closely. Templates for the English-Chinese and English-French language pairs gave significant improvements over a baseline with no templates.
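The template-construction step described in the abstract (replacing clustered phrase-pairs with their class labels) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `<CL1>`-style labels, the whitespace tokenization, and the toy English-French phrase-pairs are all assumptions made for illustration. The clustering itself is assumed to have been done already and is passed in as a plain dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical type: a (source phrase, target phrase) pair.
PhrasePair = Tuple[str, str]


def replace_phrase(tokens: List[str], phrase: List[str], label: str) -> Tuple[List[str], bool]:
    """Replace the first occurrence of `phrase` in `tokens` with `label`.

    Returns the (possibly unchanged) token list and whether a match was found.
    """
    n = len(phrase)
    if n == 0:
        return tokens, False
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return tokens[:i] + [label] + tokens[i + n:], True
    return tokens, False


def make_template(src: str, tgt: str, classes: Dict[PhrasePair, str]) -> Tuple[str, str]:
    """Generalize one sentence-pair into a template.

    `classes` maps each clustered phrase-pair to its class label (assumed to
    be given). A phrase-pair is generalized only when its source phrase
    occurs in the source sentence AND its target phrase occurs in the target
    sentence, so both sides of the template stay in correspondence.
    """
    src_toks, tgt_toks = src.split(), tgt.split()
    for (src_phrase, tgt_phrase), label in classes.items():
        new_src, hit_src = replace_phrase(src_toks, src_phrase.split(), label)
        new_tgt, hit_tgt = replace_phrase(tgt_toks, tgt_phrase.split(), label)
        if hit_src and hit_tgt:
            src_toks, tgt_toks = new_src, new_tgt
    return " ".join(src_toks), " ".join(tgt_toks)


# Toy clusters (invented for this example, not taken from the paper).
toy_classes = {
    ("the red car", "la voiture rouge"): "<CL1>",
    ("the old house", "la vieille maison"): "<CL1>",
    ("yesterday", "hier"): "<CL2>",
}

template = make_template(
    "i saw the red car yesterday",
    "j' ai vu la voiture rouge hier",
    toy_classes,
)
print(template)  # ('i saw <CL1> <CL2>', "j' ai vu <CL1> <CL2>")
```

Requiring a match on both sides before substituting is a design choice of this sketch: it keeps a template from generalizing a source phrase whose translation is absent from the target sentence.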

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gangadharaiah, R., Brown, R.D., Carbonell, J. (2011). Phrasal Equivalence Classes for Generalized Corpus-Based Machine Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_2

  • DOI: https://doi.org/10.1007/978-3-642-19437-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19436-8

  • Online ISBN: 978-3-642-19437-5

  • eBook Packages: Computer Science, Computer Science (R0)
