Phrasal Equivalence Classes for Generalized Corpus-Based Machine Translation

Gangadharaiah, Rashmi; Brown, Ralf D.; Carbonell, Jaime

doi:10.1007/978-3-642-19437-5_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Generalizing sentence-pairs in Example-based Machine Translation (EBMT) has been shown to increase coverage and translation quality. These template-based approaches (G-EBMT) find common patterns in the bilingual corpus to generate generalized templates. Previous work found patterns in the corpus using only some of the following approaches: finding similar or dissimilar portions of text in groups of sentence-pairs, finding semantically similar words, or using dictionaries and parsers to find syntactic correspondences. This paper combines all three aspects to generate templates. The boundaries for aligning and extracting members (phrase-pairs) for clustering are found using chunkers (hence, syntactic information) trained independently on the two languages under consideration. Semantically related phrase-pairs are then grouped based on the contexts in which they appear, and templates are constructed by replacing these clustered phrase-pairs with their class labels. We also perform a filtration step that simulates human labelers, retaining only those phrase-pairs whose source and target phrases correspond closely. Templates with English-Chinese and English-French language pairs gave significant improvements over a baseline with no templates.

Author information

Authors and Affiliations

Carnegie Mellon University, USA
Rashmi Gangadharaiah, Ralf D. Brown & Jaime Carbonell

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Gangadharaiah, R., Brown, R.D., Carbonell, J. (2011). Phrasal Equivalence Classes for Generalized Corpus-Based Machine Translation. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_2

DOI: https://doi.org/10.1007/978-3-642-19437-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)
