Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

We present a new implication of Wu’s (1997) Inversion Transduction Grammar (ITG) Hypothesis for the problem of retrieving truly parallel sentence translations from large collections of highly non-parallel documents. Our approach leverages a strong language-universal constraint posited by the ITG Hypothesis, which can serve as a strong inductive bias for various language learning problems, yielding gains in both efficiency and accuracy. The task we attack is highly practical: non-parallel multilingual data exists in far greater quantities than parallel corpora, yet parallel sentences are a much more useful resource. Our aim here is to mine truly parallel sentences, as opposed to comparable sentence pairs or loose translations as in most previous work. The method we introduce exploits Bracketing ITGs to produce the first known results for this problem. Experiments show that it obtains large accuracy gains on this task compared to the expected performance of state-of-the-art models that were developed for the less stringent task of mining comparable sentence pairs.
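
As a rough illustration of the reordering constraint behind the ITG Hypothesis (not the mining method of the paper itself), the sketch below checks whether a word-order permutation can be built purely from binary "straight" and "inverted" combinations of adjacent spans. The function name `itg_reachable` and the shift-reduce formulation are assumptions made for this example; the sketch also ignores unaligned words, which a real Bracketing ITG handles through singleton productions.

```python
from itertools import permutations

def itg_reachable(perm):
    """Return True if `perm` can be built by binary straight/inverted merges.

    `perm` lists, for each source position, the target position it maps to.
    Hypothetical helper for illustration; not taken from the paper.
    """
    stack = []  # each entry is a (lo, hi) range of target positions
    for v in perm:
        stack.append((v, v))
        # Reduce while the top two spans cover adjacent target ranges.
        while len(stack) >= 2:
            (lo1, hi1), (lo2, hi2) = stack[-2], stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:  # straight or inverted
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) == 1

# Of the 24 orderings of four words, only the two "inside-out"
# patterns (1, 3, 0, 2) and (2, 0, 3, 1) are ruled out.
allowed = [p for p in permutations(range(4)) if itg_reachable(p)]
print(len(allowed))  # 22
```

A sentence pair whose best word alignment requires one of the excluded orderings cannot be fully bracketed under this constraint, which gives some intuition for why the constraint might help separate true translations from merely comparable sentences.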

This work was supported in part by the Hong Kong Research Grants Council through grants RGC6083/99E, RGC6256/00E, DAG03/04.EG09, and RGC6206/03E.

References

  1. Wu, D.: An algorithm for simultaneously bracketing parallel texts by aligning words. In: ACL-1995, Cambridge, MA (1995)

  2. Wu, D.: Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics 23 (1997)

  3. Zens, R., Ney, H.: A comparative study on reordering constraints in statistical machine translation. In: ACL-2003, Sapporo, pp. 192–202 (2003)

  4. Zhang, H., Gildea, D.: Syntax-based alignment: Supervised or unsupervised? In: COLING-2004, Geneva (2004)

  5. Yamada, K., Knight, K.: A syntax-based statistical translation model. In: ACL-2001, Toulouse, France (2001)

  6. Zhang, H., Gildea, D.: Stochastic lexicalized inversion transduction grammar for alignment. In: ACL-2005, Ann Arbor, pp. 475–482 (2005)

  7. Zens, R., Ney, H., Watanabe, T., Sumita, E.: Reordering constraints for phrase-based statistical machine translation. In: COLING-2004, Geneva (2004)

  8. Chiang, D.: A hierarchical phrase-based model for statistical machine translation. In: ACL-2005, Ann Arbor, pp. 263–270 (2005)

  9. Fung, P., Cheung, P.: Mining very-non-parallel corpora: Parallel sentence and lexicon extraction via bootstrapping and EM. In: EMNLP-2004, Barcelona (2004)

  10. Munteanu, D.S., Fraser, A., Marcu, D.: Improved machine translation performance via parallel sentence extraction from comparable corpora. In: NAACL-2004 (2004)

  11. Zhao, B., Vogel, S.: Adaptive parallel sentences mining from web bilingual news collections. In: IEEE Workshop on Data Mining (2002)

  12. Lewis, P.M., Stearns, R.E.: Syntax-directed transduction. Journal of the Association for Computing Machinery 15, 465–488 (1968)

  13. Fung, P., Liu, X., Cheung, C.S.: Mixed-language query disambiguation. In: ACL-1999, Maryland (1999)

  14. Och, F.J., Ney, H.: Improved statistical alignment models. In: ACL-2000, Hong Kong (2000)

  15. Brown, P.F., DellaPietra, S.A., DellaPietra, V.J., Mercer, R.L.: The mathematics of statistical machine translation. Computational Linguistics 19, 263–311 (1993)

  16. Leusch, G., Ueffing, N., Ney, H.: A novel string-to-string distance measure with applications to machine translation evaluation. In: MT Summit IX (2003)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, D., Fung, P. (2005). Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_23

  • DOI: https://doi.org/10.1007/11562214_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
