
Multilingual text alignment

Aligning three or more versions of a text

  • Chapter
Parallel Text Processing

Part of the book series: Text, Speech and Language Technology (TLTB, volume 13)

Abstract

This chapter addresses a number of questions regarding multilingual texts, where "multilingual" is taken to mean texts available in more than two languages. In particular, it asks whether there is any real use for mapping out multilingual translation equivalence. The view proposed is that multiple versions of a text can (and should) be seen as additional sources of information that can be exploited to produce better bilingual alignments. A general multilingual alignment technique is presented whose computational complexity, for a given number of texts, is the same as that of bilingual alignment. Experimental results show how this method improves the accuracy of bilingual alignments on a trilingual corpus (The Gospel According to John, in English, French and Spanish).
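The abstract does not spell out the alignment technique itself, so the following is only a hedged illustration of the general idea that a third version of a text can act as extra evidence about a bilingual alignment. It is not the chapter's method. It assumes that alignments are given as sets of sentence-index pairs, and the function names `compose` and `triangulate` and the toy data are hypothetical.

```python
from collections import defaultdict


def compose(ab, bc):
    """Compose an A-B and a B-C alignment into the A-C links implied via B."""
    by_pivot = defaultdict(set)
    for b, c in bc:
        by_pivot[b].add(c)
    return {(a, c) for a, b in ab for c in by_pivot[b]}


def triangulate(direct_ac, ab, bc):
    """Split a direct A-C alignment into links the pivot text confirms
    and links it does not confirm (candidates for re-examination)."""
    via_pivot = compose(ab, bc)
    return direct_ac & via_pivot, direct_ac - via_pivot


# Toy sentence-level alignments (index pairs); assumed data, not from the chapter.
en_fr = {(0, 0), (1, 1), (2, 2)}
fr_es = {(0, 0), (1, 1), (2, 2)}
en_es = {(0, 0), (1, 2), (2, 2)}  # the link (1, 2) disagrees with the pivot

confirmed, suspect = triangulate(en_es, en_fr, fr_es)
print(sorted(confirmed))
print(sorted(suspect))
```

In this sketch the French text only flags English-Spanish links that it cannot corroborate. How the chapter actually combines the three texts, and at what cost, is described in the full text rather than in this abstract.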

This research was funded by the Canadian Department of Foreign Affairs and International Trade (http://www.dfait-maeci.gc.ca/), via the Agence de la Francophonie (http://www.francophonie.org/).




Copyright information

© 2000 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Simard, M. (2000). Multilingual text alignment. In: Véronis, J. (ed.) Parallel Text Processing. Text, Speech and Language Technology, vol 13. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2535-4_3


  • DOI: https://doi.org/10.1007/978-94-017-2535-4_3

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5555-2

  • Online ISBN: 978-94-017-2535-4

  • eBook Packages: Springer Book Archive
