Local Context Selection for Aligning Sentences in Parallel Corpora

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4635)

Abstract

This paper presents a novel, language-independent, context-based technique for aligning sentences in parallel corpora. The problem of aligning sentences can be viewed as finding translations of sentences drawn from different sources. Unlike current approaches, which rely on pre-defined features and models, our algorithm employs features derived from the distributional properties of sentences and uses no language-dependent knowledge. We make use of the context of sentences and introduce the notion of Zipfian word vectors, which effectively model the distributional properties of a given sentence. We take the context to be the frame in which reasoning about sentence alignment is done. We examine alternative local context models and demonstrate that our context-based sentence alignment algorithm performs better than prominent sentence alignment techniques. For each pair of sentence sets, our system dynamically selects the local context that maximizes the correlation. We evaluate the performance of our system using two measures: sentence alignment accuracy and sentence alignment coverage. Compared with commonly used sentence alignment systems, our system performs 1.1951 to 1.5404 times better in reducing the error rate in alignment accuracy and coverage.
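
The abstract does not define Zipfian word vectors or the correlation measure, so the following Python fragment is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes a sentence vector is a histogram of its tokens over log2 corpus-frequency bins, that context is a symmetric window of neighbouring sentences, and that correlation is Pearson's coefficient. The names `zipf_vector`, `context_vector`, `best_local_context`, and the parameters `n_bins` and `max_w` are all hypothetical.

```python
import math


def zipf_vector(sentence, corpus_counts, n_bins=8):
    """Histogram of a sentence's tokens over log2 corpus-frequency bins.

    Assumption: this binning stands in for a 'Zipfian word vector';
    the abstract does not give the actual definition.
    """
    vec = [0] * n_bins
    for tok in sentence.split():
        freq = max(1, corpus_counts.get(tok, 1))  # unseen tokens count as 1
        vec[min(int(math.log2(freq)), n_bins - 1)] += 1
    return vec


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def context_vector(vectors, i, w):
    """Element-wise sum of the vectors of sentences i-w..i+w, clipped to the text."""
    lo, hi = max(0, i - w), min(len(vectors), i + w + 1)
    return [sum(col) for col in zip(*vectors[lo:hi])]


def best_local_context(src_vecs, tgt_vecs, i, j, max_w=3):
    """Try every pair of window sizes and keep the one with the highest correlation.

    Returns (score, source_window, target_window). Exhaustive search is an
    assumption made for clarity, not a claim about the paper's procedure.
    """
    best = None
    for ws in range(max_w + 1):
        for wt in range(max_w + 1):
            score = pearson(context_vector(src_vecs, i, ws),
                            context_vector(tgt_vecs, j, wt))
            if best is None or score > best[0]:
                best = (score, ws, wt)
    return best
```

Because the bins depend only on word frequencies, a sketch like this needs no dictionary or language-specific resource, which is the property the abstract emphasizes.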

Editor information

Boicho Kokinov, Daniel C. Richardson, Thomas R. Roth-Berghofer, Laure Vieu

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Biçici, E. (2007). Local Context Selection for Aligning Sentences in Parallel Corpora. In: Kokinov, B., Richardson, D.C., Roth-Berghofer, T.R., Vieu, L. (eds) Modeling and Using Context. CONTEXT 2007. Lecture Notes in Computer Science, vol 4635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74255-5_7

  • DOI: https://doi.org/10.1007/978-3-540-74255-5_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74254-8

  • Online ISBN: 978-3-540-74255-5

  • eBook Packages: Computer Science, Computer Science (R0)