Local Context Selection for Aligning Sentences in Parallel Corpora

Biçici, Ergun

doi:10.1007/978-3-540-74255-5_7

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4635)

Abstract

This paper presents a novel, language-independent, context-based technique for aligning sentences in parallel corpora. We view sentence alignment as the problem of finding translations of sentences chosen from different sources. Unlike current approaches, which rely on pre-defined features and models, our algorithm employs features derived from the distributional properties of sentences and does not use any language-dependent knowledge. We make use of the context of sentences and introduce the notion of Zipfian word vectors, which effectively model the distributional properties of a given sentence. We take the context to be the frame in which the reasoning about sentence alignment is done. We examine alternatives for local context models and demonstrate that our context-based sentence alignment algorithm performs better than prominent sentence alignment techniques. Our system dynamically selects the local context that maximizes the correlation for a pair of sets of sentences. We evaluate the performance of our system using two measures: sentence alignment accuracy and sentence alignment coverage. Compared with commonly used sentence alignment systems, our system performs 1.1951 to 1.5404 times better in reducing the error rate in alignment accuracy and coverage.
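
The abstract does not define Zipfian word vectors or the context selection procedure, so the following is only a rough sketch of one possible reading, not the paper's algorithm. It assumes that a sentence's vector counts its words by their corpus frequency, that a local context is a symmetric window of neighbouring sentences whose vectors are summed, and that the correlation is Pearson's. All function names and parameters (`zipfian_vector`, `context_vector`, `best_local_context`, `max_freq`, `max_window`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def zipfian_vector(sentence, corpus_counts, max_freq):
    """Assumed definition: entry i-1 counts the words of the sentence
    whose corpus frequency is i (frequencies above max_freq are capped)."""
    vec = [0] * max_freq
    for word in sentence.split():
        freq = min(corpus_counts.get(word, 0), max_freq)
        if freq > 0:
            vec[freq - 1] += 1
    return vec


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def context_vector(vectors, index, window):
    """Sum the vectors of the sentences within `window` positions of `index`."""
    lo = max(0, index - window)
    hi = min(len(vectors), index + window + 1)
    return [sum(column) for column in zip(*vectors[lo:hi])]


def best_local_context(src_vectors, tgt_vectors, i, j, max_window):
    """Try every window size up to max_window and keep the one whose
    source and target context vectors correlate most strongly."""
    best_window, best_score = 0, float("-inf")
    for window in range(max_window + 1):
        score = pearson(
            context_vector(src_vectors, i, window),
            context_vector(tgt_vectors, j, window),
        )
        if score > best_score:
            best_window, best_score = window, score
    return best_window, best_score


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    source = ["the cat sat", "the dog ran", "a bird sang"]
    target = ["le chat assis", "le chien courut", "un oiseau chanta"]
    src_counts = Counter(" ".join(source).split())
    tgt_counts = Counter(" ".join(target).split())
    src_vecs = [zipfian_vector(s, src_counts, 3) for s in source]
    tgt_vecs = [zipfian_vector(t, tgt_counts, 3) for t in target]
    print(best_local_context(src_vecs, tgt_vecs, 1, 1, 2))
```

Because the vectors depend only on frequency counts, the sketch needs no dictionary or other language-specific resource, which is the property the abstract emphasizes; a full aligner would additionally search over candidate sentence pairings rather than score a single pair.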

Author information

Authors and Affiliations

Koç University, Rumeli Feneri Yolu 34450, Sariyer Istanbul, Turkey
Ergun Biçici

Editor information

Boicho Kokinov, Daniel C. Richardson, Thomas R. Roth-Berghofer, Laure Vieu

Cite this paper

Biçici, E. (2007). Local Context Selection for Aligning Sentences in Parallel Corpora. In: Kokinov, B., Richardson, D.C., Roth-Berghofer, T.R., Vieu, L. (eds) Modeling and Using Context. CONTEXT 2007. Lecture Notes in Computer Science, vol 4635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74255-5_7

DOI: https://doi.org/10.1007/978-3-540-74255-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74254-8
Online ISBN: 978-3-540-74255-5
eBook Packages: Computer Science, Computer Science (R0)
