
Abstract

Existing sentence alignment methods are fundamentally founded on sentence length and lexical correspondences. Methods based on the former generally follow the length proportionality assumption, namely that the lengths of sentences in one language tend to be proportional to those of their translations, and they are known to adapt poorly to new languages and corpora. In this paper, we attempt to interpret this assumption from a new perspective via the notion of collaborative matching, based on the observation that sentences can work collaboratively during alignment rather than separately, as in previous studies. Our approach is intended to be independent of any specific language or corpus, so that it can be applied adaptively to a variety of texts without relying on prior knowledge about them. We use one-to-one sentence alignment to illustrate this approach and implement two specific alignment methods, which are evaluated on six bilingual corpora of different languages and domains. Experimental results confirm the effectiveness of this collaborative matching approach.
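To make the length proportionality assumption concrete, here is a minimal sketch of a conventional length-based one-to-one aligner of the kind the abstract contrasts with. It is not the collaborative matching method described in this paper. It assumes a Gaussian penalty on length deviation in the spirit of Gale and Church (1991), with their character-length defaults `c = 1.0` and `s2 = 6.8` treated here as assumptions, plus a hypothetical fixed `skip_penalty` for unmatched sentences. The function names `length_cost` and `align_one_to_one` are illustrative only.

```python
import math


def length_cost(len_src: int, len_tgt: int, c: float = 1.0, s2: float = 6.8) -> float:
    """Penalty for pairing two sentences under the length proportionality
    assumption: len_tgt is expected to be about c * len_src, with variance
    s2 * len_src (c and s2 are assumed defaults). Lower is better."""
    delta = (len_tgt - c * len_src) / math.sqrt(s2 * max(len_src, 1))
    return delta * delta / 2.0  # negative log of a Gaussian, up to a constant


def align_one_to_one(src_lens, tgt_lens, skip_penalty=4.0, c=1.0, s2=6.8):
    """Monotonic dynamic-programming alignment allowing 1-1 matches and
    1-0 / 0-1 skips (skip_penalty is a hypothetical constant).
    Returns the list of (source_index, target_index) 1-1 pairs."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best, move = inf, None
            if i > 0 and j > 0:  # match source i-1 with target j-1
                c11 = cost[i - 1][j - 1] + length_cost(src_lens[i - 1], tgt_lens[j - 1], c, s2)
                if c11 < best:
                    best, move = c11, (1, 1)
            if i > 0:  # leave source i-1 unaligned
                c10 = cost[i - 1][j] + skip_penalty
                if c10 < best:
                    best, move = c10, (1, 0)
            if j > 0:  # leave target j-1 unaligned
                c01 = cost[i][j - 1] + skip_penalty
                if c01 < best:
                    best, move = c01, (0, 1)
            cost[i][j], back[i][j] = best, move

    # Trace back from the end to recover the 1-1 pairs in order.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        if (di, dj) == (1, 1):
            pairs.append((i - 1, j - 1))
        i, j = i - di, j - dj
    pairs.reverse()
    return pairs
```

In use, the inputs would typically be character counts, e.g. `align_one_to_one([len(s) for s in src], [len(t) for t in tgt])`. With source lengths `[20, 50, 30]` and target lengths `[21, 5, 49, 31]`, the short second target sentence is skipped and the remaining sentences pair up by length. Because every decision here depends only on fixed, corpus-independent parameters, this baseline illustrates why purely length-based methods can adapt poorly to new languages and corpora.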

Keywords

Sentence alignment, Machine translation

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
  2. Department of Linguistics and Translation, City University of Hong Kong, Kowloon Tong, Hong Kong
