Leveraging the Advantages of Associative Alignment Methods for PB-SMT Systems

  • Conference paper
Human Language Technology. Challenges for Computer Science and Linguistics (LTC 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10930)

Abstract

Training statistical machine translation systems used to require heavy computation times. It has been shown that approximations in the probabilistic approach can lead to impressive improvements in training time (fast_align). We show that, by leveraging the advantages of the associative approach, we achieve similar or even faster training times while keeping comparable BLEU scores. Our contributions are of two types. On the engineering side, we introduce multi-processing in both sampling-based alignment and hierarchical sub-sentential alignment. On the modeling side, we introduce approximations in hierarchical sub-sentential alignment that lead to significant reductions in time without affecting the alignments produced. We test and compare our improvements on six typical language pairs of the Europarl corpus.
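The engineering contribution rests on the fact that, in sampling-based alignment, each random subcorpus is processed independently, so samples can be spread over worker processes and their counts merged afterwards. The sketch below illustrates that pattern under strong simplifying assumptions: it keeps only the core idea that source and target words occurring in exactly the same sentences of a random subcorpus are aligned, ignores n-grams and occurrence counts, and uses hypothetical names (`align_one_sample`, `sample_align`). It is not the authors' implementation.

```python
import random
from collections import Counter
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Dict, FrozenSet, List, Tuple, Type

# A parallel corpus: list of (source tokens, target tokens) sentence pairs.
Corpus = List[Tuple[List[str], List[str]]]


def align_one_sample(job: Tuple[Corpus, int]) -> Counter:
    """Align words in one random subcorpus (simplified, hypothetical)."""
    corpus, seed = job
    rng = random.Random(seed)  # per-sample seed: output does not depend on scheduling
    k = rng.randint(1, len(corpus))
    indices = rng.sample(range(len(corpus)), k)

    # Occurrence vector of each word: the set of subcorpus positions it appears in.
    occ: Dict[Tuple[str, str], set] = {}
    for pos, idx in enumerate(indices):
        src, tgt = corpus[idx]
        for w in src:
            occ.setdefault(("src", w), set()).add(pos)
        for w in tgt:
            occ.setdefault(("tgt", w), set()).add(pos)

    # Words sharing exactly the same occurrence vector are grouped together.
    groups: Dict[FrozenSet[int], List[Tuple[str, str]]] = {}
    for key, positions in occ.items():
        groups.setdefault(frozenset(positions), []).append(key)

    counts: Counter = Counter()
    for members in groups.values():
        src_words = tuple(sorted(w for side, w in members if side == "src"))
        tgt_words = tuple(sorted(w for side, w in members if side == "tgt"))
        if src_words and tgt_words:  # keep only bilingual groups
            counts[(src_words, tgt_words)] += 1
    return counts


def sample_align(corpus: Corpus, n_samples: int,
                 executor_cls: Type[Executor] = ProcessPoolExecutor,
                 workers: int = 4, base_seed: int = 0) -> Counter:
    """Process n_samples independent subcorpora in parallel and merge counts."""
    jobs = [(corpus, base_seed + i) for i in range(n_samples)]
    total: Counter = Counter()
    with executor_cls(max_workers=workers) as pool:
        for counts in pool.map(align_one_sample, jobs):
            total.update(counts)
    return total
```

In a script, `sample_align(corpus, 1000)` would be called under an `if __name__ == "__main__":` guard so that worker processes start safely. Passing `concurrent.futures.ThreadPoolExecutor` as `executor_cls` gives the same counts, because each sample's seed is fixed by its index rather than by the order in which workers finish. Shipping the whole corpus with every job is wasteful; a real implementation would share it with the workers once.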

This paper is part of the outcome of research performed under a Waseda University Grant for Special Research Projects (Project number: 2015A-063).

Y. Lepage thanks Chonlathorn Kwankajornkiet from Chulalongkorn University, Thailand, for her contribution to implementing the C core component of Cutnalign during a training period at IPS, Waseda University.

Notes

  1. http://www.statmt.org.

  2. http://github.com/clab/fast_align.

  3. https://anymalign.limsi.fr/.

  4. Thanks to the authors for providing the source code.

  5. train-model.perl --first-step 4.

  6. Anymalign is an anytime process and should therefore be given a timeout.

  7. Notice that, by definition, \( \mathrm{Ncut} (X, Y) = \mathrm{Ncut} (\bar{X}, \bar{Y}) \) and \( \mathrm{Ncut} (X, \bar{Y}) = \mathrm{Ncut} (\bar{X}, Y) \), where \( \bar{X} \) and \( \bar{Y} \) denote the complements of \( X \) and \( Y \). The same holds for \( \mathrm{cut} \).
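Footnote 7 implies that, once a source split point and a target split point are fixed, only two of the four ways of pairing the sub-blocks need to be scored: the source prefix with the target prefix (monotone) or with the target suffix (inverted). The sketch below assumes the bipartite normalized cut of Zha et al. [17], \( \mathrm{Ncut}(X, Y) = \frac{\mathrm{cut}(X, Y)}{\mathrm{cut}(X, Y) + 2W(X, Y)} + \frac{\mathrm{cut}(X, Y)}{\mathrm{cut}(X, Y) + 2W(\bar{X}, \bar{Y})} \) with \( \mathrm{cut}(X, Y) = W(X, \bar{Y}) + W(\bar{X}, Y) \), where \( W \) sums association scores over a block of the source-by-target matrix. It also assumes strictly positive scores; the names `weight`, `cut`, `ncut` and `best_split` are illustrative and not taken from Cutnalign.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # m[i][j]: association score of source word i and target word j


def weight(m: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """W(rows, cols): sum of the scores in one block of the matrix."""
    return sum(m[i][j] for i in rows for j in cols)


def cut(m: Matrix, x: Sequence[int], y: Sequence[int]) -> float:
    """cut(X, Y) = W(X, Y-bar) + W(X-bar, Y)."""
    xb = [i for i in range(len(m)) if i not in x]
    yb = [j for j in range(len(m[0])) if j not in y]
    return weight(m, x, yb) + weight(m, xb, y)


def ncut(m: Matrix, x: Sequence[int], y: Sequence[int]) -> float:
    """Bipartite normalized cut; X, Y and their complements must be non-empty."""
    xb = [i for i in range(len(m)) if i not in x]
    yb = [j for j in range(len(m[0])) if j not in y]
    c = cut(m, x, y)
    return c / (c + 2 * weight(m, x, y)) + c / (c + 2 * weight(m, xb, yb))


def best_split(m: Matrix) -> Tuple[float, int, int, str]:
    """Score each (source, target) split point in two orientations; keep the lowest Ncut.

    Needs at least two source and two target words; otherwise returns (inf, 0, 0, "").
    """
    n_src, n_tgt = len(m), len(m[0])
    best = (math.inf, 0, 0, "")
    for i in range(1, n_src):
        x = list(range(i))  # source prefix
        for j in range(1, n_tgt):
            # Footnote 7: Ncut(X, Y) = Ncut(X-bar, Y-bar), so these two pairings
            # also cover suffix/suffix and suffix/prefix.
            for orient, y in (("monotone", list(range(j))),
                              ("inverted", list(range(j, n_tgt)))):
                score = ncut(m, x, y)
                if score < best[0]:
                    best = (score, i, j, orient)
    return best
```

On a strongly diagonal 3×3 matrix the lowest score comes from a monotone split, while on an anti-diagonal one it comes from an inverted split. Applying `best_split` recursively to the two paired sub-blocks would yield a hierarchical alignment; the recursion and its stopping criterion are left out of this sketch.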

References

  1. Ayan, N.F., Dorr, B.J.: Going beyond AER: an extensive analysis of word alignments and their impact on MT. In: Proceedings of COLING/ACL, pp. 9–16 (2006)

  2. Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19(2), 263–311 (1993)

  3. Dunning, T.: Accurate methods for the statistics of surprise and coincidence. Comput. Linguist. 19(1), 61–74 (1993)

  4. Dyer, C., Chahuneau, V., Smith, N.A.: A simple, fast, and effective reparameterization of IBM model 2. In: Proceedings of HLT-NAACL, pp. 644–648 (2013)

  5. Gale, W.A., Church, K.W.: Identifying word correspondences in parallel texts. In: Proceedings of the Workshop on Speech and Natural Language, vol. 91, pp. 152–157 (1991)

  6. Gao, Q., Vogel, S.: Parallel implementations of word alignment tool. In: Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pp. 49–57 (2008)

  7. Gong, L., Max, A., Yvon, F.: Improving bilingual sub-sentential alignment by sampling-based transpotting. In: Proceedings of IWSLT, pp. 243–250 (2013)

  8. Heafield, K.: KenLM: faster and smaller language model queries. In: Proceedings of the 6th Workshop on Statistical Machine Translation, pp. 187–197 (2011)

  9. Koehn, P.: Europarl: a parallel corpus for statistical machine translation. In: Proceedings of Machine Translation Summit, vol. 5, pp. 79–86 (2005)

  10. Koehn, P., Axelrod, A., Birch, A., Callison-Burch, C., Osborne, M., Talbot, D., White, M.: Edinburgh system description for the 2005 IWSLT speech translation evaluation. In: Proceedings of IWSLT, pp. 68–75 (2005)

  11. Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R.: Moses: open source toolkit for statistical machine translation. In: Proceedings of ACL (Poster sessions), pp. 177–180 (2007)

  12. Lardilleux, A., Yvon, F., Lepage, Y.: Hierarchical sub-sentential alignment with Anymalign. In: Proceedings of EAMT 2012, pp. 279–286 (2012)

  13. Lardilleux, A., Yvon, F., Lepage, Y.: Generalizing sampling-based multilingual alignment. Mach. Transl. 27(1), 1–23 (2013)

  14. Levenberg, A., Callison-Burch, C., Osborne, M.: Stream-based translation models for statistical machine translation. In: Proceedings of HLT-NAACL, pp. 394–402 (2010)

  15. Och, F.J., Ney, H.: A systematic comparison of various statistical alignment models. Comput. Linguist. 29(1), 19–51 (2003)

  16. Smaïli, K., Jamoussi, S., Langlois, D., Haton, J.P.: Statistical feature language model. In: Proceedings of ICSLP, pp. 1357–1360 (2004)

  17. Zha, H., He, X., Ding, C., Simon, H., Gu, M.: Bipartite graph partitioning and data clustering. In: Proceedings of International Conference on Information and Knowledge Management, pp. 25–32 (2001)

Author information

Corresponding author

Correspondence to Yves Lepage.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Yang, B., Lepage, Y. (2018). Leveraging the Advantages of Associative Alignment Methods for PB-SMT Systems. In: Vetulani, Z., Mariani, J., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2015. Lecture Notes in Computer Science, vol 10930. Springer, Cham. https://doi.org/10.1007/978-3-319-93782-3_16

  • DOI: https://doi.org/10.1007/978-3-319-93782-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93781-6

  • Online ISBN: 978-3-319-93782-3

  • eBook Packages: Computer Science, Computer Science (R0)
