Assessing the Accuracy of Discourse Connective Translations: Validation of an Automatic Metric

  • Najeh Hajlaoui
  • Andrei Popescu-Belis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7817)


Automatic metrics for the evaluation of machine translation (MT) compute scores that globally characterize certain aspects of MT quality, such as adequacy and fluency. This paper introduces a reference-based metric that focuses on a particular class of function words, namely discourse connectives, which are important for text structuring and rather challenging for MT. To measure the accuracy of connective translation (ACT), the metric relies on automatic word-level alignment between a source sentence and, respectively, the reference and candidate translations, along with other heuristics for comparing translations of discourse connectives. Using a dictionary of equivalents, the translations are scored automatically or, for better precision, semi-automatically. The precision of the ACT metric is assessed by human judges on sample data for English/French and English/Arabic translations: the ACT scores are on average within 2% of human scores. The ACT metric is then applied to several commercial and research MT systems, providing an assessment of their performance on discourse connectives.
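The core scoring idea can be illustrated with a minimal sketch. Assume (hypothetically) that word alignment has already identified, for each source connective, the aligned reference and candidate translations, and that a small dictionary of equivalents groups target translations by sense. A candidate then counts as correct when it is identical to the reference translation or when both share a sense in the dictionary; the data, names, and scoring cases below are illustrative, not the paper's exact implementation.

```python
# Illustrative dictionary of equivalents: source connective -> target
# translations grouped by sense (a toy English/French example; the real
# metric uses a larger dictionary and automatic word alignment).
EQUIVALENTS = {
    "while": {
        "temporal": {"pendant que", "tandis que"},
        "contrast": {"alors que", "tandis que"},
    },
}

def senses_of(connective, translation):
    """Return the set of senses whose equivalent set contains the translation."""
    return {sense
            for sense, words in EQUIVALENTS.get(connective, {}).items()
            if translation in words}

def act_score(cases):
    """cases: list of (source_connective, ref_translation, cand_translation).

    A case is counted correct when the candidate matches the reference
    exactly, or when the two translations share at least one sense in the
    dictionary of equivalents. Returns the fraction of correct cases.
    """
    correct = 0
    for conn, ref, cand in cases:
        if cand == ref:
            correct += 1
        elif senses_of(conn, ref) & senses_of(conn, cand):
            correct += 1
    return correct / len(cases) if cases else 0.0

cases = [
    ("while", "alors que", "alors que"),    # identical translation: correct
    ("while", "alors que", "tandis que"),   # shared contrastive sense: correct
    ("while", "alors que", "pendant que"),  # temporal vs. contrastive: incorrect
]
print(round(act_score(cases), 4))  # 2 of 3 cases correct
```

Cases where the candidate translation is absent from the dictionary are what the semi-automatic mode mentioned in the abstract resolves by human inspection; this sketch simply counts them as incorrect.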


Keywords: Machine translation · MT evaluation · Discourse connectives





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Najeh Hajlaoui (1)
  • Andrei Popescu-Belis (1)
  1. Idiap Research Institute, Martigny, Switzerland
