Automatic and Human Evaluation on English-Croatian Legislative Test Set

  • Marija Brkić
  • Sanja Seljan
  • Tomislav Vičić
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7817)


This paper presents manual and automatic evaluation of the freely available online machine translation (MT) service Google Translate for the English-Croatian language pair in the legislative and general domains. The experimental study is conducted on a test set of 200 sentences in total. Human evaluation is performed by native speakers according to the criteria of fluency and adequacy, and is complemented by an error analysis. Automatic evaluation is performed against a single reference set using the following metrics: BLEU, NIST, F-measure and WER. The influence of lowercasing, tokenization and punctuation on the automatic scores is discussed. Pearson's correlation is reported between the automatic metrics themselves, as well as between the automatic metrics and the two human criteria, fluency and adequacy.
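To make two of the quantities named in the abstract concrete, the following is a minimal Python sketch of word error rate (WER) under optional lowercasing and punctuation stripping, together with Pearson's correlation coefficient. It is an illustration only and not the authors' evaluation setup: the helper names `tokenize`, `wer` and `pearson`, the regex-based tokenizer, and the example sentence are all assumptions introduced here.

```python
import math
import re


def tokenize(text, lowercase=False, strip_punct=False):
    """Split text into tokens (hypothetical helper, not the paper's tokenizer)."""
    if lowercase:
        text = text.lower()
    if strip_punct:
        # Replace every punctuation character with a space.
        text = re.sub(r"[^\w\s]", " ", text)
    else:
        # Keep punctuation, but separate it into its own tokens.
        text = re.sub(r"([^\w\s])", r" \1 ", text)
    return text.split()


def wer(hyp, ref):
    """Word error rate: word-level edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # reference word missing from hypothesis
                           cur[j - 1] + 1,           # extra word in hypothesis
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented example: the same hypothesis scored with and without normalization.
hyp_text = "Zakon stupa na snagu."
ref_text = "zakon stupa na snagu"
raw = wer(tokenize(hyp_text), tokenize(ref_text))
norm = wer(tokenize(hyp_text, lowercase=True, strip_punct=True),
           tokenize(ref_text, lowercase=True, strip_punct=True))
```

In this invented example the unnormalized score counts one case mismatch and one extra punctuation token against a four-word reference, while the normalized score is zero, which shows how preprocessing choices alone can shift an automatic metric. `pearson` could then be applied to per-sentence metric scores and human ratings once such data is available.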


Keywords: Machine Translation; Human Evaluation; Statistical Machine Translation; Computational Linguistics; Automatic Evaluation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Marija Brkić (1)
  • Sanja Seljan (2)
  • Tomislav Vičić (3)
  1. Department of Informatics, University of Rijeka, Rijeka, Croatia
  2. Department of Information Sciences, Faculty of Humanities and Social Sciences, Zagreb, Croatia
  3. Freelance translator, Croatia
