Factor-based evaluation for English to Hindi MT outputs

  • Original Paper
  • Published in Language Resources and Evaluation

Abstract

The design and implementation of automatic evaluation methods is integral to scientific research, as it accelerates the development cycle. This is no less true for automatic machine translation (MT) systems. However, no global and systematic scheme exists for evaluating the performance of an MT system. The existing evaluation metrics, such as BLEU, METEOR, and TER, although used extensively in the literature, have faced considerable criticism from users. Moreover, the performance of these metrics often varies with the language pair under consideration. This observation is no less pertinent for translations involving languages of the Indian subcontinent. This study aims at developing an evaluation metric for English to Hindi MT outputs. As part of this process, a set of probable errors has been identified both manually and automatically. Linear regression has been used to compute a weight/penalty for each error, taking human evaluations into consideration, and a sentence score is computed as the weighted sum of its errors. A set of 126 models has been built using different single classifiers and ensembles of classifiers in order to find the most suitable model for allocating an appropriate weight/penalty to each error. The outputs of these models have been compared with the state-of-the-art evaluation metrics. The models developed for manually identified errors correlate well with manual evaluation scores, whereas the models for automatically identified errors correlate poorly with them. This indicates the need for further improvement and for more sophisticated linguistic tools for automatic identification and extraction of errors. Although automatic machine translation tools are being developed for many different language pairs, no generalized scheme exists for designing meaningful evaluation metrics for them. The proposed scheme should help in developing such metrics for other language pairs in the coming days.
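
To make the scoring scheme concrete: each error type i receives a weight w_i learned by linear regression against human evaluation scores, and a sentence s is scored as w_0 + Σ_i w_i · e_i(s), where e_i(s) is the count of error i in s. The following minimal Python sketch illustrates this idea; the error types, counts, and human scores below are hypothetical placeholders, not the paper's data.

    # Minimal sketch of the weighted-sum scoring idea (hypothetical data).
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.linear_model import LinearRegression

    # Rows = MT output sentences; columns = counts of identified error
    # types (e.g. word order, gender agreement, untranslated words).
    X = np.array([[2, 0, 1],
                  [0, 1, 0],
                  [3, 2, 2],
                  [1, 0, 0]])
    # Human evaluation scores for the same sentences (higher = better).
    y = np.array([2.5, 4.0, 1.5, 4.5])

    # Fit a weight/penalty for each error type against the human scores.
    model = LinearRegression().fit(X, y)

    # Sentence score = intercept + weighted sum of the error counts.
    scores = model.predict(X)

    # Correlation with human scores, the criterion the paper uses when
    # comparing candidate models with existing metrics.
    r, _ = pearsonr(scores, y)
    print(model.coef_, model.intercept_, round(r, 3))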


Notes

  1. The translator as accessed in September 2017. All translations mentioned in the paper were produced in September 2017.

  2. If the subject is a pronoun, the gender will be that of the noun the pronoun is referring to.

  3. http://tdil-dc.in/.

  4. http://www.bing.com/translator/.

  5. https://translate.google.co.in/.

  6. http://www.cdacmumbai.in/matra.

  7. http://sivareddy.in/downloads (The tagger was developed by IIIT Hyderabad).

References

  • Balyan, R., & Chatterjee, N. (2015). Translating noun compounds using semantic relations. Computer Speech & Language, 32(1), 91–108.

  • Banerjee, S. & Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the workshop on intrinsic and extrinsic evaluation measures for MT and/or summarization at 43rd ACL, Ann Arbor, Michigan.

  • Comrie, B. (1989). Language universals and linguistic typology. Chicago: The University of Chicago Press.

  • Bharati, A. & Kulkarni, A. (2005). English from Hindi viewpoint: A Paaninian perspective. In Platinum Jubilee conference of Linguistic Society of India, held at CALTS, University of Hyderabad, Hyderabad.

  • Breiman, L. (1996a). Bagging predictors. Machine Learning, 24(2), 123–140.

  • Breiman, L. (1996b). Stacked regressions. Machine Learning, 24(1), 49–64.

  • Chatterjee, N., & Balyan, R. (2011). Context resolution of verb particle constructions for English to Hindi translation. In Proceedings of the 25th Pacific Asia conference on language, information and computation (PACLIC 25), Singapore (pp. 140–149).

  • Chatterjee, N., Johnson, A., & Krishna, M. (2007). Some improvements over the BLEU metric for measuring translation quality for Hindi. In Proceedings of ICCTA 2007, IEEE Computer Society (pp. 485–490).

  • Dave, S., Parikh, J., & Bhattacharya, P. (2001). Interlingua-based English–Hindi machine translation and language divergence. Machine Translation, 16, 251.

  • Doddington, G. (2002). Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of HLT 2002, human language technology conference, San Diego, California (pp. 138–145).

  • Dorr, B. (1993). Machine translation: A view from the Lexicon. Cambridge, MA: The MIT Press.

  • Dorr, B. (1994). Classification of machine translation divergences and a proposed solution. Computational Linguistics, 20(4), 597–633.

  • Farrús, M., Costa-jussà, M. R., Mariño, J. B., & Fonollosa, J. A. R. (2010). Linguistic-based evaluation criteria to identify statistical machine translation errors. In Proceedings of EAMT, Saint Raphael, France (pp. 52–57).

  • Freund, Y., & Schapire, R. (1996). Experiments with a new boosting algorithm. In Proceedings of the thirteenth international conference on machine learning, Bari, Italy (pp. 148–156).

  • Guenther, W. C. (1964). Analysis of variance. Upper Saddle River: Prentice-Hall.

  • Gupta, D., & Chatterjee, N. (2001). Study of divergence for example-based English–Hindi machine translation. In STRANS 2001, IIT Kanpur (pp. 132–139).

  • Gupta, D., & Chatterjee, N. (2003). Identification of divergence for English-to-Hindi EBMT. In MT Summit IX, New Orleans, LA, 2003 (pp. 141–148).

  • Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady, 10(8), 707–710.

  • Papineni, K., Roukos, S., Ward, T., & Zhu, W. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia (pp. 311–318).

  • Popović, M. (2011). Hjerson: An open source tool for automatic error classification of machine translation output. The Prague Bulletin of Mathematical Linguistics, 96, 59–67.

  • Popović, M., & Ney, H. (2007). Word error rates: Decomposition over POS classes and applications for error analysis. In Proceedings of the 2nd ACL 07 workshop on statistical machine translation (WMT 07), Prague, Czech Republic (pp. 48–55).

  • Schapire, R. (1990). The strength of weak learnability. Machine Learning, 5(2), 197–227.

  • Sinha, R. M. K., & Thakur, A. (2005a). Translation divergence in English–Hindi MT. In EAMT 10th annual conference, Budapest, Hungary, May 2005 (pp. 245–254).

  • Sinha, R. M. K., & Thakur, A. (2005b). Divergence patterns in machine translation between Hindi and English. In MT Summit X, Phuket, Thailand, September 2005 (pp. 346–353).

  • Snover, M., Dorr, B., Schwartz, R., Micciulla, L., & Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of the Association for Machine Translation in the Americas (AMTA 2006), Cambridge, MA (pp. 223–231).

  • Stone, C. J. (1985). Additive regression and other nonparametric models. The Annals of Statistics, 13(2), 689–705.

  • Vilar, D., Xu, J., D’Haro, L. F., & Ney, H. (2006). Error analysis of statistical machine translation output. In Proceedings of the 5th international conference on language resources and evaluation (LREC ’06), Genoa (pp. 697–702).

  • Wolpert, D. (1992). Stacked generalization. Neural Networks, 5(2), 241–260.

Author information

Corresponding author

Correspondence to Niladri Chatterjee.

Additional information

Renu Balyan: Work done while at IIT Delhi.

About this article

Cite this article

Balyan, R., Chatterjee, N. Factor-based evaluation for English to Hindi MT outputs. Lang Resources & Evaluation 52, 969–996 (2018). https://doi.org/10.1007/s10579-018-9426-y
