Multi-modal indicators for estimating perceived cognitive load in post-editing of machine translation

  • Nico Herbig
  • Santanu Pal
  • Mihaela Vela
  • Antonio Krüger
  • Josef van Genabith


In this paper, we develop a model that uses a wide range of physiological and behavioral sensor data to estimate perceived cognitive load (CL) during post-editing (PE) of machine-translated (MT) text. By predicting the subjectively reported perceived CL, we aim to quantify the extent of demands placed on the mental resources available during PE. This could, for example, be used to better capture the usefulness of MT proposals for PE, including the mental effort required, in contrast to the closeness-to-a-reference perspective on which current MT evaluation focuses. We compare the effectiveness of our physiological and behavioral features individually, in combination with each other, and in combination with the more traditional text and time features relevant to the task. Many of the physiological and behavioral features have not previously been applied to PE. Based on data gathered from ten participants, we show that our multi-modal measurement approach outperforms all baseline measures in predicting the perceived level of CL as measured by a psychological scale. Combining eye-, skin-, and heart-based indicators improves the results over each individual measure, and adding PE time improves the regression results further. An investigation of the correlations between the best-performing features, including sensor features previously unexplored in PE, and the corresponding subjective ratings indicates that the multi-modal approach combines several weakly to moderately correlated features into a stronger model.
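The closing claim, that several weakly to moderately correlated features can be combined into a stronger model, can be illustrated with a small sketch. It assumes plain feature-level fusion (concatenating per-segment features from different modalities into one vector) followed by ordinary least-squares regression; the abstract does not name the regressor actually used, so OLS here is a swapped-in stand-in. The feature names (`pupil`, `skin_conductance`, `heart_rate`, `pe_time`) and all data are hypothetical and synthetic, not the study's measurements.

```python
import math
import random

# Hypothetical per-segment features; names are illustrative only.
FEATURES = ["pupil", "skin_conductance", "heart_rate", "pe_time"]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def solve(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_predict(rows, y):
    """Ordinary least squares with an intercept; returns in-sample predictions."""
    X = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)  # normal equations: (X^T X) beta = X^T y
    return [sum(b * v for b, v in zip(beta, row)) for row in X]


def synthetic_data(n=500, seed=0):
    """Synthetic ratings: each feature contributes equally, plus unexplained noise."""
    rng = random.Random(seed)
    cols = {f: [rng.gauss(0, 1) for _ in range(n)] for f in FEATURES}
    ratings = [sum(cols[f][i] for f in FEATURES) + rng.gauss(0, 1) for i in range(n)]
    return cols, ratings


cols, ratings = synthetic_data()

# One single-feature model per modality.
individual = {
    f: pearson(fit_predict([[v] for v in cols[f]], ratings), ratings) for f in FEATURES
}

# Feature-level fusion: concatenate all modalities into one row per segment.
fused_rows = list(zip(*(cols[f] for f in FEATURES)))
combined = pearson(fit_predict(fused_rows, ratings), ratings)

for f, r in individual.items():
    print(f"{f:>16}: r = {r:.2f}")
print(f"{'combined':>16}: r = {combined:.2f}")
```

Because the fused model nests every single-feature model, its in-sample fit can never be worse than the best individual feature. Each synthetic feature here correlates only moderately with the ratings, yet together they explain most of the variance. An in-sample fit like this overstates performance; a real evaluation would score the models on held-out data, for example with cross-validation.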


Keywords: Cognitive load · Multi-modality · Post-editing · Machine translation · Physiological measurements · Behavioral measurements




Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany
  2. Saarland University, Saarbrücken, Germany
