Assessing Quality in Human- and Machine-Generated Subtitles and Captions

  • Stephen Doherty
  • Jan-Louis Kruger
Part of the Machine Translation: Technologies and Applications book series (MATRA, volume 1)


The depth, breadth, and complexity of audiovisual translation (AVT) are growing rapidly. AVT is increasingly merging with language technologies, including computer-assisted translation tools, machine translation, automated subtitling and captioning software, and automatic speech recognition systems. An essential component in this exciting and challenging technological development of current and future applications of AVT is the definition and assessment of quality in a way that is transparent, reliable, consistent, meaningful to all stakeholders, and readily applicable to the growing diversity of AVT. This chapter first provides a critical overview of current and future issues in the assessment of quality in human- and machine-generated subtitling and captioning. It builds upon a range of contemporary industry sources and moves into cutting-edge research on the processing and reception of AVT products across a variety of media and languages. We then discuss the impact of new media and technologies on best practice, policy, and research. Lastly, we identify numerous challenges and potential solutions for all stakeholders in order to encourage dialogue between disciplines, with the aim of articulating and answering questions of quality in AVT in an evolving technological landscape.


Translation quality assessment; Principles to practice; Audiovisual translation; Cognition; Multimodality; New media; Eye-tracking; AVT; Reception studies



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Humanities and Languages, The University of New South Wales, Sydney, Australia
  2. Department of Linguistics, Macquarie University, Sydney, Australia
  3. North-West University, Vanderbijlpark, South Africa
