Effect of Occlusion on Deaf and Hard of Hearing Users’ Perception of Captioned Video Quality

  • Conference paper
Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments (HCII 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12769)

Abstract

While the availability of captioned television programming has increased, the quality of this captioning is not always acceptable to Deaf and Hard of Hearing (DHH) viewers, especially for live or unscripted content broadcast from local television stations. Although some current caption metrics focus on textual accuracy (comparing caption text with an accurate transcription of what was spoken), other properties may affect DHH viewers’ judgments of caption quality. In fact, U.S. regulatory guidance on caption quality standards includes issues relating to how the placement of captions may occlude other video content. To this end, we conducted an empirical study with 29 DHH participants to investigate how users’ judgments of caption quality and their enjoyment of the video are affected when captions overlap with an onscreen speaker’s eyes or mouth, or when captions overlap with onscreen text. We observed significantly more negative user-response scores in the case of such overlap. Understanding the relationship between these occlusion features and DHH viewers’ judgments of the quality of captioned video will inform future work toward the creation of caption evaluation metrics, to help ensure the accessibility of captioned television and video.
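The occlusion conditions studied here concern whether the rendered caption covers a region of interest in the frame, such as the speaker’s eyes or mouth or a block of onscreen text. As a minimal, hypothetical sketch (not the authors’ implementation, and not a metric proposed in this paper), the following shows how an automatic tool might test for such overlap using simple bounding-box intersection; the class, function names, coordinates, and threshold are all illustrative assumptions.

```python
# Hypothetical sketch: flag caption occlusion via bounding-box overlap.
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates, (x1, y1) to (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    return Box(max(a.x1, b.x1), max(a.y1, b.y1),
               min(a.x2, b.x2), min(a.y2, b.y2)).area()


def occludes(caption: Box, region: Box, threshold: float = 0.1) -> bool:
    """Flag occlusion when the caption covers more than `threshold` of the
    region of interest (e.g. the speaker's mouth or a block of onscreen text)."""
    if region.area() == 0.0:
        return False
    return overlap_area(caption, region) / region.area() > threshold


# Example frame: a bottom-of-screen caption overlapping a lower-third text graphic.
caption_box = Box(100, 620, 1180, 700)       # caption rendered near the bottom
onscreen_text_box = Box(80, 640, 600, 710)   # broadcaster's lower-third text
print(occludes(caption_box, onscreen_text_box))  # True: caption covers the text
```

In practice, the regions of interest would come from face/mouth and scene-text detectors run on each frame; this sketch only illustrates the overlap test itself.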

The contents of this paper were developed under a grant from the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR grant number 90DPCP0002). NIDILRR is a Center within the Administration for Community Living (ACL), Department of Health and Human Services (HHS). The contents of this paper do not necessarily represent the policy of NIDILRR, ACL, or HHS, and you should not assume endorsement by the Federal Government.

Notes

  1. Throughout this paper, we use the term “metrics” to refer to a formula or algorithm that can produce a numerical score representing the quality of a captioned video, whether it requires some human judgments or is calculated in a fully automatic manner. Thus, a metric may consider various features, and research on the relationship between features and the judgments of DHH viewers is foundational to deciding whether to incorporate particular features into a metric. Furthermore, we use the term “features” to refer to the aspects or properties of captioned video that may contribute to its quality. For instance, some prior research has investigated how DHH individuals’ judgments of the quality of captions may be influenced by: incorrect transcription of speech into text [32], the latency of the caption relative to the timing of speech [33], font size or color in captions [5, 7], and other features.
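To make the distinction concrete, the sketch below shows one hypothetical way a metric could combine normalized feature values into a single quality score. The feature names, the normalization to [0, 1], and the weights are illustrative assumptions, not values proposed or validated by this paper.

```python
# Hypothetical sketch of a caption-quality "metric" in the sense used above:
# a formula mapping per-video feature values to one numerical score.

def caption_quality_score(features: dict, weights: dict) -> float:
    """Weighted-penalty model: start from a perfect score of 1.0 and subtract
    a weighted penalty per feature (features normalized to [0, 1], higher = worse)."""
    penalty = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return max(0.0, 1.0 - penalty)


example_features = {
    "word_error_rate": 0.12,   # fraction of words transcribed incorrectly
    "caption_latency": 0.30,   # normalized delay of captions relative to speech
    "face_occlusion": 0.05,    # fraction of frames where captions cover the speaker's face
    "text_occlusion": 0.20,    # fraction of frames where captions cover onscreen text
}
example_weights = {
    "word_error_rate": 0.5,
    "caption_latency": 0.2,
    "face_occlusion": 0.15,
    "text_occlusion": 0.15,
}
print(caption_quality_score(example_features, example_weights))  # ~0.84
```

Studies such as the one reported here are what would ground the choice of which features (and weights) belong in such a formula.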

References

  1. Ali, A., Renals, S.: Word error rate estimation for speech recognition: e-WER. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp. 20–24 (2018)

  2. Akahori, W., Hirai, T., Morishima, S.: Dynamic subtitle placement considering the region of interest and speaker location. In: Proceedings of 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (2017)

  3. Apone, T., Botkin, B., Brooks, M., Goldberg, L.: Research into automated error ranking of real-time captions in live television news programs. Caption accuracy metrics project. National Center for Accessible Media (2011). http://ncam.wgbh.org/file_download/136

  4. Apone, T., Botkin, B., Brooks, M., Goldberg, L.: Caption accuracy metrics project. Research into automated error ranking of real-time captions in live television news programs. The Carl and Ruth Shapiro Family National Center for Accessible Media at WGBH (NCAM) (2011)

  5. Berke, L., Albusays, K., Seita, M., Huenerfauth, M.: Preferred appearance of captions generated by automatic speech recognition for deaf and hard-of-hearing viewers. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (CHI EA 2019), p. 6. ACM, New York (2019). Paper LBW1713. https://doi.org/10.1145/3290607.3312921

  6. Berke, L., Seita, M., Huenerfauth, M.: Deaf and hard-of-hearing users’ prioritization of genres of online video content requiring accurate captions. In: Proceedings of the 17th International Web for All Conference (W4A 2020), pp. 1–12. ACM, New York (2020). Article 3. https://doi.org/10.1145/3371300.3383337

  7. Berke, L.: Displaying confidence from imperfect automatic speech recognition for captioning. In: ACM Special Interest Group on Accessible Computing (SIGACCESS), no. 117, pp. 14–18 (2017). https://doi.org/10.1145/3051519.3051522

  8. Blackwell, D.L., Lucas, J.W., Clarke, T.C.: Summary health statistics for U.S. adults: national health interview survey, 2012. National Center for Health Statistics. Vital Health Statistics. Series 10 (260) (2014)

  9. Blanchfield, B.B., Feldman, J.J., Dunbar, J.J., Gardner, E.N.: The severely to profoundly hearing-impaired population in the United States: prevalence estimates and demographics. J. Am. Acad. Audiol. 12(4), 183–189 (2001). http://www.ncbi.nlm.nih.gov/pubmed/11332518

  10. Brown, A., et al.: Dynamic subtitles: the user experience. In: Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video (TVX 2015), pp. 103–112. ACM, New York (2015). https://doi.org/10.1145/2745197.2745204

  11. Burnham, D., et al.: Parameters in television captioning for deaf and hard-of-hearing adults: effects of caption rate versus text reduction on comprehension. J. Deaf Stud. Deaf Educ. 13(3), 391–404 (2008). https://doi.org/10.1093/deafed/enn003

  12. Crabb, M., Jones, R., Armstrong, M., Hughes, C.J.: Online news videos: the UX of subtitle position. In: Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility (ASSETS 2015), pp. 215–222. ACM, New York (2015). https://doi.org/10.1145/2700648.2809866

  13. Dwyer, T., Perkins, C., Redmond, S., Sita, J.: Seeing into Screens: Eye Tracking and the Moving Image. Bloomsbury, New York (2018)

  14. FFMPEG Developers: FFMPEG tool (version be1d324) [software] (2016). http://ffmpeg.org/

  15. Federal Communications Commission: Closed Captioning Quality Report and Order, Declaratory Ruling, FNPRM (2014). https://www.fcc.gov/document/closed-captioning-quality-report-and-order-declaratory-ruling-fnprm

  16. Federal Communications Commission: Closed captioning of internet protocol-delivered video programming: implementation of the twenty-first century communications and video accessibility act of 2010. Adopted rules governing the closed captioning requirements for the owners, providers, and distributors of video programming delivered using IP, and governing the closed captioning capabilities of certain apparatus on which consumers view video programming. MB Docket No. 11-154. FCC 12-9 (2012)

  17. Glasser, A., Mason Riley, E., Weeks, K., Kushalnagar, R.: Mixed reality speaker identification as an accessibility tool for deaf and hard of hearing users. In: Proceedings of the 25th ACM Symposium on Virtual Reality Software and Technology (VRST 2019), pp. 1–3. ACM, New York (2019). Article 80. https://doi.org/10.1145/3359996.3364720

  18. Government of Canada, Canadian Radio-television and Telecommunications Commission (CRTC): Broadcasting Regulatory Policy CRTC 2019-308 (2019). https://crtc.gc.ca/eng/archive/2019/2019-308.htm

  19. Gower, M., Shiver, B., Pandhi, C., Trewin, S.: Leveraging pauses to improve video captions. In: Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2018), pp. 414–416. ACM, New York (2018). https://doi.org/10.1145/3234695.3241023

  20. Gulliver, S.R., Ghinea, G.: How level and type of deafness affect user perception of multimedia video clips. Univ. Access Inf. Soc. 2(4), 374–386 (2003). https://doi.org/10.1007/s10209-003-0067-5

  21. Gulliver, S.R., Ghinea, G.: Impact of captions on hearing impaired and hearing perception of multimedia video clips. In: Proceedings of the IEEE International Conference on Multimedia and Expo (2003)

  22. Hong, R., Wang, M., Xu, M., Yan, S., Chua, T.-S.: Dynamic captioning: video accessibility enhancement for hearing impairment. In: Proceedings of the 18th ACM International Conference on Multimedia (MM 2010), pp. 421–430. ACM, New York (2010). https://doi.org/10.1145/1873951.1874013

  23. Kafle, S., Huenerfauth, M.: Evaluating the usability of automatically generated captions for people who are deaf or hard of hearing. In: Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2017), pp. 165–174. ACM, New York (2017). https://doi.org/10.1145/3132525.3132542

  24. Kushalnagar, R.S., Lasecki, W.S., Bigham, J.P.: Accessibility evaluation of classroom captions. ACM Trans. Access. Comput. 5(3), 24 pages, January 2014. Article 7. https://doi.org/10.1145/2543578

  25. Lasecki, W.S., Miller, C.D., Bigham, J.P.: Warping time for more effective real-time crowdsourcing. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2013), pp. 2033–2036. ACM, New York (2013). https://doi.org/10.1145/2470654.2466269

  26. Lee, D.G., Fels, D.I., Udo, J.P.: Emotive captioning. Comput. Entertain. 5(2), 15 (2007). https://doi.org/10.1145/1279540.1279551. Article 11

  27. Nam, S., Fels, D.I., Chignell, M.H.: Modeling closed captioning subjective quality assessment by deaf and hard of hearing viewers. IEEE Trans. Comput. Soc. Syst. 7(3), 621–631 (2020). https://doi.org/10.1109/TCSS.2020.2972399

  28. National Institute on Deafness and Other Communication Disorders (NIDCD): Captions for Deaf and Hard-of-Hearing Viewers (2017). https://www.nidcd.nih.gov/health/captions-deaf-and-hard-hearing-viewers

  29. Ofcom: Measuring live subtitling quality, UK (2015)

  30. Olofsson, O.: Detecting Unsynchronized Audio and Subtitles Using Machine Learning. Dissertation (2019). http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-261414

  31. Press Release: World-first approach to reduce latency in live captioning. Ericsson (2016). https://www.ericsson.com/en/press-releases/2016/6/world-first-approach-to-reduce-latency-in-live-captioning

  32. Romero-Fresco, P., Pérez, J.M.: Accuracy rate in live subtitling: the NER model. In: Audiovisual Translation in a Global Context. Palgrave Macmillan, London (2015)

  33. Sandford, J.: The impact of subtitle display rate on enjoyment under normal television viewing conditions. In: IET Conference Proceedings (2015). https://doi.org/10.1049/ibc.2015.0018

  34. Strelcyk, O., Singh, G.: TV listening and hearing aids. PLoS ONE 13(6), e0200083 (2018). https://doi.org/10.1371/journal.pone.0200083

  35. Vy, Q.V., Fels, D.I.: Using placement and name for speaker identification in captioning. In: Miesenberger, K., Klaus, J., Zagler, W., Karshmer, A. (eds.) ICCHP 2010. LNCS, vol. 6179, pp. 247–254. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14097-6_40

  36. Waller, J.M., Kushalnagar, R.S.: Evaluation of automatic caption segmentation. In: Proceedings of the 18th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2016), pp. 331–332. ACM, New York (2016). https://doi.org/10.1145/2982142.2982205

  37. Wehrmeyer, J.: Eye-tracking Deaf and hearing viewing of sign language interpreted news broadcasts. J. Eye Move. Res. (2014). https://core.ac.uk/download/pdf/158976673.pdf

  38. Zedan, I.A., Elsayed, K.M., Emary, E.: Caption detection, localization and type recognition in Arabic news video. In: Proceedings of the 10th International Conference on Informatics and Systems (INFOS 2016), pp. 114–120. ACM, New York (2016). https://doi.org/10.1145/2908446.2908472

  39. Zhang, Z., Wang, C., Wang, Y.: Video-based face recognition: state of the art. In: Sun, Z., Lai, J., Chen, X., Tan, T. (eds.) CCBR 2011. LNCS, vol. 7098, pp. 1–9. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25449-9_1

  40. Zhou, X., et al.: EAST: an efficient and accurate scene text detector. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

  41. Brown, A., et al.: Dynamic subtitles: the user experience. In: Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video (TVX 2015), pp. 103–112. ACM, New York (2015). https://doi.org/10.1145/2745197.2745204

  42. English-Language Working Group: Closed Captioning Standards and Protocol for Canadian English Language Television Programming Services (2008). https://www.cab-acr.ca/english/social/captioning/captioning.pdf. Accessed 19 Nov 2020


Author information

Corresponding author

Correspondence to Matt Huenerfauth.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Amin, A.A., Hassan, S., Huenerfauth, M. (2021). Effect of Occlusion on Deaf and Hard of Hearing Users’ Perception of Captioned Video Quality. In: Antona, M., Stephanidis, C. (eds) Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments. HCII 2021. Lecture Notes in Computer Science, vol 12769. Springer, Cham. https://doi.org/10.1007/978-3-030-78095-1_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-78095-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78094-4

  • Online ISBN: 978-3-030-78095-1

  • eBook Packages: Computer Science, Computer Science (R0)
