Effect of Occlusion on Deaf and Hard of Hearing Users’ Perception of Captioned Video Quality

  • Conference paper
Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments (HCII 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12769)

Abstract

While the availability of captioned television programming has increased, the quality of this captioning is not always acceptable to Deaf and Hard of Hearing (DHH) viewers, especially for live or unscripted content broadcast from local television stations. Although some current caption metrics focus on textual accuracy (comparing caption text with an accurate transcription of what was spoken), other properties may affect DHH viewers’ judgments of caption quality. In fact, U.S. regulatory guidance on caption quality standards includes issues relating to how the placement of captions may occlude other video content. To this end, we conducted an empirical study with 29 DHH participants to investigate how users’ judgments of caption quality and their enjoyment of the video are affected when captions overlap with an onscreen speaker’s eyes or mouth, or when captions overlap with onscreen text. We observed significantly more negative user-response scores in the case of such overlap. Understanding the relationship between these occlusion features and DHH viewers’ judgments of the quality of captioned video will inform future work toward the creation of caption evaluation metrics, to help ensure the accessibility of captioned television and video.
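The occlusion conditions studied here concern whether the rendered caption covers a region of interest in the frame, such as the speaker’s eyes or mouth or a block of onscreen text. As a minimal, hypothetical sketch (not the authors’ implementation, and not a metric proposed in this paper), the following shows how an automatic tool might test for such overlap using simple bounding-box intersection; the class, function names, coordinates, and threshold are all illustrative assumptions.

```python
# Hypothetical sketch: flag caption occlusion via bounding-box overlap.
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates, (x1, y1) to (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    return Box(max(a.x1, b.x1), max(a.y1, b.y1),
               min(a.x2, b.x2), min(a.y2, b.y2)).area()


def occludes(caption: Box, region: Box, threshold: float = 0.1) -> bool:
    """Flag occlusion when the caption covers more than `threshold` of the
    region of interest (e.g. the speaker's mouth or a block of onscreen text)."""
    if region.area() == 0.0:
        return False
    return overlap_area(caption, region) / region.area() > threshold


# Example frame: a bottom-of-screen caption overlapping a lower-third text graphic.
caption_box = Box(100, 620, 1180, 700)       # caption rendered near the bottom
onscreen_text_box = Box(80, 640, 600, 710)   # broadcaster's lower-third text
print(occludes(caption_box, onscreen_text_box))  # True: caption covers the text
```

In practice, the regions of interest would come from face/mouth and scene-text detectors run on each frame; this sketch only illustrates the overlap test itself.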

The contents of this paper were developed under a grant from the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR grant number 90DPCP0002). NIDILRR is a Center within the Administration for Community Living (ACL), Department of Health and Human Services (HHS). The contents of this paper do not necessarily represent the policy of NIDILRR, ACL, or HHS, and you should not assume endorsement by the Federal Government.

Notes

  1. Throughout this paper, we use the term “metrics” to refer to a formula or algorithm that can produce a numerical score representing the quality of a captioned video, whether it requires some human judgments or is calculated in a fully automatic manner. Thus, a metric may consider various features, and research on the relationship between features and the judgments of DHH viewers is foundational to deciding whether to incorporate particular features into a metric. Furthermore, we use the term “features” to refer to the aspects or properties of captioned video that may contribute to its quality. For instance, some prior research has investigated how DHH individuals’ judgments of the quality of captions may be influenced by: incorrect transcription of speech into text [32], the latency of the caption relative to the timing of speech [33], font size or color in captions [5, 7], and other features.
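To make the distinction concrete, the sketch below shows one hypothetical way a metric could combine normalized feature values into a single quality score. The feature names, the normalization to [0, 1], and the weights are illustrative assumptions, not values proposed or validated by this paper.

```python
# Hypothetical sketch of a caption-quality "metric" in the sense used above:
# a formula mapping per-video feature values to one numerical score.

def caption_quality_score(features: dict, weights: dict) -> float:
    """Weighted-penalty model: start from a perfect score of 1.0 and subtract
    a weighted penalty per feature (features normalized to [0, 1], higher = worse)."""
    penalty = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return max(0.0, 1.0 - penalty)


example_features = {
    "word_error_rate": 0.12,   # fraction of words transcribed incorrectly
    "caption_latency": 0.30,   # normalized delay of captions relative to speech
    "face_occlusion": 0.05,    # fraction of frames where captions cover the speaker's face
    "text_occlusion": 0.20,    # fraction of frames where captions cover onscreen text
}
example_weights = {
    "word_error_rate": 0.5,
    "caption_latency": 0.2,
    "face_occlusion": 0.15,
    "text_occlusion": 0.15,
}
print(caption_quality_score(example_features, example_weights))  # ~0.84
```

Studies such as the one reported here are what would ground the choice of which features (and weights) belong in such a formula.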

References

  1. Ali, A., Renals, S.: Word error rate estimation for speech recognition: e-WER. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp. 20–24 (2018)

  2. Akahori, W., Hirai, T., Morishima, S.: Dynamic subtitle placement considering the region of interest and speaker location. In: Proceedings of 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (2017)

  3. Apone, T., Botkin, B., Brooks, M., Goldberg, L.: Research into automated error ranking of real-time captions in live television news programs. Caption accuracy metrics project. National Center for Accessible Media (2011). http://ncam.wgbh.org/file_download/136

  4. Apone, T., Botkin, B., Brooks, M., Goldberg, L.: Caption accuracy metrics project. Research into automated error ranking of real-time captions in live television news programs. The Carl and Ruth Shapiro Family National Center for Accessible Media at WGBH (NCAM) (2011)

  5. Berke, L., Albusays, K., Seita, M., Huenerfauth, M.: Preferred appearance of captions generated by automatic speech recognition for deaf and hard-of-hearing viewers. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (CHI EA 2019), p. 6. ACM, New York (2019). Paper LBW1713. https://doi.org/10.1145/3290607.3312921

  6. Berke, L., Seita, M., Huenerfauth, M.: Deaf and hard-of-hearing users’ prioritization of genres of online video content requiring accurate captions. In: Proceedings of the 17th International Web for All Conference (W4A 2020), pp. 1–12. ACM, New York (2020). Article 3. https://doi.org/10.1145/3371300.3383337

  7. Berke, L.: Displaying confidence from imperfect automatic speech recognition for captioning. In: ACM Special Interest Group on Accessible Computing (SIGACCESS), no. 117, pp. 14–18 (2017). https://doi.org/10.1145/3051519.3051522

  8. Blackwell, D.L., Lucas, J.W., Clarke, T.C.: Summary health statistics for U.S. adults: national health interview survey, 2012. National Center for Health Statistics. Vital Health Statistics. Series 10 (260) (2014)

  9. Blanchfield, B.B., Feldman, J.J., Dunbar, J.J., Gardner, E.N.: The severely to profoundly hearing-impaired population in the United States: prevalence estimates and demographics. J. Am. Acad. Audiol. 12(4), 183–189 (2001). http://www.ncbi.nlm.nih.gov/pubmed/11332518

  10. Brown, A., et al.: Dynamic subtitles: the user experience. In: Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video (TVX 2015), pp. 103–112. ACM, New York (2015). https://doi.org/10.1145/2745197.2745204

  11. Burnham, D., et al.: Parameters in television captioning for deaf and hard-of-hearing adults: effects of caption rate versus text reduction on comprehension. J. Deaf Stud. Deaf Educ. 13(3), 391–404 (2008). https://doi.org/10.1093/deafed/enn003

  12. Crabb, M., Jones, R., Armstrong, M., Hughes, C.J.: Online news videos: the UX of subtitle position. In: Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility (ASSETS 2015), pp. 215–222. ACM, New York (2015). https://doi.org/10.1145/2700648.2809866

  13. Dwyer, T., Perkins, C., Redmond, S., Sita, J.: Seeing into Screens: Eye Tracking and the Moving Image. Bloomsbury, New York (2018)

  14. FFMPEG Developers: FFMPEG tool (version be1d324) [software] (2016). http://ffmpeg.org/

  15. Federal Communications Commission: Closed Captioning Quality Report and Order, Declaratory Ruling, FNPRM (2014). https://www.fcc.gov/document/closed-captioning-quality-report-and-order-declaratory-ruling-fnprm

  16. Federal Communications Commission: Closed captioning of internet protocol-delivered video programming: implementation of the twenty-first century communications and video accessibility act of 2010. Adopted rules governing the closed captioning requirements for the owners, providers, and distributors of video programming delivered using IP, and governing the closed captioning capabilities of certain apparatus on which consumers view video programming. MB Docket No. 11-154. FCC 12-9 (2012)

  17. Glasser, A., Mason Riley, E., Weeks, K., Kushalnagar, R.: Mixed reality speaker identification as an accessibility tool for deaf and hard of hearing users. In: Proceedings of the 25th ACM Symposium on Virtual Reality Software and Technology (VRST 2019), pp. 1–3. ACM, New York (2019). Article 80. https://doi.org/10.1145/3359996.3364720

  18. Government of Canada, Canadian Radio-television and Telecommunications Commission (CRTC): Broadcasting Regulatory Policy CRTC 2019-308 (2019). https://crtc.gc.ca/eng/archive/2019/2019-308.htm

  19. Gower, M., Shiver, B., Pandhi, C., Trewin, S.: Leveraging pauses to improve video captions. In: Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2018), pp. 414–416. ACM, New York (2018). https://doi.org/10.1145/3234695.3241023

  20. Gulliver, S.R., Ghinea, G.: How level and type of deafness affect user perception of multimedia video clips. Univ. Access Inf. Soc. 2(4), 374–386 (2003). https://doi.org/10.1007/s10209-003-0067-5

  21. Gulliver, S.R., Ghinea, G.: Impact of captions on hearing impaired and hearing perception of multimedia video clips. In: Proceedings of the IEEE International Conference on Multimedia and Expo (2003)

  22. Hong, R., Wang, M., Xu, M., Yan, S., Chua, T.-S.: Dynamic captioning: video accessibility enhancement for hearing impairment. In: Proceedings of the 18th ACM International Conference on Multimedia (MM 2010), pp. 421–430. ACM, New York (2010). https://doi.org/10.1145/1873951.1874013

  23. Kafle, S., Huenerfauth, M.: Evaluating the usability of automatically generated captions for people who are deaf or hard of hearing. In: Proceedings of the 19th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2017), pp. 165–174. ACM, New York (2017). https://doi.org/10.1145/3132525.3132542

  24. Kushalnagar, R.S., Lasecki, W.S., Bigham, J.P.: Accessibility evaluation of classroom captions. ACM Trans. Access. Comput. 5(3), 24 pages, January 2014. Article 7. https://doi.org/10.1145/2543578

  25. Lasecki, W.S., Miller, C.D., Bigham, J.P.: Warping time for more effective real-time crowdsourcing. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2013), pp. 2033–2036. ACM, New York (2013). https://doi.org/10.1145/2470654.2466269

  26. Lee, D.G., Fels, D.I., Udo, J.P.: Emotive captioning. Comput. Entertain. 5(2), 15 (2007). https://doi.org/10.1145/1279540.1279551. Article 11

  27. Nam, S., Fels, D.I., Chignell, M.H.: Modeling closed captioning subjective quality assessment by deaf and hard of hearing viewers. IEEE Trans. Comput. Soc. Syst. 7(3), 621–631 (2020). https://doi.org/10.1109/TCSS.2020.2972399

  28. National Institute on Deafness and Other Communication Disorders (NIDCD): Captions for Deaf and Hard-of-Hearing Viewers (2017). https://www.nidcd.nih.gov/health/captions-deaf-and-hard-hearing-viewers

  29. Ofcom: Measuring live subtitling quality, UK (2015)

  30. Olofsson, O.: Detecting Unsynchronized Audio and Subtitles Using Machine Learning. Dissertation (2019). http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-261414

  31. Press Release: World-first approach to reduce latency in live captioning. Ericsson (2016). https://www.ericsson.com/en/press-releases/2016/6/world-first-approach-to-reduce-latency-in-live-captioning

  32. Romero-Fresco, P., Pérez, J.M.: Accuracy rate in live subtitling: the NER model. In: Audiovisual Translation in a Global Context. Palgrave Macmillan, London (2015)

  33. Sandford, J.: The impact of subtitle display rate on enjoyment under normal television viewing conditions. In: IET Conference Proceedings (2015). https://doi.org/10.1049/ibc.2015.0018

  34. Strelcyk, O., Singh, G.: TV listening and hearing aids. PLoS ONE 13(6), e0200083 (2018). https://doi.org/10.1371/journal.pone.0200083

  35. Vy, Q.V., Fels, D.I.: Using placement and name for speaker identification in captioning. In: Miesenberger, K., Klaus, J., Zagler, W., Karshmer, A. (eds.) ICCHP 2010. LNCS, vol. 6179, pp. 247–254. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14097-6_40

  36. Waller, J.M., Kushalnagar, R.S.: Evaluation of automatic caption segmentation. In: Proceedings of the 18th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2016), pp. 331–332. ACM, New York (2016). https://doi.org/10.1145/2982142.2982205

  37. Wehrmeyer, J.: Eye-tracking Deaf and hearing viewing of sign language interpreted news broadcasts. J. Eye Move. Res. (2014). https://core.ac.uk/download/pdf/158976673.pdf

  38. Zedan, I.A., Elsayed, K.M., Emary, E.: Caption detection, localization and type recognition in Arabic news video. In: Proceedings of the 10th International Conference on Informatics and Systems (INFOS 2016), pp. 114–120. ACM, New York (2016). https://doi.org/10.1145/2908446.2908472

  39. Zhang, Z., Wang, C., Wang, Y.: Video-based face recognition: state of the art. In: Sun, Z., Lai, J., Chen, X., Tan, T. (eds.) CCBR 2011. LNCS, vol. 7098, pp. 1–9. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25449-9_1

  40. Zhou, X., et al.: EAST: an efficient and accurate scene text detector. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

  41. Brown, A., et al.: Dynamic subtitles: the user experience. In: Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video (TVX 2015), pp. 103–112. ACM, New York (2015). https://doi.org/10.1145/2745197.2745204

  42. English-Language Working Group: Closed Captioning Standards and Protocol for Canadian English Language Television Programming Services (2008). https://www.cab-acr.ca/english/social/captioning/captioning.pdf. Accessed 19 Nov 2020


Author information

Corresponding author

Correspondence to Matt Huenerfauth.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Amin, A.A., Hassan, S., Huenerfauth, M. (2021). Effect of Occlusion on Deaf and Hard of Hearing Users’ Perception of Captioned Video Quality. In: Antona, M., Stephanidis, C. (eds) Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments. HCII 2021. Lecture Notes in Computer Science, vol 12769. Springer, Cham. https://doi.org/10.1007/978-3-030-78095-1_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-78095-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78094-4

  • Online ISBN: 978-3-030-78095-1

  • eBook Packages: Computer Science, Computer Science (R0)
