
When Worlds Collide: AI-Created, Human-Mediated Video Description Services and the User Experience

  • Conference paper
  • Part of: HCI International 2021 - Late Breaking Papers: Cognition, Inclusion, Learning, and Culture (HCII 2021)

Abstract

This paper reports on a user-experience study undertaken as part of the H2020 project MeMAD (‘Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy’), in which multimedia content describers from the television and archive industries tested Flow, an online platform designed to support the post-editing of automatically generated data in order to enhance the production of archival descriptions of film content. The study captured the participant experience through screen recordings, the User Experience Questionnaire (UEQ), a benchmarked interactive-media questionnaire, and focus group discussions, and found a broadly positive response to the post-editing environment. Users judged the platform’s collation of machine-generated content descriptions, transcripts, named entities (locations, persons, organisations) and translated text to be helpful and likely to enhance creative outputs in the longer term. Suggestions for improving the platform included the addition of specialist vocabulary functionality, shot-type detection, film-topic labelling and automatic music recognition. The study’s most notable limitations are the current level of accuracy achieved in computer vision outputs (i.e. automated video descriptions of film material), which has been hindered by the lack of reliable and accurate training data, and the need for a more narratively oriented interface that allows describers to develop their storytelling techniques and build descriptions within a platform-hosted storyboarding functionality. While this work has value in its own right, it also paves the way for the future (semi-)automation of audio description to assist audiences experiencing sight impairment or cognitive accessibility difficulties, or for whom ‘visionless’ multimedia consumption is the preferred option.
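The UEQ used in the study aggregates 26 semantic-differential items into six benchmarked scales (attractiveness, perspicuity, efficiency, dependability, stimulation and novelty). As an illustration of how such scale scores are typically derived, the Python sketch below follows the item-to-scale assignment published with the UEQ; it assumes responses have already been recoded to the −3..+3 range used by the standard UEQ analysis sheet, and the sample data are hypothetical rather than drawn from this study.

```python
# Minimal sketch of UEQ scale scoring (Laugwitz, Held & Schrepp, 2008).
# Assumes each response is already recoded to the -3..+3 range with the
# positive pole at +3, as in the standard UEQ data-analysis sheet.
# The sample data below are hypothetical, not results from this study.

from statistics import mean

# Standard item-to-scale assignment from the UEQ handbook (items 1-26).
UEQ_SCALES = {
    "Attractiveness": [1, 12, 14, 16, 24, 25],
    "Perspicuity":    [2, 4, 13, 21],
    "Efficiency":     [9, 20, 22, 23],
    "Dependability":  [8, 11, 17, 19],
    "Stimulation":    [5, 6, 7, 18],
    "Novelty":        [3, 10, 15, 26],
}

def ueq_scores(responses: list[dict[int, int]]) -> dict[str, float]:
    """Average each scale per participant, then across participants."""
    per_scale = {}
    for scale, items in UEQ_SCALES.items():
        participant_means = [mean(r[i] for i in items) for r in responses]
        per_scale[scale] = mean(participant_means)
    return per_scale

if __name__ == "__main__":
    import random
    random.seed(0)
    # Two hypothetical participants' recoded answers for items 1-26.
    fake = [{i: random.randint(-3, 3) for i in range(1, 27)} for _ in range(2)]
    for scale, score in ueq_scores(fake).items():
        print(f"{scale:15s} {score:+.2f}")
```

In UEQ practice, scale means above about +0.8 are read as positive evaluations, which is how a "broadly positive" result of the kind reported here would be identified.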

Author information

Corresponding author

Correspondence to Sabine Braun.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Braun, S., et al. (2021). When Worlds Collide: AI-Created, Human-Mediated Video Description Services and the User Experience. In: Stephanidis, C., et al. (eds.) HCI International 2021 - Late Breaking Papers: Cognition, Inclusion, Learning, and Culture. HCII 2021. Lecture Notes in Computer Science, vol. 13096. Springer, Cham. https://doi.org/10.1007/978-3-030-90328-2_10

  • DOI: https://doi.org/10.1007/978-3-030-90328-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-90327-5

  • Online ISBN: 978-3-030-90328-2

  • eBook Packages: Computer Science, Computer Science (R0)
