Crowdsourced Video Subtitling with Adaptation Based on User-Corrected Lattices

Miranda, João; Astudillo, Ramón F.; Costa, Ângela; Silva, André; Silva, Hugo; Graça, João; Raj, Bhiksha

doi:10.1007/978-3-319-49169-1_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10077)

Included in the following conference series:

International Conference on Advances in Speech and Language Technologies for Iberian Languages

Abstract

This paper investigates an approach for fast hybrid human-machine video subtitling based on lattice disambiguation and posterior model adaptation. The approach aims to correct Automatic Speech Recognition (ASR) transcriptions with minimal user effort and to make corrections practical on smartphones. It rests on three key concepts. First, only a portion of the data is sent to the user for correction. Second, user action is limited to selecting from a fixed set of options extracted from the ASR word lattice. Third, user feedback is used to update the ASR parameters and further enhance performance. To investigate the potential and limitations of this approach, we carry out experiments employing simulated and real user corrections of TED talk videos. Simulated corrections include both the true reference and the best combination of the options shown to the user. Real corrections are obtained from 30 editors through a special-purpose web interface that displays the options for small video segments. We analyze the fixed-option approach and the trade-off between adapting the model and increasing the amount of corrected data.
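
The simulated correction described as "the best combination of the options shown to the user" can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the options have already been reduced to a sequence of slots, each holding a few alternative words (with the empty string meaning "no word"), and it searches all combinations by brute force, which is only practical for short segments. The function names `word_edit_distance` and `oracle_selection`, the slot representation, and the example sentence are hypothetical.

```python
from itertools import product


def word_edit_distance(hyp, ref):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,              # drop a hypothesis word
                curr[j - 1] + 1,          # insert a reference word
                prev[j - 1] + (h != r),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def oracle_selection(slots, reference):
    """Pick one option per slot so the result is closest to the reference.

    Assumption: `slots` is a list of option lists, one per position,
    and "" stands for choosing no word at that position.
    """
    ref_words = reference.split()
    best_words, best_dist = None, None
    for choice in product(*slots):
        words = [w for w in choice if w]
        dist = word_edit_distance(words, ref_words)
        if best_dist is None or dist < best_dist:
            best_words, best_dist = words, dist
    return " ".join(best_words), best_dist


# Hypothetical options for one short segment.
slots = [["the"], ["whether", "weather"], ["is", "was", ""], ["nice", "ice"]]
best, errors = oracle_selection(slots, "the weather is very nice")
print(best, errors)
```

In this toy case the word "very" is not among the options, so even the best combination leaves one error. That gap between the oracle selection and the true reference is the kind of limitation a fixed-option interface imposes.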

Notes

1. http://unbabel.com/

Acknowledgements

This work has been partially supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) under reference UID/CEC/50021/2013 and grant number SFRH/BPD/68428/2010, and by the TRATAHI Portugal-CMU Project CMUP-EPB/TIC/0065/2013. Ângela Costa was supported by a Ph.D. fellowship from Fundação para a Ciência e a Tecnologia (SFRH/BD/85737/2012).

Author information

Authors and Affiliations

VoiceInteraction, São Paulo, Brazil
João Miranda
INESC-ID/Instituto Superior Técnico, Lisboa, Portugal
João Miranda, Ramón F. Astudillo & Ângela Costa
Unbabel Inc., San Francisco, USA
Ramón F. Astudillo, André Silva, Hugo Silva & João Graça
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Bhiksha Raj

Corresponding author

Correspondence to Ramón F. Astudillo.

Editor information

Editors and Affiliations

INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Alberto Abad
I3A/University of Zaragoza, Zaragoza, Spain
Alfonso Ortega
DETI/IEETA, University of Aveiro, Aveiro, Portugal
António Teixeira
AtlantTIC Research Center, Universidad de Vigo, Vigo, Spain
Carmen García Mateo
Universitat Politècnica de València, Valencia, Spain
Carlos D. Martínez Hinarejos
University of Coimbra, Coimbra, Portugal
Fernando Perdigão
INESC-ID/ISCTE-IUL, Lisbon, Portugal
Fernando Batista
INESC-ID/IST, Universidade de Lisboa, Lisbon, Portugal
Nuno Mamede

About this paper

Cite this paper

Miranda, J. et al. (2016). Crowdsourced Video Subtitling with Adaptation Based on User-Corrected Lattices. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_14

DOI: https://doi.org/10.1007/978-3-319-49169-1_14
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49168-4
Online ISBN: 978-3-319-49169-1
eBook Packages: Computer Science, Computer Science (R0)