Language Model Adaptation for Lecture Transcription by Document Retrieval

  • A Martínez-Villaronga
  • M. A del Agua
  • J. A Silvestre-Cerdà
  • J Andrés-Ferrer
  • A Juan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)

Abstract

With the spread of MOOCs and video lecture repositories, accurate methods for automatically transcribing video lectures are more important than ever. In this work, we propose a simple yet effective language model adaptation technique based on document retrieval from the web. This technique is combined with slide adaptation and compared against both a strong baseline language model and a stronger slide-adapted baseline. The adaptation techniques are evaluated with two different acoustic models: a standard HMM model and a CD-DNN-HMM model. The proposed method improves word error rate (WER) by up to 14% relative with respect to a competitive baseline, and it also outperforms slide adaptation.
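To make the "14% relative" figure concrete, the following is a minimal sketch of how WER and a relative WER improvement are commonly computed. It is not taken from the paper: the function names `word_error_rate` and `relative_improvement` are illustrative, and the WER values in the usage note are hypothetical numbers chosen only to show the arithmetic.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER removed by the adapted system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer
```

As an illustration with made-up numbers, a baseline WER of 20.0% reduced to 17.2% is a 2.8-point absolute improvement, which `relative_improvement(0.20, 0.172)` reports as roughly 0.14, that is, 14% relative.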

Keywords

language model adaptation, video lectures, document retrieval

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • A Martínez-Villaronga (1)
  • M. A del Agua (1)
  • J. A Silvestre-Cerdà (1)
  • J Andrés-Ferrer (1)
  • A Juan (1)

  1. MLLP, DSIC, Universitat Politècnica de València, València, Spain
