The LIMSI RT06s Lecture Transcription System

Lamel, Lori; Bilinski, Eric; Adda, Gilles; Gauvain, Jean-Luc; Schwenk, Holger

doi:10.1007/11965152_40

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

This paper describes recent research carried out in the context of the FP6 Integrated Project CHIL in developing a system to automatically transcribe lectures and presentations. Because only a small amount of CHIL data was available for system development, widely available corpora were used to train both the acoustic and language models. Acoustic model training made use of the transcribed portion of the TED corpus of Eurospeech recordings, as well as the ICSI, ISL, and NIST meeting corpora. For language model training, text materials were extracted from a variety of on-line conference proceedings. Experimental results are reported for close-talking and far-field microphones on development and evaluation data.

Author information

Authors and Affiliations

LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France
Lori Lamel, Eric Bilinski, Gilles Adda, Jean-Luc Gauvain & Holger Schwenk

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
National Institute of Standards and Technology, 100 Bureau Drive Stop 8940, Gaithersburg, MD, 20899
Jonathan G. Fiscus

About this paper

Cite this paper

Lamel, L., Bilinski, E., Adda, G., Gauvain, JL., Schwenk, H. (2006). The LIMSI RT06s Lecture Transcription System. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_40

DOI: https://doi.org/10.1007/11965152_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science, Computer Science (R0)
