Speech Processing for Audio Indexing

Lamel, Lori; Gauvain, Jean-Luc

doi:10.1007/978-3-540-85287-2_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

This paper addresses some of the recent trends in speech processing, with a focus on speech-to-text transcription as a means to facilitate access to multimedia information in a multilingual context. A brief overview of automatic speech recognition is given, along with indicative performance measures for a range of tasks. Enriched transcription, that is, enhancing the automatic word transcripts with meta-data derived from the audio data, is discussed, followed by some highlights of recent progress and remaining challenges in speech recognition.

This work has been partially financed under the GALE program of the Defense Advanced Research Projects Agency, Contract No. HR0011-06-C-0022 and by OSEO under the Quaero program.

Author information

Authors and Affiliations

LIMSI-CNRS, BP 133, 91403, Orsay Cedex, France
Lori Lamel & Jean-Luc Gauvain

Authors

Lori Lamel
Jean-Luc Gauvain

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 41296, Göteborg, Sweden
Bengt Nordström & Aarne Ranta

Cite this paper

Lamel, L., Gauvain, JL. (2008). Speech Processing for Audio Indexing. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_2

DOI: https://doi.org/10.1007/978-3-540-85287-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)