Automated Speech and Audio Analysis for Semantic Access to Multimedia

de Jong, Franciska; Ordelman, Roeland; Huijbregts, Marijn

doi:10.1007/11930334_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4306)

Included in the following conference series:

International Conference on Semantic and Digital Media Technologies

Abstract

The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper gives an overview of the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques are presented, including the alignment of speech and text resources, large-vocabulary speech recognition, keyword spotting, and speaker classification. The applicability of these techniques is discussed from a media-crossing perspective. The added value of the techniques and their potential contribution to the content value chain are illustrated by the description of two complementary demonstrators for browsing broadcast news archives.

Author information

Authors and Affiliations

Dept. of Computer Science, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
Franciska de Jong, Roeland Ordelman & Marijn Huijbregts
TNO-ICT, Delft, The Netherlands
Franciska de Jong

Editor information

Editors and Affiliations

Image, Video and Multimedia Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou Str., 157 80, Athens, Greece
Yannis Avrithis
Informatics and Telematics Institute, Centre for Research and Technology-Hellas, 57001, Thessaloniki, Greece
Yiannis Kompatsiaris
Fachbereich Informatik, Universität Koblenz-Landau, Universitätsstraße 1, 56070, Koblenz, Germany
Steffen Staab
Centre for Digital Video Processing, Adaptive Information Cluster, Dublin City University, Ireland
Noel E. O’Connor

About this paper

Cite this paper

de Jong, F., Ordelman, R., Huijbregts, M. (2006). Automated Speech and Audio Analysis for Semantic Access to Multimedia. In: Avrithis, Y., Kompatsiaris, Y., Staab, S., O’Connor, N.E. (eds) Semantic Multimedia. SAMT 2006. Lecture Notes in Computer Science, vol 4306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11930334_18

DOI: https://doi.org/10.1007/11930334_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49335-8
Online ISBN: 978-3-540-49337-2
eBook Packages: Computer Science, Computer Science (R0)