
Indexing and Retrieval of Audio: A Survey


Abstract

With more and more audio being captured and stored, there is a growing need for automatic audio indexing and retrieval techniques that can retrieve relevant audio pieces quickly on demand. This paper provides a comprehensive survey of audio indexing and retrieval techniques. We first describe the main audio characteristics and features, and discuss techniques for classifying audio into speech and music based on these features. Indexing and retrieval of speech and music are then described separately. Finally, the significance of audio in multimedia indexing and retrieval is discussed.
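The low-level features the abstract refers to include time-domain measures such as zero-crossing rate and short-time energy, which are widely used for speech/music discrimination. The sketch below is a minimal illustration of computing these two features with NumPy; it is not taken from the paper, and the frame sizes, the variance-based decision rule, and the 0.05 threshold are hypothetical placeholders chosen only to make the idea concrete.

    # Minimal illustrative sketch (not from the survey): two classic frame-level
    # features often used for speech/music discrimination. Frame sizes, the
    # variance-based rule, and the 0.05 threshold are hypothetical placeholders.
    import numpy as np

    def frame_signal(x, frame_len=1024, hop=512):
        """Split a mono signal into overlapping frames (assumes len(x) >= frame_len)."""
        n_frames = 1 + (len(x) - frame_len) // hop
        return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

    def zero_crossing_rate(frames):
        """Fraction of adjacent-sample sign changes in each frame."""
        signs = np.sign(frames)
        return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

    def short_time_energy(frames):
        """Mean squared amplitude of each frame."""
        return np.mean(frames ** 2, axis=1)

    def classify_speech_music(x):
        """Toy heuristic: speech alternates voiced and unvoiced segments, so its
        zero-crossing rate typically varies more across frames than music does."""
        frames = frame_signal(np.asarray(x, dtype=np.float64))
        zcr = zero_crossing_rate(frames)
        energy = short_time_energy(frames)  # often combined with ZCR in practice
        return "speech" if np.std(zcr) > 0.05 else "music"

Real systems of the kind surveyed typically combine many such features and train a statistical classifier rather than applying a single fixed threshold.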




Cite this article

Lu, G. Indexing and Retrieval of Audio: A Survey. Multimedia Tools and Applications 15, 269–290 (2001). https://doi.org/10.1023/A:1012491016871
