Audio Classification

Lu, Lie; Hanjalic, Alan

doi:10.1007/978-1-4614-8265-9_1032


Reference work entry
First Online: 01 January 2018


Synonyms

Audio categorization; Audio indexing; Audio recognition

Definition

Audio classification assigns a segment of an audio signal to one of a set of predefined semantic classes. It is typically realized in two steps: a learning step, which fits a statistical model to each semantic class, and an inference step, which estimates which semantic class is closest to a given segment.
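
The two steps can be illustrated with a minimal sketch. It is not the method of any particular system: it assumes each audio segment has already been reduced to a fixed-length feature vector, models each class with a diagonal Gaussian (one of many possible statistical models), and takes "closest" to mean highest log-likelihood. The function names, the two features (energy and zero-crossing rate), and the toy values are all hypothetical.

```python
import math
from collections import defaultdict


def fit_class_models(features, labels, var_floor=1e-6):
    """Learning step: estimate a diagonal Gaussian (mean, variance) per class."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    models = {}
    for label, rows in grouped.items():
        n, dims = len(rows), len(rows[0])
        mean = [sum(r[d] for r in rows) / n for d in range(dims)]
        # Floor the variance so a constant feature cannot cause division by zero.
        var = [
            max(sum((r[d] - mean[d]) ** 2 for r in rows) / n, var_floor)
            for d in range(dims)
        ]
        models[label] = (mean, var)
    return models


def log_likelihood(x, model):
    """Log-density of feature vector x under one class's diagonal Gaussian."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def classify(x, models):
    """Inference step: pick the class whose model best explains x."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))


# Toy training data (made-up values): [short-time energy, zero-crossing rate].
train_features = [
    [0.80, 0.10], [0.75, 0.15], [0.85, 0.12],   # music
    [0.40, 0.45], [0.35, 0.50], [0.45, 0.40],   # speech
    [0.02, 0.05], [0.01, 0.04], [0.03, 0.06],   # silence
]
train_labels = ["music"] * 3 + ["speech"] * 3 + ["silence"] * 3

models = fit_class_models(train_features, train_labels)
predicted = classify([0.78, 0.11], models)
```

In practice the features would be computed from the waveform, and the per-class model could be replaced by a richer one, such as a Gaussian mixture or a hidden Markov model, without changing the learn-then-infer structure.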

Historical Background

Audio classification associates semantic labels with audio signals, and can also be referred to as audio indexing, audio categorization or audio recognition. As such, audio classification plays an important role in facilitating search and retrieval in large-scale audio collections (databases). Semantic labels are used to represent semantic classes or semantic concepts, which can be defined at different abstraction and complexity levels. Typical examples of basic semantic audio classes are speech, music, environmental sounds, and silence, which can be detected rather...



Author information

Authors and Affiliations

Microsoft Research Asia, Beijing, China
Lie Lu
Delft University of Technology, Delft, The Netherlands
Alan Hanjalic


Corresponding author

Correspondence to Lie Lu.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

Dept. of Computer Science, New Jersey Inst. of Technology, Newark, NJ, USA
Vincent Oria
Digital Content and Media Sciences Research Division, National Institute of Informatics, Tokyo, Japan
Shin'ichi Satoh


Cite this entry

Lu, L., Hanjalic, A. (2018). Audio Classification. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1032


DOI: https://doi.org/10.1007/978-1-4614-8265-9_1032
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
