Abstract
This paper proposes a hierarchical, time-efficient method for audio classification. It also presents an automatic procedure, based on the Kolmogorov-Smirnov test (KS test), for selecting the best set of features for audio classification. The main motivation for this study is to provide a framework for a video abstraction scheme for movies of general genres (e.g., action, comedy, drama, documentary, and musical) that targets embedded devices and relies only on the audio component. Accordingly, only simple audio features are extracted, so that real-time processing remains feasible. Five audio classes are considered in this paper: pure speech, pure music or songs, speech with background music, environmental noise, and silence. Audio classification is performed in three stages: (i) silence or environmental noise detection, (ii) speech/non-speech classification, and (iii) classification of pure music or songs versus speech with background music. The proposed system has been tested on various real-time audio sources extracted from movies and TV programs. Experiments under real-time processing conditions show that the algorithms produce very satisfactory results.
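The abstract does not specify the paper's feature set, its exact KS-based selection rule, or the classifiers used at each stage. The Python sketch below is therefore only an illustration under stated assumptions. It computes two common low-cost frame features (mean-square energy and zero-crossing rate) and ranks features by the two-sample KS statistic D between two classes, keeping the top k. This top-k rule is an assumption, not necessarily the authors' procedure. It then chains caller-supplied binary tests in the three-stage order described above. The names frame_features, ks_statistic, select_features, and classify, and the toy data and thresholds, are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]
Predicate = Callable[[Features], bool]


def frame_features(frame: Sequence[float]) -> Features:
    """Two cheap per-frame features: mean-square energy and zero-crossing rate."""
    n = len(frame)
    energy = sum(s * s for s in frame) / n
    crossings = sum(1 for x, y in zip(frame, frame[1:]) if (x >= 0) != (y >= 0))
    return {"energy": energy, "zcr": crossings / (n - 1)}


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the
    # supremum of their difference is reached at one of the sample points.
    for x in a + b:
        d = max(d, abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)))
    return d


def select_features(class_a: Dict[str, List[float]],
                    class_b: Dict[str, List[float]], k: int) -> List[str]:
    """Hypothetical rule: keep the k features with the largest KS distance D."""
    scores = {name: ks_statistic(class_a[name], class_b[name]) for name in class_a}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def classify(f: Features, is_silence: Predicate, is_noise: Predicate,
             is_speech: Predicate, is_music: Predicate) -> str:
    """Three-stage decision cascade; each stage is a caller-supplied binary test."""
    # Stage (i): silence or environmental noise detection.
    if is_silence(f):
        return "silence"
    if is_noise(f):
        return "environmental noise"
    # Stage (ii): speech vs. non-speech.
    if is_speech(f):
        return "pure speech"
    # Stage (iii): pure music or songs vs. speech with background music.
    return "pure music or songs" if is_music(f) else "speech with background music"


if __name__ == "__main__":
    # Toy per-frame feature values for two classes (invented for illustration).
    speech = {"zcr": [0.30, 0.35, 0.40, 0.45], "energy": [0.2, 0.5, 0.3, 0.6]}
    music = {"zcr": [0.05, 0.08, 0.10, 0.12], "energy": [0.3, 0.4, 0.5, 0.2]}
    print(select_features(speech, music, k=1))  # ['zcr']
```

A larger D means the two class-conditional distributions overlap less, so the feature separates the classes better. In this toy data the zero-crossing rate separates the classes completely (D = 1.0) while energy does not (D = 0.25). Keeping each stage a separate binary test mirrors the hierarchical design: frames rejected early (for example, silence) never reach the more expensive later stages.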
Cite this article
Krishnamoorthy, P., Kumar, S. Hierarchical audio content classification system using an optimal feature selection algorithm. Multimed Tools Appl 54, 415–444 (2011). https://doi.org/10.1007/s11042-010-0546-7