Hierarchical audio content classification system using an optimal feature selection algorithm

Multimedia Tools and Applications

Abstract

This paper proposes a hierarchical, time-efficient method for audio classification and presents an automatic procedure for selecting the best set of features for audio classification using the Kolmogorov-Smirnov test (KS-test). The main motivation for our study is to propose a video abstraction framework for movies of general genres (e.g., action, comedy, drama, documentary, musical) that targets embedded devices and is based only on the audio component. Accordingly, simple audio features are extracted to ensure the feasibility of real-time processing. Five audio classes are considered in this paper: pure speech, pure music or songs, speech with background music, environmental noise, and silence. Audio classification is processed in three stages: (i) silence or environmental noise detection, (ii) speech and non-speech classification, and (iii) pure music or songs and speech with background music classification. The proposed system has been tested on various real-time audio sources extracted from movies and TV programs. Our experiments in the context of real-time processing show that the algorithms produce very satisfactory results.
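The abstract does not spell out the selection procedure or the stage classifiers, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It ranks features by the two-sample KS statistic (the largest gap between the empirical distributions of a feature in two classes) and routes a segment through a three-stage cascade. The feature names `zcr` and `energy`, the toy values, the helper names, and the three stage functions are hypothetical and are not taken from the paper; the branch order of the cascade is one reading of the three stages listed above.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def rank_features(class_a, class_b):
    """Order feature names by how well they separate two classes.

    class_a and class_b map a feature name to the values observed for that
    class. A larger KS statistic means less overlap between the two classes.
    """
    scores = {name: ks_statistic(class_a[name], class_b[name]) for name in class_a}
    return sorted(scores, key=scores.get, reverse=True), scores


def classify(segment, stage1, stage2, stage3):
    """Three-stage cascade; every stage is a caller-supplied, hypothetical function.

    stage1(segment) -> "silence", "environmental noise", or None
    stage2(segment) -> True if the segment is pure speech
    stage3(segment) -> True if the segment is pure music or songs
    """
    label = stage1(segment)
    if label is not None:
        return label
    if stage2(segment):
        return "pure speech"
    if stage3(segment):
        return "pure music or songs"
    return "speech with background music"


# Toy values, invented for illustration only (not data from the paper).
speech = {"zcr": [0.30, 0.35, 0.40, 0.45], "energy": [0.50, 0.60, 0.55, 0.65]}
music = {"zcr": [0.10, 0.12, 0.15, 0.18], "energy": [0.52, 0.58, 0.62, 0.66]}
ranking, scores = rank_features(speech, music)
```

With these toy values the `zcr` samples of the two classes do not overlap at all, so that feature ranks first. Because later stages run only when earlier ones do not settle the label, most of the cost falls on the segments that are hardest to classify, which is in keeping with the time-efficiency goal stated above.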

Author information

Corresponding author

Correspondence to P. Krishnamoorthy.

About this article

Cite this article

Krishnamoorthy, P., Kumar, S. Hierarchical audio content classification system using an optimal feature selection algorithm. Multimed Tools Appl 54, 415–444 (2011). https://doi.org/10.1007/s11042-010-0546-7

  • DOI: https://doi.org/10.1007/s11042-010-0546-7
