Multimedia Tools and Applications, Volume 50, Issue 2, pp 415–435

Improvement to speech-music discrimination using sinusoidal model based features



This paper presents a model-based audio content analysis method for classifying mixed speech-music audio signals into speech and music. A new set of features based on sinusoidal modeling of audio signals is introduced and evaluated. The feature set, which includes the variance of the birth frequencies and the duration of the longest frequency track in the sinusoidal model as measures of harmonicity and signal continuity, is discussed in detail. These features are compared with typical features as inputs to an audio classifier, and their performance is evaluated by classifying audio into speech and music with both GMM (Gaussian Mixture Model) and SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are quite successful at speech/music discrimination: using only two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% classification accuracy. Experimental comparisons also confirm the superiority of the sinusoidal model features over popular time-domain and frequency-domain features for audio classification.
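To make the two proposed features concrete, the sketch below computes them from the output of a toy partial-tracking stage. It assumes a peak-picked STFT front end that yields one array of peak frequencies per analysis frame; the greedy nearest-frequency matching, the 50 Hz continuation tolerance, and the 10 ms hop are illustrative assumptions, not the paper's exact analysis parameters.

```python
import numpy as np

def sinusoidal_tracks(frames, match_tol=50.0):
    """Greedy partial tracking. `frames` is a list of arrays of peak
    frequencies (Hz), one array per analysis frame. Returns a list of
    tracks, each a list of (frame_index, frequency) pairs."""
    active, finished = [], []
    for i, peaks in enumerate(frames):
        peaks = list(peaks)
        still_active = []
        for track in active:
            last_f = track[-1][1]
            if peaks:
                # continue the track with the closest unmatched peak,
                # if it lies within the matching tolerance
                j = int(np.argmin([abs(p - last_f) for p in peaks]))
                if abs(peaks[j] - last_f) <= match_tol:
                    track.append((i, peaks.pop(j)))
                    still_active.append(track)
                    continue
            finished.append(track)  # no continuation: the track "dies"
        # every unmatched peak gives "birth" to a new track
        active = still_active + [[(i, p)] for p in peaks]
    return finished + active

def birth_freq_variance(tracks):
    """Variance (Hz^2) of the frequencies at which tracks are born."""
    births = [t[0][1] for t in tracks]
    return float(np.var(births))

def longest_track_duration(tracks, hop_s=0.01):
    """Duration (s) of the longest frequency track, given the hop size."""
    return max(len(t) for t in tracks) * hop_s
```

On a harmonic, sustained signal (music-like), tracks are born at regularly spaced harmonic frequencies and persist for many frames, so the longest-track duration is large; the frequent track births and deaths of speech yield shorter tracks and a different birth-frequency spread, which is what makes these two statistics discriminative.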


Keywords: Audio classification; Sinusoidal model



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Science & Research Branch, Islamic Azad University, Tehran, Iran
  2. Sharif University of Technology, Tehran, Iran
