Multimedia Tools and Applications, Volume 50, Issue 2, pp 415–435

Improvement to speech-music discrimination using sinusoidal model based features



This paper presents a model-based audio content analysis method for classifying mixed speech-music audio signals into speech and music. A new set of features based on sinusoidal modeling of audio signals is introduced and evaluated. The feature set, which includes the variance of the birth frequencies and the duration of the longest frequency track in the sinusoidal model as measures of harmonicity and signal continuity, is discussed in detail. These features are compared with typical features as inputs to an audio classifier, and their performance is evaluated by classifying audio into speech and music with both GMM (Gaussian Mixture Model) and SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are quite successful at speech/music discrimination: using only two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% classification accuracy. Experimental comparisons also confirm the superiority of the sinusoidal model features over popular time-domain and frequency-domain features for audio classification.
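To make the two proposed features concrete, the sketch below computes them from the output of a toy partial-tracking stage. It assumes a peak-picked STFT front end that yields one array of peak frequencies per analysis frame; the greedy nearest-frequency matching, the 50 Hz continuation tolerance, and the 10 ms hop are illustrative assumptions, not the paper's exact analysis parameters.

```python
import numpy as np

def sinusoidal_tracks(frames, match_tol=50.0):
    """Greedy partial tracking. `frames` is a list of arrays of peak
    frequencies (Hz), one array per analysis frame. Returns a list of
    tracks, each a list of (frame_index, frequency) pairs."""
    active, finished = [], []
    for i, peaks in enumerate(frames):
        peaks = list(peaks)
        still_active = []
        for track in active:
            last_f = track[-1][1]
            if peaks:
                # continue the track with the closest unmatched peak,
                # if it lies within the matching tolerance
                j = int(np.argmin([abs(p - last_f) for p in peaks]))
                if abs(peaks[j] - last_f) <= match_tol:
                    track.append((i, peaks.pop(j)))
                    still_active.append(track)
                    continue
            finished.append(track)  # no continuation: the track "dies"
        # every unmatched peak gives "birth" to a new track
        active = still_active + [[(i, p)] for p in peaks]
    return finished + active

def birth_freq_variance(tracks):
    """Variance (Hz^2) of the frequencies at which tracks are born."""
    births = [t[0][1] for t in tracks]
    return float(np.var(births))

def longest_track_duration(tracks, hop_s=0.01):
    """Duration (s) of the longest frequency track, given the hop size."""
    return max(len(t) for t in tracks) * hop_s
```

On a harmonic, sustained signal (music-like), tracks are born at regularly spaced harmonic frequencies and persist for many frames, so the longest-track duration is large; the frequent track births and deaths of speech yield shorter tracks and a different birth-frequency spread, which is what makes these two statistics discriminative.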


Keywords: Audio classification; Sinusoidal model



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Science & Research Branch, Islamic Azad University, Tehran, Iran
  2. Sharif University of Technology, Tehran, Iran
