An Efficient Approach for Classification of Speech and Music

Swe, Ei Mon Mon; Pwint, Moe

doi:10.1007/978-3-540-89796-5_6

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5353)

Abstract

A new method is presented for classifying an audio segment as speech or music, motivated by the automatic transcription of broadcast news. Sample entropy (SampEn), a measure of time-series complexity, serves as the main feature for discriminating between speech and music. SampEn is a variant of approximate entropy (ApEn) that measures the regularity of a time series. The basic idea is to label a given audio segment as speech or music depending on its regularity, which is measured from the SampEn sequence calculated over a window. The effectiveness of the proposed method is evaluated in experiments on broadcast news shows from BBC radio stations, WBAI news, and UN news, and on music genres with different temporal distributions. The results show that the proposed method is robust, achieving high discrimination accuracy in all experiments.
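As a rough illustration of the feature described in the abstract, the sketch below computes sample entropy in its commonly used form, SampEn = -ln(A/B), where B counts pairs of length-m templates that match within a tolerance r and A counts the same for length m+1. It is not the authors' implementation: the function names, the defaults m = 2 and r = 0.2 times the standard deviation, and the window and hop sizes are all assumptions chosen for illustration, and the abstract does not state which values the paper uses or how the resulting sequence is turned into a speech/music decision.

```python
import math


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a 1-D sequence (hypothetical helper, assumed defaults)."""
    n = len(x)
    if n <= m + 1:
        raise ValueError("sequence too short for the chosen template length")
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * std  # tolerance scaled by the signal's spread (assumption)

    def count_matches(length):
        # Count template pairs (i < j, so no self-matches) whose Chebyshev
        # distance is within r. Both lengths use the same n - m templates.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)


def sampen_sequence(signal, window=200, hop=100, m=2, r_factor=0.2):
    """SampEn computed over sliding windows (window and hop are assumptions)."""
    return [
        sample_entropy(signal[start:start + window], m, r_factor)
        for start in range(0, len(signal) - window + 1, hop)
    ]


if __name__ == "__main__":
    # A periodic test tone: highly regular, so its SampEn values are low.
    tone = [math.sin(2 * math.pi * i / 25) for i in range(400)]
    print(sampen_sequence(tone))
```

Lower values indicate a more regular signal and higher values a less predictable one. The pairwise comparison is quadratic in the window length, so this plain-Python version is only practical for short windows.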

Author information

Authors and Affiliations

University of Computer Studies Yangon, Myanmar
Ei Mon Mon Swe & Moe Pwint

Editor information

Editors and Affiliations

Department of Engineering Science, National Cheng Kung University, No.1, University Road, 701, Tainan City, Taiwan
Yueh-Min Ray Huang
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95, Zhongguancun East Road, 100190, Beijing, China
Changsheng Xu
Institute of Biomedical Engineering, National Cheng Kung University, No. 1, University Road, 701, Tainan City, Taiwan
Kuo-Sheng Cheng
Department of Electrical Engineering, National Cheng Kung University, No. 1, University Road, 701, Tainan City, Taiwan
Jar-Ferr Kevin Yang
Department of Electrical and Computer Engineering, Concordia University, S-EV005.139, 1515 St. Catherine West, Montreal, H4G 2W1, Quebec, Canada
M. N. S. Swamy
Microsoft Research Asia, 5/F, Beijing Sigma Center, No. 49, Zhichun Road, Hai Dian District, 100080, Beijing, China
Shipeng Li
Department of Information Management, National Kaohsiung University of Applied Sciences, No. 415, Jiangong Road, Sanmin District, 80778, Kaohsiung, Taiwan
Jen-Wen Ding

About this paper

Cite this paper

Swe, E.M.M., Pwint, M. (2008). An Efficient Approach for Classification of Speech and Music. In: Huang, YM.R., et al. Advances in Multimedia Information Processing - PCM 2008. PCM 2008. Lecture Notes in Computer Science, vol 5353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89796-5_6

DOI: https://doi.org/10.1007/978-3-540-89796-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89795-8
Online ISBN: 978-3-540-89796-5
eBook Packages: Computer Science, Computer Science (R0)
