Segregation of Speech and Songs - A Precursor to Audio Interactive Applications

  • Conference paper
Social Transformation – Digital Way (CSI 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 836)


Abstract

Audio interactive applications have eased our lives in numerous ways, from speech recognition to song identification. Such applications have made Information Technology accessible to ordinary users by letting them bypass complicated user-interaction procedures. Audio-based search applications have become very popular, especially for searching songs. A system that can distinguish between speech and songs can boost the performance of such applications by reducing the search space while also selecting the recognition method suited to the type of audio. It can also aid music-speech separation from audio for karaoke development. In this paper, a system to segregate songs and speech is proposed using Line Spectral Pair based features. The system has been tested on a database of 19,374 clips, and a highest accuracy of 99.88% has been obtained with Ensemble Learning based classification.
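The abstract names Line Spectral Pair (LSP) features and Ensemble Learning based classification but does not describe the pipeline, so the following is a minimal illustrative sketch rather than the authors' method: it derives line spectral frequencies from frame-wise LPC polynomials, summarizes each clip by the mean and standard deviation of its frame-wise LSFs, and trains a Random Forest as one common ensemble learner. The sampling rate, LPC order, frame sizes, and the use of librosa and scikit-learn are assumptions, not details taken from the paper.

```python
# Illustrative sketch only: the paper's exact LSP configuration and ensemble
# classifier are not given in the abstract. 16 kHz audio, LPC order 12, and
# the frame sizes below are assumed, not taken from the paper.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def lpc_to_lsf(a):
    """Convert an LPC polynomial a = [1, a1, ..., ap] into p line spectral
    frequencies (radians in (0, pi)) via the sum/difference polynomials."""
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]   # palindromic:     P(z) = A(z) + z^-(p+1) A(1/z)
    q_poly = a_ext - a_ext[::-1]   # antipalindromic: Q(z) = A(z) - z^-(p+1) A(1/z)
    lsfs = []
    for poly in (p_poly, q_poly):
        angles = np.angle(np.roots(poly))
        # keep one angle per conjugate pair, dropping trivial roots at z = +/-1
        lsfs.extend(angles[(angles > 1e-4) & (angles < np.pi - 1e-4)])
    return np.sort(lsfs)

def clip_features(path, order=12, frame_len=2048, hop=512):
    """Mean and standard deviation of frame-wise LSFs for one audio clip."""
    y, _ = librosa.load(path, sr=16000)
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop).T
    window = np.hamming(frame_len)
    feats = []
    for f in frames:
        if np.std(f) < 1e-5:       # skip near-silent frames: LPC is ill-conditioned there
            continue
        lsf = lpc_to_lsf(librosa.lpc(f * window, order=order))
        if len(lsf) == order:      # guard against roots lost to the tolerance filter
            feats.append(lsf)
    feats = np.array(feats)
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])

# Hypothetical usage: speech_paths / song_paths are lists of labeled clips.
# X = np.array([clip_features(p) for p in speech_paths + song_paths])
# y = np.array([0] * len(speech_paths) + [1] * len(song_paths))
# clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

Line spectral frequencies are a natural choice for a sketch like this because, for a stable LPC filter, the roots of the sum and difference polynomials interleave on the unit circle, so frame-wise LSF statistics are well behaved and bounded in (0, pi).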

Author information

Corresponding author

Correspondence to Himadri Mukherjee.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mukherjee, H., Phadikar, S., Roy, K. (2018). Segregation of Speech and Songs - A Precursor to Audio Interactive Applications. In: Mandal, J., Sinha, D. (eds) Social Transformation – Digital Way. CSI 2018. Communications in Computer and Information Science, vol 836. Springer, Singapore. https://doi.org/10.1007/978-981-13-1343-1_5

  • DOI: https://doi.org/10.1007/978-981-13-1343-1_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1342-4

  • Online ISBN: 978-981-13-1343-1

  • eBook Packages: Computer Science, Computer Science (R0)
