Segregation of Speech and Songs - A Precursor to Audio Interactive Applications

  • Conference paper
Social Transformation – Digital Way (CSI 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 836)


Abstract

Audio interactive applications have eased our lives in numerous ways, from speech recognition to song identification. Such applications have made Information Technology accessible to ordinary users by letting them bypass complicated user-interaction procedures. Audio-based search applications have become very popular, especially for searching songs. A system that can distinguish between speech and songs can boost the performance of such applications by reducing the search space while also selecting the recognition method suited to the type of audio. It can also aid music-speech separation from audio for karaoke development. In this paper, a system to segregate songs and speech is proposed using Line Spectral Pair based features. The system has been tested on a database of 19,374 clips, and a highest accuracy of 99.88% has been obtained with Ensemble Learning based classification.
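The abstract names Line Spectral Pair (LSP) features and Ensemble Learning based classification but does not describe the pipeline, so the following is a minimal illustrative sketch rather than the authors' method: it derives line spectral frequencies from frame-wise LPC polynomials, summarizes each clip by the mean and standard deviation of its frame-wise LSFs, and trains a Random Forest as one common ensemble learner. The sampling rate, LPC order, frame sizes, and the use of librosa and scikit-learn are assumptions, not details taken from the paper.

```python
# Illustrative sketch only: the paper's exact LSP configuration and ensemble
# classifier are not given in the abstract. 16 kHz audio, LPC order 12, and
# the frame sizes below are assumed, not taken from the paper.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def lpc_to_lsf(a):
    """Convert an LPC polynomial a = [1, a1, ..., ap] into p line spectral
    frequencies (radians in (0, pi)) via the sum/difference polynomials."""
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]   # palindromic:     P(z) = A(z) + z^-(p+1) A(1/z)
    q_poly = a_ext - a_ext[::-1]   # antipalindromic: Q(z) = A(z) - z^-(p+1) A(1/z)
    lsfs = []
    for poly in (p_poly, q_poly):
        angles = np.angle(np.roots(poly))
        # keep one angle per conjugate pair, dropping trivial roots at z = +/-1
        lsfs.extend(angles[(angles > 1e-4) & (angles < np.pi - 1e-4)])
    return np.sort(lsfs)

def clip_features(path, order=12, frame_len=2048, hop=512):
    """Mean and standard deviation of frame-wise LSFs for one audio clip."""
    y, _ = librosa.load(path, sr=16000)
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop).T
    window = np.hamming(frame_len)
    feats = []
    for f in frames:
        if np.std(f) < 1e-5:       # skip near-silent frames: LPC is ill-conditioned there
            continue
        lsf = lpc_to_lsf(librosa.lpc(f * window, order=order))
        if len(lsf) == order:      # guard against roots lost to the tolerance filter
            feats.append(lsf)
    feats = np.array(feats)
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])

# Hypothetical usage: speech_paths / song_paths are lists of labeled clips.
# X = np.array([clip_features(p) for p in speech_paths + song_paths])
# y = np.array([0] * len(speech_paths) + [1] * len(song_paths))
# clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```

Line spectral frequencies are a natural choice for a sketch like this because, for a stable LPC filter, the roots of the sum and difference polynomials interleave on the unit circle, so frame-wise LSF statistics are well behaved and bounded in (0, pi).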

Author information

Corresponding author

Correspondence to Himadri Mukherjee.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mukherjee, H., Phadikar, S., Roy, K. (2018). Segregation of Speech and Songs - A Precursor to Audio Interactive Applications. In: Mandal, J., Sinha, D. (eds) Social Transformation – Digital Way. CSI 2018. Communications in Computer and Information Science, vol 836. Springer, Singapore. https://doi.org/10.1007/978-981-13-1343-1_5

  • DOI: https://doi.org/10.1007/978-981-13-1343-1_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1342-4

  • Online ISBN: 978-981-13-1343-1

  • eBook Packages: Computer Science, Computer Science (R0)
