Audio Source Separation from a Monaural Mixture Using Convolutional Neural Network in the Time Domain

  • Conference paper

Advances in Neural Networks - ISNN 2017 (ISNN 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10262)
Abstract

Audio source separation from a monaural mixture, termed monaural source separation, is an important and challenging problem in many applications. In this paper, a monaural source separation method using a convolutional neural network in the time domain is proposed. The proposed network, whose input and output are both time-domain signals, consists of three convolutional layers, each followed by a max-pooling layer, and two fully-connected layers. Two key ideas underlie the time-domain convolutional network: first, features are learned automatically by the convolutional layers rather than hand-extracted (e.g., as spectra); second, the phase is recovered automatically because both the input and output are in the time domain. The proposed approach is evaluated on the TSP speech corpus for monaural source separation, and achieves an SIR gain of around 4.31–7.77 dB over a deep neural network, a recurrent neural network, and nonnegative matrix factorization, while maintaining better SDR and SAR.
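The forward pass described in the abstract can be sketched as follows. This is an illustrative pure-Python sketch, not the authors' code: kernel length, channel counts, pooling width, layer sizes, and the channel-summing step are all assumptions made here for brevity. It only shows the shape of the computation: three 1-D convolutional layers, each followed by ReLU and max-pooling, then two fully-connected layers mapping back to a time-domain frame of the same length as the input.

```python
# Illustrative sketch of a time-domain CNN forward pass (assumed sizes).
import random

random.seed(0)

def conv1d(x, kernels):
    """'Valid' 1-D convolution of signal x; one output channel per kernel."""
    k = len(kernels[0])
    return [[sum(w[j] * x[i + j] for j in range(k))
             for i in range(len(x) - k + 1)] for w in kernels]

def relu(channels):
    return [[max(0.0, v) for v in ch] for ch in channels]

def maxpool(channels, width=2):
    """Non-overlapping max-pooling along time."""
    return [[max(ch[i:i + width]) for i in range(0, len(ch) - width + 1, width)]
            for ch in channels]

def merge(channels):
    # Simplification made here: sum channels so the next layer sees one signal.
    return [sum(vals) for vals in zip(*channels)]

def dense(vec, weights):
    return [sum(wi * xi for wi, xi in zip(row, vec)) for row in weights]

def features(x, conv_stack):
    signal = x
    for kernels in conv_stack:
        signal = merge(maxpool(relu(conv1d(signal, kernels))))
    return signal

# Toy dimensions: a 64-sample mixture frame, 3 conv layers of 4 length-5 kernels.
frame = [random.uniform(-1.0, 1.0) for _ in range(64)]
conv_stack = [[[random.gauss(0.0, 0.1) for _ in range(5)] for _ in range(4)]
              for _ in range(3)]
feat = features(frame, conv_stack)
fc1 = [[random.gauss(0.0, 0.1) for _ in feat] for _ in range(8)]
fc2 = [[random.gauss(0.0, 0.1) for _ in range(8)] for _ in range(64)]
hidden = [max(0.0, v) for v in dense(feat, fc1)]
# Time-domain estimate of one source, same length as the input frame.
estimate = dense(hidden, fc2)
```

In a trained system the weights would be learned by minimizing a time-domain reconstruction error between the estimate and the clean source, so no separate phase-reconstruction step is needed.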


Notes

  1. http://pengzhxyz.github.io/bss-time-cnn.


Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61071208.

Author information

Corresponding author: Xiaohong Ma.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhang, P., Ma, X., Ding, S. (2017). Audio Source Separation from a Monaural Mixture Using Convolutional Neural Network in the Time Domain. In: Cong, F., Leung, A., Wei, Q. (eds) Advances in Neural Networks - ISNN 2017. ISNN 2017. Lecture Notes in Computer Science, vol 10262. Springer, Cham. https://doi.org/10.1007/978-3-319-59081-3_46


  • DOI: https://doi.org/10.1007/978-3-319-59081-3_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59080-6

  • Online ISBN: 978-3-319-59081-3

  • eBook Packages: Computer Science, Computer Science (R0)
