
Joint Optimization of a Perceptual Modified Wiener Filtering Mask and Deep Neural Networks for Monaural Speech Separation

  • Conference paper
Advances in Multimedia Information Processing - PCM 2016 (PCM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9917)


Abstract

Due to its powerful feature extraction ability, deep learning has become a new trend in solving speech separation problems. In this paper, we present a novel Deep Neural Network (DNN) architecture for monaural speech separation. Taking into account the masking property of the human auditory system, the proposed DNN architecture applies a perceptual modified Wiener filtering masking function to make the residual noise perceptually inaudible. The proposed architecture jointly optimizes the perceptual modified Wiener filtering mask and the DNN. Evaluation experiments on the TIMIT database with 20 noise types under different signal-to-noise ratio (SNR) conditions demonstrate the superiority of the proposed method over the reference DNN-based separation methods, regardless of whether the noise types appeared in the training database.
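The idea of building the mask into the network and optimizing it jointly with the DNN can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a toy one-layer network (the hypothetical `forward`, `W` and `b`) that outputs nonnegative speech and noise magnitude estimates, and it uses the plain Wiener-type ratio |S|^2 / (|S|^2 + |N|^2) as the masking layer. The perceptual modification described in the paper, which depends on an auditory masking threshold, is not reproduced. The point is structural: the mask is computed inside the model and the loss is measured on the masked outputs, so fitting the network parameters also shapes the mask.

```python
import numpy as np


def wiener_mask_layer(s_hat, n_hat, eps=1e-8):
    # Wiener-type soft mask |S|^2 / (|S|^2 + |N|^2), computed element-wise.
    # Stand-in only: the paper's perceptual modification (based on an
    # auditory masking threshold) is NOT implemented here.
    ps = s_hat ** 2
    pn = n_hat ** 2
    return ps / (ps + pn + eps)


def forward(x_mag, W, b):
    # Hypothetical one-layer "DNN": maps each mixture frame to nonnegative
    # speech and noise magnitude estimates (ReLU keeps them >= 0).
    h = np.maximum(x_mag @ W + b, 0.0)
    bins = x_mag.shape[1]
    s_hat, n_hat = h[:, :bins], h[:, bins:]
    m = wiener_mask_layer(s_hat, n_hat)
    # The mask is applied to the mixture inside the model, so the two
    # separated outputs always add back up to the mixture magnitude.
    s_sep = m * x_mag
    n_sep = (1.0 - m) * x_mag
    return s_sep, n_sep, m


def joint_loss(s_sep, n_sep, s_ref, n_ref):
    # Error is measured on the masked outputs, so minimizing it with respect
    # to W and b optimizes the network and the mask together.
    return np.mean((s_sep - s_ref) ** 2) + np.mean((n_sep - n_ref) ** 2)


# Toy usage with random data: 4 frames x 8 frequency bins.
rng = np.random.default_rng(0)
frames, bins = 4, 8
s_ref = rng.random((frames, bins))
n_ref = rng.random((frames, bins))
x_mag = s_ref + n_ref  # crude assumption: magnitudes simply add
W = rng.normal(scale=0.1, size=(bins, 2 * bins))
b = np.full(2 * bins, 0.1)

s_sep, n_sep, m = forward(x_mag, W, b)
loss = joint_loss(s_sep, n_sep, s_ref, n_ref)
print(f"mask range: [{m.min():.3f}, {m.max():.3f}], loss: {loss:.4f}")
```

Because the mask lies in [0, 1] and its complement is applied to the noise branch, the sketch enforces the constraint that the separated estimates sum to the mixture. A real system would train `W` and `b` by gradient descent on `joint_loss` with a deeper network.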



Acknowledgments

This work is supported by NSF of China (Grant No. 61471394, 61402519) and NSF of Jiangsu Province (Grant No. BK20140071, BK20140074).

Author information

Corresponding author: Xiongwei Zhang.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Han, W., Zhang, X., Yang, J., Sun, M., Min, G. (2016). Joint Optimization of a Perceptual Modified Wiener Filtering Mask and Deep Neural Networks for Monaural Speech Separation. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science, vol 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_46


  • DOI: https://doi.org/10.1007/978-3-319-48896-7_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48895-0

  • Online ISBN: 978-3-319-48896-7

  • eBook Packages: Computer Science, Computer Science (R0)
