Joint Optimization of a Perceptual Modified Wiener Filtering Mask and Deep Neural Networks for Monaural Speech Separation

Han, Wei; Zhang, Xiongwei; Yang, Jibin; Sun, Meng; Min, Gang

doi:10.1007/978-3-319-48896-7_46

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9917)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

Owing to its powerful feature extraction ability, deep learning has become a new trend in solving speech separation problems. In this paper, we present a novel Deep Neural Network (DNN) architecture for monaural speech separation. To exploit the masking properties of the human auditory system, a perceptual modified Wiener filtering masking function is applied in the proposed DNN architecture to make the residual noise perceptually inaudible. The proposed architecture jointly optimizes the perceptual modified Wiener filtering mask and the DNN. Evaluation experiments on the TIMIT database with 20 noise types at different signal-to-noise ratios (SNRs) demonstrate the superiority of the proposed method over the reference DNN-based separation methods, regardless of whether the noise appeared in the training database.
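
As an illustration of the masking idea, the following is a minimal sketch of a plain Wiener-type soft mask applied to a mixture magnitude spectrogram. It is not the paper's method: the abstract does not give the perceptual modification, so none is reproduced here. The function names, the `eps` stabilizer, and the random toy data are all assumptions made for the example; in a jointly optimized system the two magnitude estimates would come from the DNN outputs rather than from random numbers.

```python
import numpy as np

def wiener_mask(speech_mag, noise_mag, eps=1e-8):
    """Wiener-type soft mask built from estimated speech and noise magnitudes.

    Assumption: a standard power-ratio mask; the paper's perceptual
    modification is not specified in the abstract and is omitted.
    """
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    return speech_pow / (speech_pow + noise_pow + eps)

def separate(mixture_mag, speech_mag, noise_mag):
    """Apply the mask and its complement to the mixture magnitude."""
    mask = wiener_mask(speech_mag, noise_mag)
    return mask * mixture_mag, (1.0 - mask) * mixture_mag

# Toy data standing in for network estimates (hypothetical values).
rng = np.random.default_rng(0)
speech_est = rng.random((4, 5))   # frames x frequency bins
noise_est = rng.random((4, 5))
mixture = speech_est + noise_est

speech_out, noise_out = separate(mixture, speech_est, noise_est)
```

Because the two masks sum to one, the separated magnitudes always add back up to the mixture, which is the constraint that makes it possible to train the mask and the network together against the separated outputs.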

Acknowledgments

This work is supported by the NSF of China (Grant Nos. 61471394, 61402519) and the NSF of Jiangsu Province (Grant Nos. BK20140071, BK20140074).

Author information

Authors and Affiliations

Lab of Intelligent Information Processing, PLAUST, Nanjing, China
Wei Han, Xiongwei Zhang, Jibin Yang, Meng Sun & Gang Min
Xi’an Communications Institute, Xi’an, China
Gang Min

Corresponding author

Correspondence to Xiongwei Zhang.

Editor information

Editors and Affiliations

Zhengzhou University, Zhengzhou, China
Enqing Chen
Jiaotong University, Xi’an, China
Yihong Gong
Zhengzhou University, Zhengzhou, China
Yun Tie

About this paper

Cite this paper

Han, W., Zhang, X., Yang, J., Sun, M., Min, G. (2016). Joint Optimization of a Perceptual Modified Wiener Filtering Mask and Deep Neural Networks for Monaural Speech Separation. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science, vol 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_46

DOI: https://doi.org/10.1007/978-3-319-48896-7_46
Published: 27 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48895-0
Online ISBN: 978-3-319-48896-7
eBook Packages: Computer Science (R0)