Speech denoising using Bayesian NMF with online base update

  • Weili Zhou
  • Zhen Zhu
  • Peiying Liang

Abstract

This paper proposes a speech denoising method based on online non-negative matrix factorization (NMF). To model the temporal dependencies of speech and noise efficiently and to improve robustness in real non-stationary noise environments, the method builds on Bayesian NMF and introduces an online update of the noise basis matrix. First, the speech basis matrix is pre-trained offline with Bayesian NMF. In the denoising stage, the noise basis matrix is continuously updated with Bayesian NMF using the noise frames of the noisy observation. The noise basis matrix is initialized from a pre-trained universal noise NMF model, and the noise data used to adapt the matrix are selected with a likelihood ratio test (LRT) speech decision criterion. The updated noise basis matrix and the pre-trained speech basis matrix are then used to enhance the noisy signal. Finally, to address incomplete separation and speech distortion, a noise suppression filter based on speech activity probability further removes residual noise from the enhanced result. Experimental results show that the proposed method outperforms the comparison denoising algorithms on objective measures.
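The pipeline outlined in the abstract (pre-trained speech basis, noise basis adapted on noise-only frames, then separation) can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes plain KL-divergence NMF with multiplicative updates for Bayesian NMF, takes the noise-frame selection as a boolean mask supplied by the caller instead of running an LRT decision, and omits the final speech-activity-probability filter. All function names are hypothetical and the spectrograms are random placeholders.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def fit_activations(V, W, n_iter=50, seed=0):
    """Estimate activations H for a fixed basis W (KL multiplicative updates)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
    return H


def kl_nmf(V, rank, n_iter=100, seed=0):
    """Plain KL-divergence NMF (assumption: stands in for Bayesian NMF)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def update_noise_basis(V_noise, W_noise, n_iter=20):
    """Adapt the noise basis on frames judged noise-only, starting from the current basis."""
    W = W_noise.copy()
    H = fit_activations(V_noise, W)
    for _ in range(n_iter):
        W *= ((V_noise / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
        H *= (W.T @ (V_noise / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
    return W


def denoise(V_noisy, W_speech, W_noise):
    """Separate with the concatenated basis and apply a Wiener-style soft mask."""
    W = np.hstack([W_speech, W_noise])
    H = fit_activations(V_noisy, W)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]   # speech magnitude estimate
    N = W_noise @ H[k:]    # noise magnitude estimate
    mask = S / (S + N + EPS)
    return mask * V_noisy


# Synthetic magnitude spectrograms (frequency bins x frames); placeholders only.
rng = np.random.default_rng(1)
V_speech_train = rng.random((64, 200))
V_noise_train = rng.random((64, 100))
V_noisy = rng.random((64, 50))

W_s, _ = kl_nmf(V_speech_train, rank=8)   # offline speech basis
W_n, _ = kl_nmf(V_noise_train, rank=4)    # "universal" noise basis used as initialization

# Hypothetical noise-frame decision; the paper selects these frames with an LRT criterion.
is_noise = np.zeros(V_noisy.shape[1], dtype=bool)
is_noise[:10] = True

W_n = update_noise_basis(V_noisy[:, is_noise], W_n)
V_hat = denoise(V_noisy, W_s, W_n)
```

Keeping the speech basis fixed while only the noise basis adapts is what lets the noise model follow a changing environment without the speech model drifting toward the noise.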

Keywords

Speech denoising, NMF, Bayesian, Online base update

Notes

Acknowledgments

This work is supported by the Foshan University Research Foundation for Advanced Talents (GG07005).

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Electronic and Information Engineering, Foshan University, Foshan, People’s Republic of China
