Non-stationary Noise Cancellation Using Deep Autoencoder Based on Adversarial Learning

  • Kyung-Hyun Lim
  • Jin-Young Kim
  • Sung-Bae Cho
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11871)


Studies have been conducted to obtain clean data from non-stationary noisy signals, which is one of the areas of speech enhancement. Since conventional methods rely on first-order statistics, intensive efforts have been made to eliminate noise using deep learning methods. In real environments, many types of noise are mixed with the target sound, making it difficult to remove only the noise. However, most previous works modeled a small number of non-stationary noise types, which is hard to apply in the real world. To cope with this problem, we propose a novel deep learning model that enhances the auditory signal through adversarial learning with two types of discriminators. One discriminator learns to distinguish a clean signal from the one enhanced by the generator, and the other is trained to recognize the difference between the eliminated noise signal and the real noise signal. In other words, the second discriminator learns the waveform of the noise. In addition, a novel learning method is proposed to stabilize the unstable adversarial learning process. To verify the performance of the proposed model against previous works, we use 100 kinds of noise. The experimental results show that the proposed model outperforms conventional methods, including the state-of-the-art model, in removing non-stationary noise. The scale-invariant signal-to-noise ratio (SI-SNR) is used as the objective evaluation metric; the proposed model achieves a score of 5.91, a statistically significant improvement over the other methods according to a t-test.
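The evaluation metric named in the abstract, the scale-invariant signal-to-noise ratio (SI-SNR), projects the estimated signal onto the clean target and compares the energy of that target component with the energy of the residual. A minimal reference sketch is shown below; this is a generic implementation of the standard SI-SNR definition, not code from the paper, and the function name `si_snr` and the `eps` stabilizer are our own choices.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (SI-SNR) in dB."""
    # Zero-mean both signals to remove any DC offset, as is common
    # in SI-SNR definitions.
    estimate = estimate - np.mean(estimate)
    target = target - np.mean(target)
    # Project the estimate onto the target: the component of the
    # estimate that lies along the clean signal.
    s_target = np.dot(estimate, target) * target / (np.dot(target, target) + eps)
    # Everything left over is treated as noise/distortion.
    e_noise = estimate - s_target
    return 10.0 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))
```

Because the metric is scale-invariant, rescaling the estimate (e.g. `si_snr(2 * s, s)`) gives the same score as `si_snr(s, s)`; only the shape of the waveform matters, not its gain.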


Speech enhancement · Noise cancellation · Non-stationary noise · Generative adversarial networks · Autoencoder



This work was supported by a grant funded by the 2019 IT promotion fund (Development of AI based Precision Medicine Emergency System) of the Korean government (Ministry of Science and ICT).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, Yonsei University, Seoul, South Korea
