Robustness of LSTM Neural Networks for the Enhancement of Spectral Parameters in Noisy Speech Signals

  • Marvin Coto-Jiménez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11289)


In this paper, we carry out a comparative performance analysis of Long Short-Term Memory (LSTM) neural networks for the task of noise reduction. Recent work in this area has shown the advantages of this kind of network for the enhancement of noisy speech, particularly when the training process is performed for specific Signal-to-Noise Ratio (SNR) levels.
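
The core idea, an LSTM that maps a sequence of noisy spectral feature vectors (for example MFCC frames) to estimates of the corresponding clean vectors, can be sketched as follows. This is a minimal, untrained forward pass in plain Python, assuming a standard LSTM cell without peephole connections followed by one linear output layer per frame. The function names, parameter layout, and sizes are hypothetical; the configuration actually used in the paper (number of layers and units, bidirectionality, training procedure) is not given in this abstract.

```python
import math
import random

def sigmoid(x: float) -> float:
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def matvec(W, v):
    # Dense matrix (list of rows) times vector
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell (no peephole connections).

    params maps each gate name 'i', 'f', 'o', 'g' to a tuple (W, U, b):
    W acts on the input x, U on the previous hidden state h_prev, b is the bias.
    """
    def gate(name, act):
        W, U, b = params[name]
        pre = [a + r + bb for a, r, bb in zip(matvec(W, x), matvec(U, h_prev), b)]
        return [act(p) for p in pre]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell update
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

def init_params(n_in, n_hidden, seed=0, scale=0.1):
    # Small random weights and zero biases (hypothetical initialization)
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]
    return {k: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden)
            for k in ("i", "f", "o", "g")}

def enhance_sequence(noisy_frames, params, W_out):
    """Map noisy feature frames to estimated clean frames, one output per input frame."""
    n_hidden = len(params["i"][2])
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    enhanced = []
    for x in noisy_frames:
        h, c = lstm_step(x, h, c, params)
        enhanced.append(matvec(W_out, h))  # linear regression layer
    return enhanced

# Usage with hypothetical sizes: 13 coefficients per frame, 8 hidden units, 5 frames
n_mfcc, n_hidden = 13, 8
params = init_params(n_mfcc, n_hidden)
rng = random.Random(1)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_mfcc)]
noisy = [[rng.gauss(0.0, 1.0) for _ in range(n_mfcc)] for _ in range(5)]
enhanced = enhance_sequence(noisy, params, W_out)
print(len(enhanced), len(enhanced[0]))  # 5 13
```

The recurrent state (h, c) is what lets each output frame depend on earlier noisy frames rather than on the current frame alone. Training, typically by minimizing a regression loss such as mean squared error between enhanced and clean features, is omitted here.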

For application in real-life environments, it is important to test the robustness of the approach when the SNR is not known a priori, which is the condition under which classical signal-processing-based algorithms operate. In our experiments, we conduct the training stage with single and multiple noise conditions and compare the results with the SNR-specific training presented previously in the literature.
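
The difference between SNR-specific and multi-condition training lies in how the noisy/clean training pairs are built. The sketch below, in plain Python, scales a noise signal so that the mixture has a target SNR, using SNR = 10·log10(Ps/Pn) with Ps and Pn the mean powers of speech and noise, and then draws one SNR per utterance from a list of levels: a single level corresponds to SNR-specific training, several levels to multi-condition training. The function names, the SNR values, the synthetic signals, and the choice of mixing in the waveform domain before feature extraction are illustrative assumptions, not the paper's documented protocol.

```python
import math
import random

def mean_power(x):
    # Average energy per sample
    return sum(v * v for v in x) / len(x)

def mix_at_snr(clean, noise, snr_db):
    """Return clean + alpha * noise, with alpha chosen so the mixture has the target SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    # SNR = 10*log10(Ps / (alpha^2 * Pn))  =>  alpha = sqrt(Ps / (Pn * 10^(SNR/10)))
    alpha = math.sqrt(mean_power(clean) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + alpha * n for s, n in zip(clean, noise)]

def measured_snr_db(clean, noisy):
    # Treat everything in the mixture that is not the clean signal as noise
    residual = [y - s for s, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))

def build_training_pairs(utterances, noise, snr_levels, seed=0):
    """Pair each clean utterance with a noisy copy at an SNR drawn from snr_levels.

    snr_levels=[x] gives SNR-specific training; several values give multi-condition training.
    """
    rng = random.Random(seed)
    pairs = []
    for clean in utterances:
        level = rng.choice(snr_levels)
        pairs.append((mix_at_snr(clean, noise, level), clean, level))
    return pairs

# Hypothetical data: two 0.1 s tones at 16 kHz and white Gaussian noise
rng = random.Random(42)
utterances = [[math.sin(2 * math.pi * f * t / 16000) for t in range(1600)] for f in (220, 440)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

specific = build_training_pairs(utterances, noise, snr_levels=[0])
multi = build_training_pairs(utterances, noise, snr_levels=[-5, 0, 5, 10])
for noisy, clean, level in multi:
    print(level, round(measured_snr_db(clean, noisy), 6))
```

Drawing one SNR per utterance keeps the amount of training data equal across the two regimes, so any difference in results comes from the spread of noise conditions rather than from data volume. Reusing the same noise segment for every utterance is a simplification; a real setup would typically draw different noise segments.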

For the first time, the results provide a measure of the independence of the training conditions for the task of noise suppression in speech signals, and they show remarkable robustness of LSTM networks across different SNR levels.


Keywords: Deep learning, LSTM, MFCC, Neural networks, Speech enhancement



This work was supported by the Universidad de Costa Rica.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. PRIS-Lab, Escuela de Ingeniería Eléctrica, San Pedro, Costa Rica
  2. Universidad de Costa Rica, San José, Costa Rica
