Abstract
Automatic speech recognition systems (ASR) suffer from performance degradation under noisy conditions. Recent work, using deep neural networks to denoise spectral input features for robust ASR, have proved to be successful. In particular, Long Short-Term Memory (LSTM) autoencoders have outperformed other state of the art denoising systems when applied to the mfcc’s of a speech signal. In this paper we also consider denoising LSTM autoencoders (DLSTMA), but instead use three different DLSTMAs and apply each to the mfcc’s, fundamental frequency, and energy features, respectively. Results are given using several kinds of additive noise at different intensity levels, and show how this collection of DLSTMA’s improves the performance of the ASR in comparison with the LSTM autoencoder.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Weninger, F., Watanabe, S., Tachioka, Y., Schuller, B.: Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4623–4627. IEEE (2014)
Bagchi, D., Mandel, M.I., Wang, Z., He, Y., Plummer, A., Fosler-Lussier, E.: Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition. In: Proceedings of IEEE ASRU (2015)
Kalinli, O., Seltzer, M.L., Droppo, J., Acero, A.: Noise adaptive training for robust automatic speech recognition. IEEE Trans. Audio Speech Lang. Process. 18(8), 1889–1901 (2010)
Ishii, T., Komiyama, H., Shinozaki, T., Horiuchi, Y., Kuroiwa, S.: Reverberant speech recognition based on denoising autoencoder. In: INTERSPEECH, pp. 3512–3516 (2013)
Zhang, Z., Wang, L., Kai, A., Yamada, T., Li, W., Iwahashi, M.: Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification. EURASIP J. Audio Speech Music Process. 2015(1), 1–13 (2015)
Delcroix, M., Yoshioka, T., Ogawa, A., Kubo, Y., Fujimoto, M., Ito, N., Nakamura, A.: Linear prediction-based dereverberation with advanced speech enhancement and recognition technologies for the REVERB challenge. In: Proceedings of REVERB Workshop (2014)
Kawase, T., Niwa, K., Hioka, Y., Kobayashi, K.: Selection of optimal array noise reduction parameter set for accurate speech recognition in various noisy environments. In: Western Pacific Acoustics Conference (2015)
Zhao, M., Wang, D., Zhang, Z., Zhang, X.: Music removal by denoising autoencoder in speech recognition. In: APSIPA 2015 (2015)
Seltzer, M.L., Yu, D., Wang, Y.: An investigation of deep neural networks for noise robust speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7398–7402 (2013)
Du, J., Wang, Q., Gao, T., Xu, Y., Dai, L.R., Lee, C.H.: Robust speech recognition with speech enhanced deep neural networks. In: INTERSPEECH, pp. 616–620 (2014)
Han, K., He, Y., Bagchi, D., Fosler-Lussier, E., Wang, D.: Deep neural network based spectral feature mapping for robust speech recognition. In: INTERSPEECH, pp. 2484–2488 (2015)
Maas, A.L., Le, Q.V., O’Neil, T.M., Vinyals, O., Nguyen, P., Ng, A.Y.: Recurrent neural networks for noise reduction in robust ASR. In: INTERSPEECH, pp. 22–25 (2012)
Deng, L., Li, J., Huang, J.T., Yao, K., Yu, D., Seide, F., Seltzer, M., Zweig, G., He, X., Williams, J., Gong, Y.: Recent advances in deep learning for speech research at Microsoft. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8604–8608 (2013)
Geiger, J.T., Weninger, F., Gemmeke, J.F., Wollmer, M., Schuller, B., Rigoll, G.: Memory-enhanced neural networks and NMF for robust ASR. IEEE/ACM Trans. Audio Speech Lang. Process. 22(6), 1037–1046 (2014)
Zen, H., Sak, H.: Unidirectional long short-term memory recurrent neural network with recurrent output layer for lowlatency speech synthesis. In: Submitted to ICASSP (2015)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Graves, A., Navdeep, J., Abdel-Rahman, M.: Hybrid speech recognition with deep bidirectional LSTM. In: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) (2013)
Graves, A., Fernández, S., Schmidhuber, J.: Bidirectional LSTM networks for improved phoneme classification and recognition. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds.) ICANN 2005. LNCS, vol. 3697, pp. 799–804. Springer, Heidelberg (2005)
Fan, Y., Qian, Y., Xie, F.L., Soong, F.K.: TTS synthesis with bidirectional LSTM based recurrent neural networks. In: Interspeech, pp. 1964–1968 (2014)
Feng, X., Zhang, Y., Glass, J.: Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1759–1763 (2014)
Kominek, J., Black, A.W.: The CMU Arctic speech databases. In: Fifth ISCA Workshop on Speech Synthesis (2004)
Speechmatics. https://www.speechmatics.com
Erro, D., Sainz, I., Navas, E., Hernaez, I.: Improved HNM-based vocoder for statistical synthesizers. In: INTERSPEECH, pp. 1809–1812 (2011)
Acknowledgements
This work was supported by the SEP and CONACyT under the Program SEP-CONACyT, CB-2012-01, No. 182432, in Mexico, as well as the University of Costa Rica in Costa Rica.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Coto-Jiménez, M., Goddard-Close, J., Martínez-Licona, F. (2016). Improving Automatic Speech Recognition Containing Additive Noise Using Deep Denoising Autoencoders of LSTM Networks. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science(), vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_42
Download citation
DOI: https://doi.org/10.1007/978-3-319-43958-7_42
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43957-0
Online ISBN: 978-3-319-43958-7
eBook Packages: Computer ScienceComputer Science (R0)