Deep Maxout Networks Applied to Noise-Robust Speech Recognition

  • F. de-la-Calle-Silos
  • A. Gallardo-Antolín
  • C. Peláez-Moreno
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)

Abstract

Deep Neural Networks (DNNs) have become very popular for acoustic modeling due to the improvements they offer over traditional Gaussian Mixture Models (GMMs). However, few works have addressed the robustness of these systems under noisy conditions. Recently, the machine learning community has proposed new methods, such as dropout and maxout, to improve the accuracy of DNNs. In this paper, we investigate Deep Maxout Networks (DMNs) for acoustic modeling in a noisy automatic speech recognition environment. Experiments on the TIMIT dataset show that DMNs substantially improve recognition accuracy over DNNs and other traditional techniques in both clean and noisy conditions.
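To illustrate the maxout activation that DMNs are built on (Goodfellow et al., 2013), the following is a minimal NumPy sketch: each maxout unit computes several affine "pieces" of the input and outputs the maximum over them. Shapes, variable names, and sizes here are illustrative, not taken from the paper's actual network configuration.

```python
import numpy as np

def maxout_layer(x, W, b, num_pieces):
    """Maxout activation: project the input into (units * num_pieces)
    linear pieces, then take the max over each group of pieces.
    x: (batch, in_dim), W: (in_dim, units * num_pieces), b: (units * num_pieces,)
    Returns: (batch, units)."""
    z = x @ W + b                              # all linear pieces at once
    z = z.reshape(x.shape[0], -1, num_pieces)  # group pieces per unit
    return z.max(axis=-1)                      # piecewise-linear, convex activation

# Toy example: 4-dim input, 3 maxout units with 2 pieces each.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
W = rng.normal(size=(4, 3 * 2))
b = np.zeros(3 * 2)
h = maxout_layer(x, W, b, num_pieces=2)  # shape (5, 3)
```

Because the max is taken over learned linear functions, a maxout unit approximates an arbitrary convex activation, and in practice it pairs well with dropout, which is the combination the paper studies.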

Keywords

noise robustness, deep neural networks, dropout, deep maxout networks, speech recognition, deep learning

References

  1. Bourlard, H., Morgan, N.: Connectionist Speech Recognition: A Hybrid Approach. Kluwer International Series in Engineering and Computer Science: VLSI, Computer Architecture, and Digital Signal Processing. Springer US (1994)
  2. Breiman, L.: Bagging predictors. Machine Learning 24(2) (1996)
  3. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech & Language Processing 20(1) (2012)
  4. Garofolo, J.S., Lamel, L.F., Fisher, W.M., Fiscus, J.G., Pallett, D.S., Dahlgren, N.L.: DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM (1993)
  5. Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. ArXiv e-prints (2013)
  6. Hinton, G.E.: A practical guide to training restricted Boltzmann machines. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 599-619. Springer, Heidelberg (2012)
  7. Hinton, G.E., Deng, L., Yu, D., Dahl, G.E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29(6) (2012)
  8. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Improving neural networks by preventing co-adaptation of feature detectors. CoRR (2012)
  9. Hirsch, G.: FaNT - filtering and noise adding tool (2005), http://dnt.kr.hsnr.de/download.html
  10. Kim, C., Stern, R.M.: Power-normalized cepstral coefficients (PNCC) for robust speech recognition. IEEE Transactions on Audio, Speech, and Language Processing
  11. Li, J., Deng, L., Gong, Y., Haeb-Umbach, R.: An overview of noise-robust automatic speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22(4) (April 2014)
  12. Miao, Y.: Kaldi+PDNN: Building DNN-based ASR systems with Kaldi and PDNN. CoRR (2014)
  13. Miao, Y., Metze, F.: Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training. In: INTERSPEECH, pp. 2237-2241. ISCA (2013)
  14. Miao, Y., Metze, F., Rawat, S.: Deep maxout networks for low-resource speech recognition. In: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, December 8-12 (2013)
  15. Mohamed, A., Dahl, G.E., Hinton, G.E.: Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech & Language Processing 20(1) (2012)
  16. Morgan, N.: Deep and wide: Multiple layers in automatic speech recognition. IEEE Transactions on Audio, Speech & Language Processing 20(1) (2012)
  17. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., Vesely, K.: The Kaldi speech recognition toolkit. In: IEEE Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society (2011)
  18. Seltzer, M.L., Yu, D., Wang, Y.: An investigation of deep neural networks for noise robust speech recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2013)
  19. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11, 3371-3408 (2010)
  20. Wan, L., Zeiler, M.D., Zhang, S., LeCun, Y., Fergus, R.: Regularization of neural networks using dropconnect. In: Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, June 16-21 (2013)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • F. de-la-Calle-Silos (1)
  • A. Gallardo-Antolín (1)
  • C. Peláez-Moreno (1)
  1. Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés, Spain
