Word Recognition with a Hierarchical Neural Network

  • Conference paper
Advances in Nonlinear Speech Processing (NOLISP 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4885)

Abstract

In this paper we propose a feedforward neural network for syllable recognition. The core of the recognition system is a hierarchical architecture originally developed for visual object recognition. We show that, given the similarities between the primary auditory and visual cortices, such a system can be used successfully for speech recognition. Syllables serve as the basic units of recognition. Their spectrograms, computed using a Gammatone filterbank, are interpreted as images and fed into the neural network after a preprocessing step that enhances the formant frequencies and normalizes the length of the syllables. We analyze the performance of our system on the recognition of 25 different monosyllabic words. The parameters of the architecture are optimized using an evolutionary strategy. Compared to the Sphinx-4 speech recognition system, our system achieves better robustness and generalization capabilities in noisy conditions.
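
The front end described in the abstract (a Gammatone spectrogram treated as an image, then normalized to a fixed length) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the filter order 4, the bandwidth factor 1.019, the Glasberg and Moore ERB formula, the 10 ms frames, the 16 channels, and the linear time interpolation are all assumptions chosen for illustration, and the formant enhancement step is omitted because the abstract gives no details about it. All function names are hypothetical.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.064, order=4, b=1.019):
    """Unit-energy FIR approximation of a gammatone impulse response."""
    t = np.arange(int(duration * fs)) / fs
    g = (t ** (order - 1)
         * np.exp(-2.0 * np.pi * b * erb(fc) * t)
         * np.cos(2.0 * np.pi * fc * t))
    return g / np.sqrt(np.sum(g ** 2))

def gammatone_spectrogram(signal, fs, centre_freqs, frame_len=0.010):
    """Log energy per channel and per non-overlapping frame (assumed 10 ms)."""
    hop = int(frame_len * fs)
    n_frames = len(signal) // hop
    spec = np.empty((len(centre_freqs), n_frames))
    for i, fc in enumerate(centre_freqs):
        y = np.convolve(signal, gammatone_ir(fc, fs), mode="same")
        frames = y[: n_frames * hop].reshape(n_frames, hop)
        spec[i] = np.log(np.mean(frames ** 2, axis=1) + 1e-10)
    return spec

def normalize_length(spec, n_frames_out):
    """Resample the time axis to a fixed number of frames (linear interpolation)."""
    n_in = spec.shape[1]
    x_in = np.linspace(0.0, 1.0, n_in)
    x_out = np.linspace(0.0, 1.0, n_frames_out)
    return np.stack([np.interp(x_out, x_in, row) for row in spec])

# Usage on a synthetic 1 kHz tone standing in for a recorded syllable.
fs = 16000
t = np.arange(fs // 2) / fs
tone = np.sin(2.0 * np.pi * 1000.0 * t)
cfs = np.geomspace(100.0, 4000.0, 16)   # assumed channel spacing
spec = gammatone_spectrogram(tone, fs, cfs)
img = normalize_length(spec, 40)        # fixed-size "image" for the network
```

The fixed output size matters because a feedforward network with image-like input needs every syllable to have the same dimensions, whatever its spoken duration.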

Editor information

Mohamed Chetouani, Amir Hussain, Bruno Gas, Maurice Milgram, Jean-Luc Zarader

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Domont, X. et al. (2007). Word Recognition with a Hierarchical Neural Network. In: Chetouani, M., Hussain, A., Gas, B., Milgram, M., Zarader, JL. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77347-4_11

  • DOI: https://doi.org/10.1007/978-3-540-77347-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77346-7

  • Online ISBN: 978-3-540-77347-4

  • eBook Packages: Computer Science (R0)
