Word Recognition with a Hierarchical Neural Network

Domont, Xavier; Heckmann, Martin; Wersing, Heiko; Joublin, Frank; Menzel, Stefan; Sendhoff, Bernhard; Goerick, Christian

doi:10.1007/978-3-540-77347-4_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4885)

Included in the following conference series:

International Conference on Nonlinear Speech Processing

Abstract

In this paper we propose a feedforward neural network for syllable recognition. The core of the recognition system is based on a hierarchical architecture initially developed for visual object recognition. We show that, given the similarities between the primary auditory and visual cortexes, such a system can successfully be used for speech recognition. Syllables are used as basic units for the recognition. Their spectrograms, computed using a Gammatone filterbank, are interpreted as images and subsequently fed into the neural network after a preprocessing step that enhances the formant frequencies and normalizes the length of the syllables. The performance of our system has been analyzed on the recognition of 25 different monosyllabic words. The parameters of the architecture have been optimized using an evolutionary strategy. Compared to the Sphinx-4 speech recognition system, our system achieves better robustness and generalization capabilities in noisy conditions.
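
The length normalization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors normalize syllable length, so the linear interpolation along the time axis used here is an assumption, as are the function name `normalize_length` and the list-of-channels layout of the spectrogram. The point is only to show how spectrograms of different durations can be brought to a fixed width before being treated as images.

```python
from typing import List


def normalize_length(spectrogram: List[List[float]], target_frames: int) -> List[List[float]]:
    """Resample every frequency channel to `target_frames` time frames.

    `spectrogram[c][t]` is the energy of channel c at frame t (hypothetical layout).
    Linear interpolation is an illustrative choice, not the paper's stated method.
    """
    if target_frames < 1:
        raise ValueError("target_frames must be at least 1")
    resampled = []
    for channel in spectrogram:
        n = len(channel)
        if n == 0:
            raise ValueError("each channel needs at least one frame")
        row = []
        for j in range(target_frames):
            # Position of output frame j on the input time axis.
            pos = 0.0 if target_frames == 1 else j * (n - 1) / (target_frames - 1)
            lo = int(pos)
            hi = min(lo + 1, n - 1)
            frac = pos - lo
            # Weighted average of the two neighbouring input frames.
            row.append((1.0 - frac) * channel[lo] + frac * channel[hi])
        resampled.append(row)
    return resampled


# Stretch a 3-frame, single-channel toy spectrogram to 5 frames.
stretched = normalize_length([[0.0, 2.0, 4.0]], 5)
print(stretched)  # [[0.0, 1.0, 2.0, 3.0, 4.0]]
```

With every syllable resampled to the same number of frames, each spectrogram has the same shape and can be passed to a network with a fixed input size.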

Author information

Authors and Affiliations

Honda Research Institute Europe GmbH, D-63073 Offenbach am Main, Germany
Xavier Domont, Martin Heckmann, Heiko Wersing, Frank Joublin, Stefan Menzel, Bernhard Sendhoff & Christian Goerick
Technische Universität Darmstadt, Control Theory and Robotics Lab, D-64283 Darmstadt, Germany
Xavier Domont

Editor information

Mohamed Chetouani, Amir Hussain, Bruno Gas, Maurice Milgram, Jean-Luc Zarader

About this paper

Cite this paper

Domont, X. et al. (2007). Word Recognition with a Hierarchical Neural Network. In: Chetouani, M., Hussain, A., Gas, B., Milgram, M., Zarader, JL. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77347-4_11

DOI: https://doi.org/10.1007/978-3-540-77347-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77346-7
Online ISBN: 978-3-540-77347-4
eBook Packages: Computer Science, Computer Science (R0)
