Abstract
In this paper we propose a feedforward neural network for syllable recognition. The core of the recognition system is a hierarchical architecture originally developed for visual object recognition. We show that, given the similarities between the primary auditory and visual cortices, such a system can be used successfully for speech recognition. Syllables serve as the basic units of recognition. Their spectrograms, computed using a Gammatone filterbank, are interpreted as images and fed into the neural network after a preprocessing step that enhances the formant frequencies and normalizes the length of the syllables. We analyzed the performance of the system on the recognition of 25 different monosyllabic words. The parameters of the architecture were optimized using an evolutionary strategy. Compared to the Sphinx-4 speech recognition system, our system is more robust and generalizes better in noisy conditions.
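The abstract does not say how syllable length is normalized, so the following is only a minimal sketch of one plausible approach: each frequency channel of a spectrogram is resampled to a fixed number of time frames, so that every syllable yields an image of the same width. The function name `normalize_length`, the list-of-lists layout, and the use of linear interpolation are all assumptions made for illustration, not the authors' method.

```python
from typing import List


def normalize_length(spectrogram: List[List[float]],
                     target_frames: int) -> List[List[float]]:
    """Resample every frequency channel to `target_frames` time steps.

    `spectrogram[c][t]` holds the energy of channel c at frame t.
    Linear interpolation is an assumption of this sketch; the paper's
    actual normalization scheme is not given in the abstract.
    """
    if target_frames < 1:
        raise ValueError("target_frames must be at least 1")
    resampled = []
    for channel in spectrogram:
        n = len(channel)
        if n == 0:
            raise ValueError("every channel needs at least one frame")
        row = []
        for i in range(target_frames):
            # Map output frame i onto the input time axis [0, n - 1].
            if target_frames > 1:
                pos = i * (n - 1) / (target_frames - 1)
            else:
                pos = 0.0
            lo = int(pos)
            hi = min(lo + 1, n - 1)
            frac = pos - lo
            # Blend the two neighbouring input frames.
            row.append((1.0 - frac) * channel[lo] + frac * channel[hi])
        resampled.append(row)
    return resampled
```

For example, `normalize_length([[0.0, 2.0, 4.0]], 5)` stretches a three-frame channel to five frames and returns `[[0.0, 1.0, 2.0, 3.0, 4.0]]`. The same call with a smaller `target_frames` compresses a long syllable instead.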
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Domont, X. et al. (2007). Word Recognition with a Hierarchical Neural Network. In: Chetouani, M., Hussain, A., Gas, B., Milgram, M., Zarader, JL. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77347-4_11
DOI: https://doi.org/10.1007/978-3-540-77347-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77346-7
Online ISBN: 978-3-540-77347-4
eBook Packages: Computer Science, Computer Science (R0)