Modeling Speech Processing and Recognition in the Auditory System Using the Multilevel Hypermap Architecture
The Multilevel Hypermap Architecture (MHA) is an extension of the Hypermap introduced by Kohonen. The MHA makes it possible to analyze structured or hierarchical data (data with priorities, data with context, time series, data with varying exactness), which has so far been difficult or impossible with known self-organizing maps.
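The underlying hypermap principle can be illustrated by a two-phase best-match search: the context part of the input vector first restricts the search to a subset of map units, and the data part then selects the winner within that subset. The following is a minimal sketch of that idea only; all names, dimensions, and the radius parameter are illustrative assumptions, not the chapter's implementation:

```python
import numpy as np

# Illustrative sketch of two-phase hypermap matching (context part
# first, then data part). Codebooks are random; in a trained map they
# would come from self-organized learning.
rng = np.random.default_rng(0)

n_units = 64                     # number of map units (assumed)
ctx_dim, dat_dim = 4, 8          # context / data part dimensions (assumed)
ctx_w = rng.random((n_units, ctx_dim))   # context-part codebook
dat_w = rng.random((n_units, dat_dim))   # data-part codebook

def best_match(ctx, dat, ctx_radius=0.5):
    """Phase 1: keep only units whose context part lies close to the
    input context; Phase 2: pick the best data-part match among them."""
    ctx_dist = np.linalg.norm(ctx_w - ctx, axis=1)
    candidates = np.where(ctx_dist <= ctx_dist.min() + ctx_radius)[0]
    dat_dist = np.linalg.norm(dat_w[candidates] - dat, axis=1)
    return candidates[np.argmin(dat_dist)]

winner = best_match(rng.random(ctx_dim), rng.random(dat_dim))
```

The multilevel extension of the MHA generalizes this idea from one context/data split to several hierarchical levels of the input vector.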
In the first section of this chapter, the theoretical work of previous years on the MHA and its learning algorithm is summarized. After the discussion of a simple example that demonstrates the behavior of the MHA, results from MHA applications for the classification of moving objects and the analysis of functional Magnetic Resonance Imaging (fMRI) data are given.
In the second section, one application of the MHA within a system for speech processing and recognition is explained in detail. Our approach to implementing this system is to simulate the operations of the human auditory system in hearing and speech recognition using a multistage auditory system model. The goal of this system is to combine two different levels of abstraction: a more biological level for peripheral auditory processing and the abstract behavior of an artificial neural network. The multistage model consists of coupled models of neural signal processing at three different levels of the auditory system.
A model of peripheral auditory signal processing by the cochlea forms the input stage of the overall model. This model is capable of generating spatio-temporal firing rate patterns of the auditory nerve for simple acoustic stimuli as well as for speech stimuli.
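As a rough orientation, a spatio-temporal firing-rate pattern of this kind can be approximated by a bandpass filterbank followed by half-wave rectification and temporal smoothing. The sketch below is a crude stand-in for such a front end, not the chapter's cochlear model; the channel spacing, frame length, and test signal are all illustrative assumptions:

```python
import numpy as np

# Crude surrogate for a peripheral auditory front end: FFT-masked
# bandpass channels, half-wave rectification (nerve fibers fire only on
# one polarity), and frame averaging into a channel-by-time rate pattern.
fs = 8000
t = np.arange(0, 0.1, 1 / fs)
signal = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

edges = np.linspace(100, 3500, 9)              # 8 frequency channels (assumed)
spec = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), 1 / fs)

rates = []
for lo, hi in zip(edges[:-1], edges[1:]):
    band = np.where((freqs >= lo) & (freqs < hi), spec, 0)
    x = np.fft.irfft(band, len(signal))        # channel waveform
    r = np.maximum(x, 0.0)                     # half-wave rectification
    rates.append(r.reshape(-1, 80).mean(axis=1))  # 10 ms frames

pattern = np.array(rates)                      # shape: (channels, frames)
```

Channels whose passband contains a stimulus component (here 500 Hz and 1500 Hz) show elevated rates, giving the spatio-temporal pattern passed on to the next stage.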
A uniform lateral inhibitory neural network (LIN) system estimates the spectrum of the speech stimuli by spatially processing the cochlear model's neural response patterns.
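The effect of lateral inhibition along the place (channel) axis is that each unit is excited by its own input and inhibited by its neighbors, which sharpens spectral peaks. A minimal sketch of this mechanism, with an illustrative inhibition weight rather than the chapter's parameters:

```python
import numpy as np

# Minimal lateral inhibition across channels: subtract a weighted
# average of the two neighbors from each channel, then clip at zero
# (firing rates cannot be negative). Kernel weights are illustrative.
def lateral_inhibition(rates, inhibit=0.5):
    """rates: 1-D firing-rate profile across channels."""
    left = np.roll(rates, 1);  left[0] = rates[0]     # replicate edges
    right = np.roll(rates, -1); right[-1] = rates[-1]
    out = rates - inhibit * 0.5 * (left + right)
    return np.maximum(out, 0.0)

profile = np.array([1.0, 2.0, 5.0, 2.0, 1.0])         # broad spectral peak
sharpened = lateral_inhibition(profile)
```

After this processing the peak-to-flank ratio of the profile increases, i.e. the spectral estimate is sharpened relative to the raw cochlear response.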
Finally, the Multilevel Hypermap Architecture is used for learning and recognition of the spectral representations of the speech stimuli provided by the LIN system.
Keywords: Firing Rate, Input Vector, Speech Recognition, Auditory System, Auditory Nerve
Bibliography on Chapter 7
- 1. T. Kohonen. The hypermap architecture. In Kohonen et al., pages 1357–1360.
- 3. B. Bruckner, M. Franz, and A. Richter. A modified hypermap architecture for classification of biological signals. In I. Aleksander and J. Taylor, editors, Artificial Neural Networks, 2, pages 1167–1170, Amsterdam, 1992. Elsevier Science Publishers.
- 4. B. Bruckner, T. Wesarg, and C. Blumenstein. Improvements of the modified hypermap architecture for speech recognition. In Proc. Int. Conf. on Neural Networks, volume 5, pages 2891–2895, Perth, Australia, 1995.
- 5. B. Bruckner. Improvements in the analysis of structured data with the multilevel hypermap architecture. In Kasabov et al., editors, Progress in Connectionist-Based Information Systems, Proceedings of the ICONIP'97, volume 1, pages 342–345, Singapore, 1997. Springer-Verlag.
- 6. C. Blumenstein, B. Bruckner, R. Mecke, T. Wesarg, and C. Schauer. Using a modified hypermap for analyzing moving scenes. In Shun-ichi Amari et al., editors, Progress in Neural Information Processing, Proceedings of the ICONIP'96, volume 1, pages 428–431, Singapore, 1996. Springer-Verlag.
- 7. B. Bruckner, B. Gaschler-Markefski, H. Hofmeister, and H. Scheich. Detection of nonstationarities in functional MRI data sets using the multilevel hypermap architecture. In Proceedings of the IJCNN'99, Washington, D.C., 1999.
- 10. S. Shamma. Spatial and temporal processing in central auditory networks. In C. Koch and I. Segev, editors, Methods in Neuronal Modeling, pages 247–289, Cambridge, 1989. The MIT Press.
- 12. H. Davis. A mechanoelectrical theory of cochlear action. Ann. Oto-Rhino-Laryngol., 67:789–801, 1956.
- 13. T. Wesarg, B. Bruckner, and C. Schauer. Modelling speech processing and recognition in the auditory system with a three-stage architecture. In C. von der Malsburg et al., editors, Artificial Neural Networks – ICANN 96, volume 1112 of Lecture Notes in Computer Science, pages 679–684, Berlin, 1996. Springer-Verlag.
- 14. B. Bruckner and W. Zander. Classification of speech using a modified hypermap architecture. In I. Aleksander and J. Taylor, editors, Proceedings of the WCNN'93, volume III, pages 75–78, Hillsdale, 1993. Lawrence Erlbaum Associates.
- 17. J. Kangas. Time-dependent self-organizing maps for speech recognition. In Kohonen et al., pages 1591–1594.
- 18. S.A. Shamma. The acoustic features of speech phonemes in a model of the auditory system: Vowels and unvoiced fricatives. J. of Phonetics, 16:77–91, 1988.
- 19. J. Kangas. The analysis of pattern sequences by self-organizing maps, 1994.
- 20. T. Voegtlin and P.F. Dominey. Contextual self-organizing maps: An adaptive representation of context for sequence learning, 1998.
- 22. F. Mehler and P. Wilcox. Self-organizing maps in speech recognition systems. In F. G. Bobel and T. Wagner, editors, Proc. of the First Int. Conf. on Appl. Synergetics and Synergetic Engineering (ICASSE'94), pages 20–26, Erlangen, Germany, 1994.