Modeling Speech Processing and Recognition in the Auditory System Using the Multilevel Hypermap Architecture

  • Bernd Brückner
  • Thomas Wesarg
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 78)


The Multilevel Hypermap Architecture (MHA) is an extension of the Hypermap introduced by Kohonen. The MHA makes it possible to analyze structured or hierarchical data (data with priorities, data with context, time series, data of varying exactness), which has so far been difficult or impossible with known self-organizing maps.

The first section of this chapter summarizes the theoretical work of previous years on the MHA and its learning algorithm. After a discussion of a simple example that demonstrates the behavior of the MHA, results from MHA applications to the classification of moving objects and to the analysis of images from functional Magnetic Resonance Imaging (fMRI) are presented.

The second section explains in detail one application of the MHA within a system for speech processing and recognition. Our approach to implementing this system is to simulate the operations of the human auditory system in hearing and speech recognition using a multistage auditory system model. The goal of this system is to combine two different levels of abstraction: a more biological level for peripheral auditory processing and the abstract behavior of an artificial neural network. The multistage model consists of coupled models of neural signal processing at three different levels of the auditory system.

A model of peripheral auditory signal processing by the cochlea forms the input stage of the overall model. This model is capable of generating spatio-temporal firing rate patterns of the auditory nerve for simple acoustic as well as speech stimuli.
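To make "spatio-temporal firing rate pattern" concrete, the sketch below computes a crude stand-in for such a pattern: per-frame energies in log-spaced frequency bands, where each band plays the role of a tonotopic place along the cochlea, followed by a compressive nonlinearity loosely analogous to hair-cell saturation. This is only an illustrative simplification, not the time-domain cochlear model (Kates) actually used in the chapter; all names and parameters here are hypothetical.

```python
import numpy as np

def firing_rate_pattern(signal, fs, n_channels=16, frame_len=256, hop=128,
                        fmin=100.0, fmax=4000.0):
    """Crude spatio-temporal firing-rate proxy: per-frame power in
    log-spaced frequency bands (channels ~ tonotopic places)."""
    # Log-spaced channel edges, mimicking the cochlea's frequency map.
    edges = np.geomspace(fmin, fmax, n_channels + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    rates = np.zeros((n_channels, n_frames))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for c in range(n_channels):
            band = (freqs >= edges[c]) & (freqs < edges[c + 1])
            rates[c, t] = power[band].sum()
    # Compressive nonlinearity, loosely analogous to hair-cell saturation.
    return np.log1p(rates)

# Example: a 1 kHz tone excites the mid-frequency channels most strongly.
fs = 8000
t = np.arange(fs) / fs
pattern = firing_rate_pattern(np.sin(2 * np.pi * 1000 * t), fs)
```

The result is a channels-by-time array whose rows can be read as firing-rate trajectories of auditory-nerve fiber populations at different cochlear places.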

A uniform lateral inhibitory neural network (LIN) system estimates the spectrum of the speech stimuli by spatial processing of the cochlear model's neural response patterns.
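The core operation of such a lateral inhibitory network can be sketched as on-center/off-surround filtering along the channel (tonotopic) axis: each channel is excited by its own activity and inhibited by its neighbors, which sharpens spectral peaks in the firing-rate profile. The minimal sketch below, with hypothetical weights, illustrates the principle rather than the specific LIN formulation used in the chapter.

```python
import numpy as np

def lateral_inhibition(rates, excite=1.0, inhibit=0.35):
    """On-center/off-surround filtering across channels: each channel is
    excited by its own activity and inhibited by its two neighbours."""
    kernel = np.array([-inhibit, excite, -inhibit])
    out = np.empty_like(rates, dtype=float)
    for t in range(rates.shape[1]):
        # 'same'-mode convolution along the spatial (channel) axis
        out[:, t] = np.convolve(rates[:, t], kernel, mode="same")
    return np.maximum(out, 0.0)  # half-wave rectify: rates stay non-negative

# A flat profile with one peak: the peak survives, the flanks are suppressed,
# so the peak-to-flank contrast increases.
profile = np.array([[1.0], [1.0], [3.0], [1.0], [1.0]])
sharpened = lateral_inhibition(profile)
```

Applied frame by frame to the cochlear response patterns, this yields a sharpened spectral profile per time step, which is the representation passed on to the MHA.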

Finally, the Multilevel Hypermap Architecture is used for learning and recognition of the spectral representations of the speech stimuli provided by the LIN system.
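The matching principle underlying the Hypermap family can be illustrated in a few lines: each unit's weight vector is split into a context part and a data part, and best-match search proceeds in two phases, first restricting the candidate set by context similarity, then selecting the winner among those candidates by data-part distance. The following is a minimal sketch of that two-phase search under assumed names and a hypothetical context radius, not the full MHA learning algorithm.

```python
import numpy as np

def hypermap_match(units_ctx, units_data, ctx, data, ctx_radius=0.5):
    """Two-phase best-match search in the spirit of Kohonen's Hypermap:
    phase 1 selects candidate units by context similarity, phase 2 picks
    the winner among the candidates by data-part distance."""
    ctx_dist = np.linalg.norm(units_ctx - ctx, axis=1)
    candidates = np.flatnonzero(ctx_dist <= ctx_radius)
    if candidates.size == 0:        # no context match: fall back to all units
        candidates = np.arange(len(units_ctx))
    data_dist = np.linalg.norm(units_data[candidates] - data, axis=1)
    return candidates[int(np.argmin(data_dist))]

# Ten units in two context groups; the query context selects the second group,
# and the winner is then the closest data part within that group.
rng = np.random.default_rng(0)
units_ctx = np.vstack([np.zeros((5, 2)), np.ones((5, 2))])
units_data = rng.normal(size=(10, 3))
winner = hypermap_match(units_ctx, units_data, ctx=np.ones(2), data=units_data[7])
```

The multilevel extension generalizes this idea to several nested levels, so that the context restriction itself can be hierarchical.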


Keywords: Firing Rate, Input Vector, Speech Recognition, Auditory System, Auditory Nerve



Bibliography on Chapter 7

  1. T. Kohonen. The hypermap architecture. In Kohonen et al. [23], pages 1357–1360.
  2. T. Kohonen. What generalizations of the self-organizing map make sense? In M. Marinaro and P. G. Morasso, editors, ICANN'94, pages 292–297, London, 1994. Springer-Verlag.
  3. B. Brückner, M. Franz, and A. Richter. A modified hypermap architecture for classification of biological signals. In I. Aleksander and J. Taylor, editors, Artificial Neural Networks, 2, pages 1167–1170, Amsterdam, 1992. Elsevier Science Publishers.
  4. B. Brückner, T. Wesarg, and C. Blumenstein. Improvements of the modified hypermap architecture for speech recognition. In Proc. Int. Conf. on Neural Networks, volume 5, pages 2891–2895, Perth, Australia, 1995.
  5. B. Brückner. Improvements in the analysis of structured data with the multilevel hypermap architecture. In Kasabov, editor, Progress in Connectionist-Based Information Systems, Proceedings of the ICONIP'97, volume 1, pages 342–345, Singapore, 1997. Springer-Verlag.
  6. C. Blumenstein, B. Brückner, R. Mecke, T. Wesarg, and C. Schauer. Using a modified hypermap for analyzing moving scenes. In Shun-ichi Amari, editor, Progress in Neural Information Processing, Proceedings of the ICONIP'96, volume 1, pages 428–431, Singapore, 1996. Springer-Verlag.
  7. B. Brückner, B. Gaschler-Markefski, H. Hofmeister, and H. Scheich. Detection of non-stationarities in functional MRI data sets using the multilevel hypermap architecture. In Proceedings of the IJCNN'99, Washington, D.C., 1999.
  8. B. Gaschler-Markefski, F. Baumgart, C. Tempelmann, F. Schindler, H. J. Heinze, and H. Scheich. Statistical methods in functional magnetic resonance imaging with respect to non-stationary time series: auditory cortex activity. Magn. Reson. Med., 38:811–820, 1997.
  9. J. M. Kates. A time-domain digital cochlear model. IEEE Transactions on Signal Processing, 39(12):2573–2592, 1991.
  10. S. Shamma. Spatial and temporal processing in central auditory networks. In C. Koch and I. Segev, editors, Methods in Neuronal Modeling, pages 247–289, Cambridge, 1989. The MIT Press.
  11. J. B. Allen. A hair-cell model of neural response. In E. de Boer and M. A. Viergever, editors, Mechanics of Hearing, pages 193–202. Martinus Nijhoff, The Hague, The Netherlands, 1983.
  12. H. Davis. A mechanoelectrical theory of cochlear action. Ann. Oto-Rhino-Laryngol., 67:789–801, 1956.
  13. T. Wesarg, B. Brückner, and C. Schauer. Modelling speech processing and recognition in the auditory system with a three-stage architecture. In C. von der Malsburg, editor, Artificial Neural Networks – ICANN 96, Lecture Notes in Computer Science, volume 1112, pages 679–684, Berlin, 1996. Springer-Verlag.
  14. B. Brückner and W. Zander. Classification of speech using a modified hypermap architecture. In I. Aleksander and J. Taylor, editors, Proceedings of the WCNN'93, volume III, pages 75–78, Hillsdale, 1993. Lawrence Erlbaum Associates.
  15. T. Kohonen. Self-Organization and Associative Memory. Springer-Verlag, New York, 1988.
  16. T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, 1997.
  17. J. Kangas. Time-dependent self-organizing maps for speech recognition. In Kohonen et al. [23], pages 1591–1594.
  18. S. A. Shamma. The acoustic features of speech phonemes in a model of the auditory system: Vowels and unvoiced fricatives. J. of Phonetics, 16:77–91, 1988.
  19. J. Kangas. The analysis of pattern sequences by self-organizing maps, 1994.
  20. T. Voegtlin and P. F. Dominey. Contextual self-organizing maps: An adaptive representation of context for sequence learning, 1998.
  21. T. Graepel, M. Burger, and K. Obermayer. Self-organizing maps: generalizations and new optimization techniques. Neurocomputing, 21:173–190, 1998.
  22. F. Mehler and P. Wilcox. Self-organizing maps in speech recognition systems. In F. G. Bobel and T. Wagner, editors, Proc. of the First Int. Conf. on Appl. Synergetics and Synergetic Engineering (ICASSE'94), pages 20–26, Erlangen, Germany, 1994.
  23. T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors. Artificial Neural Networks, Helsinki, 1991. Elsevier Science Publishers.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002
