A Speech Recognition System using an Auditory Model and TOM Neural Network

  • E. Hartwich
  • F. Alexandre
Conference paper


This paper is devoted to a neurobiologically plausible approach to the design of speech processing systems. The temporal organization map (TOM) neural net model is a connectionist model for time representation. The definition of a generic neural unit, inspired by the neurobiological model of the cortical column, allows the model to be applied to problems involving the temporal dimension. In the framework of automatic speech recognition, TOM has previously been tested with conventional signal processing techniques. An auditory model is now used as the front-end processor for TOM, in order to test the efficiency and accuracy of a physiologically based speech recognition system. Preliminary results are presented for speaker-dependent and speaker-independent speech recognition experiments. The interest of the auditory model lies in the possibility of developing richer processing and communication strategies between TOM and the front-end processor, including both afferent and efferent information flows.
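As an illustration of the kind of front-end processing the abstract refers to, the sketch below builds a toy auditory-style analysis in Python/NumPy: a resonator filterbank standing in for basilar-membrane channels, half-wave rectification standing in for auditory nerve fibre responses, and a first-order leaky integrator acting as a crude automatic gain control. This is a minimal, assumed approximation; the function name, parameters, and filter design are hypothetical and do not represent the auditory model or the TOM network actually used by the authors.

```python
import numpy as np

def toy_auditory_front_end(signal, sample_rate, centre_freqs, agc_tau=0.01):
    """Toy auditory-style front end (hypothetical, for illustration only):
    second-order resonators stand in for basilar-membrane channels,
    half-wave rectification for auditory nerve fibre responses, and a
    first-order low-pass acts as a crude automatic gain control."""
    x = np.asarray(signal, dtype=float)
    channels = []
    for fc in centre_freqs:
        # Two-pole resonator tuned to the channel's centre frequency.
        r = 0.995
        theta = 2.0 * np.pi * fc / sample_rate
        b0 = (1.0 - r) * np.sqrt(1.0 - 2.0 * r * np.cos(2.0 * theta) + r * r)
        y = np.zeros(len(x))
        for n in range(len(x)):
            y[n] = (b0 * x[n]
                    + 2.0 * r * np.cos(theta) * y[n - 1]
                    - r * r * y[n - 2])
        # Half-wave rectification: only positive deflections contribute.
        rect = np.maximum(y, 0.0)
        # Leaky integrator as a very simple automatic gain control envelope.
        alpha = np.exp(-1.0 / (agc_tau * sample_rate))
        env = np.zeros(len(rect))
        for n in range(len(rect)):
            env[n] = alpha * env[n - 1] + (1.0 - alpha) * rect[n]
        channels.append(env)
    # Channels x time matrix: the temporal input sequence a network
    # such as TOM would consume.
    return np.stack(channels)

# Example usage on a synthetic two-tone signal (assumed values).
if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)
    features = toy_auditory_front_end(tone, sr, centre_freqs=[250, 500, 1000, 2000])
    print(features.shape)  # (4, 16000)
```

The output is a channels-by-time activity matrix, which is the form of spectro-temporal input a temporal connectionist model would process frame by frame.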


Keywords: Speech Recognition · Automatic Speech Recognition · Basilar Membrane · Automatic Gain Control · Auditory Nerve Fibre





Copyright information

© Springer-Verlag Wien 1998

Authors and Affiliations

  • E. Hartwich (1)
  • F. Alexandre (1)
  1. CRIN-CNRS / INRIA Lorraine, Vandœuvre-lès-Nancy, France
