Spatio-Temporal Mask Learning: Application to Speech Recognition

  • Stéphane Durand
  • Frédéric Alexandre


In this paper, we describe the “spatio-temporal” map, an original algorithm for learning and recognizing dynamic patterns represented as sequences. The work is oriented toward an internal, explicit representation of time, which appears to be neurobiologically relevant. The map involves units with different kinds of links: feed-forward connections, intra-map connections, and inter-map connections. This architecture can learn noise-robust sequences from an input stream. The learning process is self-organized for the feed-forward links and “pseudo” self-organized for the intra-map links. An application to the recognition of spoken French digits is presented.
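The abstract distinguishes self-organized feed-forward learning from “pseudo” self-organized intra-map learning. As a rough illustration only (not the authors’ algorithm), the sketch below pairs a Kohonen-style winner-take-all update for feed-forward weights with a Hebbian-like strengthening of the intra-map link from the previous winner to the current one, so that lateral weights come to encode sequence transitions. All sizes, learning rates, and the toy input stream are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_units, n_in = 16, 8                 # map size and input dimension (illustrative)
W_ff = rng.random((n_units, n_in))    # feed-forward weights (input -> map units)
W_lat = np.zeros((n_units, n_units))  # intra-map weights (temporal transitions)

def step(x, prev_winner, lr_ff=0.1, lr_lat=0.05):
    # Self-organized feed-forward learning: the best-matching unit
    # moves its weight vector toward the current input.
    winner = int(np.argmin(np.linalg.norm(W_ff - x, axis=1)))
    W_ff[winner] += lr_ff * (x - W_ff[winner])
    # "Pseudo" self-organized intra-map learning: strengthen the lateral
    # link from the previous winner to the current one, so lateral weights
    # accumulate the temporal structure of the input stream.
    if prev_winner is not None:
        W_lat[prev_winner, winner] += lr_lat * (1.0 - W_lat[prev_winner, winner])
    return winner

prev = None
for x in rng.random((50, n_in)):      # a toy input stream
    prev = step(x, prev)
```

After training, a learned sequence can be followed by starting at a unit and repeatedly taking the strongest outgoing lateral link; this is only a sketch of the general mechanism the abstract names, not the paper’s mask-learning procedure.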


Keywords: Receptive Field · Speech Recognition · Input Stream · Cortical Column · Neural Assembly



Copyright information

© Springer-Verlag/Wien 1995

Authors and Affiliations

  • Stéphane Durand (1)
  • Frédéric Alexandre (1)
  1. CRIN-CNRS & INRIA Lorraine, Vandoeuvre-lès-Nancy, France