Combined Application of Neural Network and Artificial Intelligence Methods to Automatic Speech Recognition in a Continuous Utterance

  • U. Emiliani
  • P. Podini
  • F. Sani
Conference paper


A very efficient approach to using a supervised artificial neural network for Automatic Speech Recognition of speaker-dependent continuous utterances is presented in this paper; it has been tested on the Italian language but is in principle not limited to it. Automatic segmentation of the digitized signal, with a minimum of human intervention, yielded a phoneme recognition efficiency of 98% on Italian phrases constructed from a limited set of 11 alphabetic classes, defining 20 phonetic subclasses. The efficiency we observed is due to the combined effect of four factors:
  • the differential of the detected utterance signal, rather than the signal itself, was digitized;
  • during parametrization of the segments through the Fast Fourier Transform (FFT) and critical-band definition, the effect of a second derivative was simulated, reproducing the higher sensitivity of the ear complex in the higher frequency range;
  • the proper input patterns for the early stages of training the neural network were selected by a very sensitive similitude algorithm;
  • a dynamic, repetitive training procedure was applied, in which the generalization shown by the network after training was used to modify and select the input patterns as well as to control the number of output nodes used in successive training.
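The first two factors describe a front end in which the waveform is differentiated before digitization and the FFT spectrum is pooled into critical bands. A minimal sketch of such a front end is shown below; the frame length, band count, and geometric band edges are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def critical_band_energies(signal, frame_len=512, n_bands=20):
    """Illustrative front end: difference the waveform, window one frame,
    take an FFT, and pool the power spectrum into bands whose widths grow
    with frequency (a crude stand-in for a critical-band scale)."""
    # The first difference approximates differentiating the detected
    # signal before digitization, emphasizing higher frequencies.
    diff = np.diff(signal)
    frame = diff[:frame_len] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frame)) ** 2
    # Geometrically spaced band edges: narrow bands at low frequency,
    # wide bands at high frequency (assumed layout, not the authors').
    edges = np.geomspace(1, len(power) - 1, n_bands + 1).astype(int)
    return np.array([power[a:b + 1].sum()
                     for a, b in zip(edges[:-1], edges[1:])])

# Example: a 100 Hz + 3 kHz tone mixture, 64 ms at a 16 kHz sampling rate
t = np.arange(1024) / 16000.0
x = np.sin(2 * np.pi * 100 * t) + 0.3 * np.sin(2 * np.pi * 3000 * t)
feat = critical_band_energies(x)
```

Because of the differencing step, the high-frequency component contributes proportionally more energy to the feature vector than its raw amplitude would suggest, which is the pre-emphasis effect the abstract attributes to the simulated second derivative.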


Keywords: Fast Fourier Transform · Output Node · Automatic Speech Recognition · High Frequency Range · Trained Network





Copyright information

© Springer-Verlag/Wien 1993

Authors and Affiliations

  • U. Emiliani (1)
  • P. Podini (1)
  • F. Sani (1)
  1. Physics Department, University of Parma, Parma, Italy
