Phoneme Classification using a Rasta-PLP preprocessing algorithm and a Time Delay Neural Network: Performance Studies

  • A. Esposito
  • E. C. Ezin
Conference paper
Part of the Perspectives in Neural Computing book series (PERSPECT.NEURAL)


An automatic system for classifying the English stops [b, d, g, p, t, k] is proposed. It combines a preprocessing technique based on a modified Rasta-PLP algorithm with a classification algorithm based on a simplified Time Delay Neural Network (TDNN) architecture. Phonemes extracted from the TIMIT-NIST database and produced by 73 speakers were used to train and test the system. The work studies three aspects of the problem. First, what role does the preprocessing phase play in the performance of the net? Second, what number of neurons best balances the trade-off between net performance and computational time? Third, must the optimal learning rate be found by trial and error, or can it be derived as a function of the input data? To this end, experiments were performed to tune the preprocessing parameters, the number of hidden neurons in the TDNN, and the learning rate. Classification rates on the test data of 92.9% for [b], 91.8% for [d], 92.4% for [g], 80.3% for [p], 90.8% for [t], and 94.2% for [k] were achieved.
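For readers unfamiliar with the time-delay idea, the following is a minimal sketch of a TDNN-style forward pass, not the authors' simplified architecture. Every size (4 features per frame, 5 hidden units, delay windows of 3 and 2 frames) is an assumption chosen for illustration, the weights are random rather than trained, and random numbers stand in for the Rasta-PLP features. Each unit sees a short window of consecutive frames, the same weights are reused at every time shift, and the per-class activations are averaged over time before the largest is picked.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_weights(n_units, n_delays, n_inputs, rng):
    """Hypothetical helper: small random weights indexed [unit][delay][input]."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
             for _ in range(n_delays)]
            for _ in range(n_units)]


def time_delay_layer(frames, weights, biases):
    """Apply one time-delay layer.

    frames:  list of T input vectors
    weights: weights[u][d][i] for unit u, delay d, input component i
    biases:  one bias per unit
    Returns T - D + 1 output vectors, where D is the number of delays.
    """
    n_delays = len(weights[0])
    outputs = []
    for t in range(len(frames) - n_delays + 1):
        window = frames[t:t + n_delays]  # D consecutive frames
        row = []
        for w_u, b_u in zip(weights, biases):
            s = b_u
            for d in range(n_delays):
                for i, x in enumerate(window[d]):
                    s += w_u[d][i] * x
            row.append(sigmoid(s))
        outputs.append(row)
    return outputs


def classify(frames, w_hid, b_hid, w_out, b_out, labels):
    """Two time-delay layers, then average each class activation over time."""
    hidden = time_delay_layer(frames, w_hid, b_hid)
    out = time_delay_layer(hidden, w_out, b_out)
    scores = [sum(col) / len(out) for col in zip(*out)]
    return labels[scores.index(max(scores))], scores


# Illustrative sizes only; none of these come from the paper.
rng = random.Random(0)
labels = ["b", "d", "g", "p", "t", "k"]
frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
w_hid = random_weights(5, 3, 4, rng)
w_out = random_weights(len(labels), 2, 5, rng)
label, scores = classify(frames, w_hid, [0.0] * 5,
                         w_out, [0.0] * len(labels), labels)
print(label, [round(s, 3) for s in scores])
```

With untrained weights the predicted label is arbitrary. The point is only the data flow: 10 input frames shrink to 8 hidden frames and then 7 output frames as each layer consumes its delay window.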


Hidden Layer · Mean Square Error · Learning Rate · Speech Signal · Hidden Neuron
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag London Limited 1999

Authors and Affiliations

  • A. Esposito (1, 3)
  • E. C. Ezin (2)

  1. International Institute for Advanced Scientific Studies (I.I.A.S.S.), Vietri sul Mare, Salerno, Italy
  2. Institut de Mathématiques et de Sciences Physiques (IMSP), Porto-Novo, Bénin, West Africa
  3. INFM unità di Salerno, Italy
