
Phoneme Classification using a Rasta-PLP preprocessing algorithm and a Time Delay Neural Network: Performance Studies

  • Conference paper

Part of the book series: Perspectives in Neural Computing (PERSPECT.NEURAL)

Abstract

An automatic system is proposed for classifying the English stops [b, d, g, p, t, k]. It uses a preprocessing technique based on a modified Rasta-PLP algorithm and a classifier based on a simplified Time Delay Neural Network (TDNN) architecture. Phonemes extracted from the TIMIT-NIST database and produced by 73 speakers were used to train and test the system. The work studies three aspects of the problem. First, what role does the preprocessing phase play in the performance of the net? Second, what number of hidden neurons best balances the trade-off between net performance and computational time? Third, must the optimal learning rate be found by trial and error, or can it be derived as a function of the input data? To this end, experiments were performed to tune the preprocessing parameters, the number of hidden neurons in the TDNN, and the learning rate. On the test data, the system achieved classification rates of 92.9% for [b], 91.8% for [d], 92.4% for [g], 80.3% for [p], 90.8% for [t], and 94.2% for [k].
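The abstract does not describe how the authors modified Rasta-PLP, so the following is only a minimal sketch of the standard RASTA filtering step that Rasta-PLP builds on, not the authors' preprocessing. In RASTA-PLP, the time trajectory of each log critical-band energy is band-pass filtered by H(z) = 0.1 z^4 (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1) before the usual PLP steps (equal-loudness weighting, cube-root compression, all-pole modeling). The pole value 0.98 is the one given by Hermansky and Morgan (1994); some implementations use 0.94 instead. The function names `rasta_filter` and `rasta_bands` are hypothetical, and the non-causal z^4 advance is replaced here by a 4-frame delay.

```python
# Minimal sketch of the standard RASTA band-pass filter (not the paper's
# "modified" variant), applied to log critical-band energy trajectories.
#   H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - POLE * z^-1)
# The z^4 advance of the published filter is dropped, so the output lags the
# input by 4 frames. Function names are hypothetical.
from typing import List, Sequence

NUM = [0.2, 0.1, 0.0, -0.1, -0.2]  # FIR numerator: slope estimate over 5 frames
POLE = 0.98                       # IIR pole; assumed value (0.94 is also common)


def rasta_filter(traj: Sequence[float], pole: float = POLE) -> List[float]:
    """Filter one log-energy trajectory (one value per frame)."""
    out: List[float] = []
    prev_y = 0.0
    for t in range(len(traj)):
        # FIR part: sum_k NUM[k] * x[t-k], treating frames before t=0 as zero
        acc = sum(b * traj[t - k] for k, b in enumerate(NUM) if t - k >= 0)
        # IIR part: y[t] = acc + pole * y[t-1]
        y = acc + pole * prev_y
        out.append(y)
        prev_y = y
    return out


def rasta_bands(log_bands: Sequence[Sequence[float]],
                pole: float = POLE) -> List[List[float]]:
    """Filter each band independently; input and output are frames x bands."""
    n_bands = len(log_bands[0]) if log_bands else 0
    columns = [rasta_filter([frame[j] for frame in log_bands], pole)
               for j in range(n_bands)]
    # Transpose the per-band columns back to frames x bands
    return [list(row) for row in zip(*columns)]
```

Because the numerator coefficients sum to zero, a constant offset added to a log-band trajectory (for example, a fixed channel gain) is driven toward zero at the output, while the pole sets how slowly varying components are attenuated. For instance, `rasta_filter(x)` and `rasta_filter([v + 3.0 for v in x])` converge to the same values after a few hundred frames.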

World Lab., and IMSP.

Copyright information

© 1999 Springer-Verlag London Limited

About this paper

Cite this paper

Esposito, A., Ezin, E.C. (1999). Phoneme Classification using a Rasta-PLP preprocessing algorithm and a Time Delay Neural Network: Performance Studies. In: Marinaro, M., Tagliaferri, R. (eds) Neural Nets WIRN VIETRI-98. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0811-5_20

  • DOI: https://doi.org/10.1007/978-1-4471-0811-5_20

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-1208-2

  • Online ISBN: 978-1-4471-0811-5

  • eBook Packages: Springer Book Archive
