Abstract
An automatic system for classifying the English stops [b, d, g, p, t, k] is proposed. It uses a preprocessing technique based on a modified Rasta-PLP algorithm and a classification algorithm based on a simplified Time Delay Neural Network (TDNN) architecture. Phonemes extracted from the TIMIT-NIST database and produced by 73 speakers were used to train and test the system. The work studies three aspects of the problem. First, what role does the preprocessing phase play in the performance of the net? Second, what is the optimal number of neurons that balances the trade-off between net performance and computational time? Third, must the optimal learning rate be found by trial and error, or can it be derived as a function of the input data? To this end, experiments were performed to tune the preprocessing parameters, the number of hidden neurons in the TDNN, and the learning rate. Classification percentages on the test data of 92.9 for [b], 91.8 for [d], 92.4 for [g], 80.3 for [p], 90.8 for [t], and 94.2 for [k] have been achieved.
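To make the classification stage concrete, the following is a minimal sketch of a generic time-delay network forward pass, not the authors' simplified TDNN, whose details the abstract does not give. The frame count, number of coefficients per frame, delay window lengths, number of hidden units, and the averaging of output activations over time are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np


def time_delay_layer(x, weights, bias):
    """One time-delay layer: each output frame sees `delays` consecutive input frames.

    x:       (n_frames, n_in) feature frames, e.g. one frame of coefficients per time step
    weights: (delays, n_in, n_out) weights shared across all time positions
    bias:    (n_out,)
    returns: (n_frames - delays + 1, n_out) sigmoid activations
    """
    delays, n_in, n_out = weights.shape
    n_steps = x.shape[0] - delays + 1
    out = np.empty((n_steps, n_out))
    for t in range(n_steps):
        window = x[t:t + delays]  # (delays, n_in) slice of consecutive frames
        # Contract the window against the weights over the delay and input axes.
        out[t] = np.tensordot(window, weights, axes=([0, 1], [0, 1])) + bias
    return 1.0 / (1.0 + np.exp(-out))


def classify_stop(features, params, labels=("b", "d", "g", "p", "t", "k")):
    """Run two time-delay layers and average the output activations over time."""
    hidden = time_delay_layer(features, params["w1"], params["b1"])
    output = time_delay_layer(hidden, params["w2"], params["b2"])
    scores = output.mean(axis=0)  # assumed temporal integration: mean over frames
    return labels[int(np.argmax(scores))], scores


# Hypothetical sizes: 15 frames of 9 coefficients, 8 hidden units, 6 stop classes.
rng = np.random.default_rng(0)
n_frames, n_coeff, n_hidden, n_classes = 15, 9, 8, 6
params = {
    "w1": rng.normal(scale=0.1, size=(3, n_coeff, n_hidden)),    # 3-frame window
    "b1": np.zeros(n_hidden),
    "w2": rng.normal(scale=0.1, size=(5, n_hidden, n_classes)),  # 5-frame window
    "b2": np.zeros(n_classes),
}
features = rng.normal(size=(n_frames, n_coeff))  # stand-in for preprocessed frames
label, scores = classify_stop(features, params)
```

Because the weights are untrained, the predicted label here is meaningless; the sketch only shows how weight sharing across time positions yields one score per stop class. The number of hidden units (`n_hidden`) is the quantity the second question in the abstract is concerned with.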
World Lab., and IMSP.
Copyright information
© 1999 Springer-Verlag London Limited
About this paper
Cite this paper
Esposito, A., Ezin, E.C. (1999). Phoneme Classification using a Rasta-PLP preprocessing algorithm and a Time Delay Neural Network: Performance Studies. In: Marinaro, M., Tagliaferri, R. (eds) Neural Nets WIRN VIETRI-98. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0811-5_20
DOI: https://doi.org/10.1007/978-1-4471-0811-5_20
Publisher Name: Springer, London
Print ISBN: 978-1-4471-1208-2
Online ISBN: 978-1-4471-0811-5
eBook Packages: Springer Book Archive