Phoneme Classification using a Rasta-PLP preprocessing algorithm and a Time Delay Neural Network: Performance Studies

Esposito, A.; Ezin, E. C.

doi:10.1007/978-1-4471-0811-5_20

Conference paper

Part of the book series: Perspectives in Neural Computing (PERSPECT.NEURAL)

Abstract

An automatic system for classifying the English stops [b, d, g, p, t, k] is proposed. It combines a preprocessing technique based on a modified Rasta-PLP algorithm with a classification algorithm based on a simplified Time Delay Neural Network (TDNN) architecture. Phonemes extracted from the TIMIT-NIST database and produced by 73 speakers were used to train and test the system. The work studies three aspects of the problem. First, what role does the preprocessing phase play in the performance of the net? Second, what is the optimal number of neurons that balances the trade-off between net performance and computational time? Third, must the optimal learning rate be found by trial and error, or can it be derived as a function of the input data? To this end, experiments were performed to tune the preprocessing parameters, the number of hidden neurons in the TDNN, and the learning rate. The classification rates achieved on the test data are 92.9% for [b], 91.8% for [d], 92.4% for [g], 80.3% for [p], 90.8% for [t], and 94.2% for [k].
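To make the idea of a time-delay layer concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It shows the general TDNN principle of sliding one shared set of weights over a window of consecutive feature frames. Every size here (15 frames, 9 features per frame, 3 and 5 delays, 8 hidden units), the random weights, the random input segment, and the averaging of output activations over time are assumptions chosen for illustration; the abstract does not specify any of them.

```python
import numpy as np

STOPS = ["b", "d", "g", "p", "t", "k"]  # the six classes named in the abstract


def time_delay_layer(frames, weights, bias):
    """One time-delay layer: the same weights are applied at every time shift.

    frames:  (T, F) array, T time frames with F features each
    weights: (D, F, H) array, D delays (window length) and H output units
    bias:    (H,) array
    returns: (T - D + 1, H) array of sigmoid activations
    """
    n_frames, n_features = frames.shape
    n_delays, n_features_w, n_units = weights.shape
    if n_features != n_features_w or n_frames < n_delays:
        raise ValueError("input shape does not match the layer weights")
    out = np.empty((n_frames - n_delays + 1, n_units))
    for t in range(n_frames - n_delays + 1):
        window = frames[t:t + n_delays]  # (D, F) slice of consecutive frames
        out[t] = np.tensordot(window, weights, axes=([0, 1], [0, 1])) + bias
    return 1.0 / (1.0 + np.exp(-out))


def classify(frames, params):
    """Two stacked time-delay layers; output activations are averaged over time."""
    hidden = time_delay_layer(frames, params["w1"], params["b1"])
    output = time_delay_layer(hidden, params["w2"], params["b2"])
    scores = output.mean(axis=0)  # assumed simplification of temporal integration
    return int(np.argmax(scores)), scores


# Hypothetical sizes and untrained random weights, for illustration only.
rng = np.random.default_rng(0)
N_FRAMES, N_FEATURES, N_HIDDEN = 15, 9, 8
params = {
    "w1": rng.normal(scale=0.1, size=(3, N_FEATURES, N_HIDDEN)),
    "b1": np.zeros(N_HIDDEN),
    "w2": rng.normal(scale=0.1, size=(5, N_HIDDEN, len(STOPS))),
    "b2": np.zeros(len(STOPS)),
}
segment = rng.normal(size=(N_FRAMES, N_FEATURES))  # stand-in for preprocessed frames
label, scores = classify(segment, params)
print(STOPS[label], scores.round(3))
```

In a sketch like this, the number of hidden units (`N_HIDDEN`) is the quantity that trades classification performance against computation, which is the second question the abstract raises. The weights are untrained, so the printed label carries no meaning.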

World Lab., and IMSP.

Author information

Authors and Affiliations

International Institute for Advanced Scientific Studies (I.I.A.S.S.), Vietri sul Mare Salerno, Italy
A. Esposito
Institut de Mathématiques et de Sciences Physiques (IMSP), B.P. 613, Porto-Novo, Bénin, West Africa
E. C. Ezin
INFM Unità di Salerno, Italy
A. Esposito

Editor information

Editors and Affiliations

Dipartimento di Scienze Fisiche “E.R. Caianiello”, Università di Salerno, 84081, Baronissi (SA), Italy
Maria Marinaro
Dipartimento di Informatica ed Applicazioni “R.M. Capocelli”, Università di Salerno, 84081, Baronissi (SA), Italy
Roberto Tagliaferri

About this paper

Cite this paper

Esposito, A., Ezin, E.C. (1999). Phoneme Classification using a Rasta-PLP preprocessing algorithm and a Time Delay Neural Network: Performance Studies. In: Marinaro, M., Tagliaferri, R. (eds) Neural Nets WIRN VIETRI-98. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0811-5_20

DOI: https://doi.org/10.1007/978-1-4471-0811-5_20
Publisher Name: Springer, London
Print ISBN: 978-1-4471-1208-2
Online ISBN: 978-1-4471-0811-5
eBook Packages: Springer Book Archive