Speaker Independent Vowel Recognition

Smith, L. S.; Tang, C.

doi:10.1007/978-94-011-2360-0_11


Part of the book series: BT Telecommunications Series (BTTS, volume 1)


Abstract

In designing artificial devices to perform human perceptual functions, which map initial sensory stimuli to their corresponding responses, there are at least three aspects to consider: the representation of the sensory input, the representation of the output or response, and the mechanism that maps the input to the desired output. Since Dudley invented his vocoder more than four decades ago, many vocoders have been designed to represent speech efficiently, so that the representation contains all the information necessary for separating signals while keeping redundancy to a minimum [2]. Within the back-propagation connectionist framework, researchers have tried different network architectures, varying both the number of layers and the connectivity; examples are Harrison’s experiments with single-layer and multilayer perceptrons, and his use of zonal units in place of full connectivity between layers [3]. At the output level, McCulloch and Ainsworth tried two types of output representation in their attempt to recognize steady-state vowels [1]. One is a local representation, in which each unit represents a vowel; the other is based on the vowel quadrilateral, in which each vowel is represented by a pair of real numbers indicating the first two formant frequencies. The vowel quadrilateral is illustrated in Fig. 1.
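
The two output representations described in the abstract can be contrasted with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the chapter: the four-vowel set, the ASCII vowel labels, the formant values (approximate adult-male averages of the kind commonly quoted from Peterson and Barney), the scaling ranges, and the function names.

```python
import math

# Hypothetical four-vowel set; the chapter's actual vowel inventory is not given here.
VOWELS = ["i", "ae", "a", "u"]

# Approximate first and second formant frequencies in Hz (illustrative values only).
FORMANTS_HZ = {
    "i": (270.0, 2290.0),
    "ae": (660.0, 1720.0),
    "a": (730.0, 1090.0),
    "u": (300.0, 870.0),
}

# Assumed ranges used to scale each formant into [0, 1].
F1_RANGE = (200.0, 900.0)
F2_RANGE = (500.0, 2500.0)


def local_target(vowel):
    """Local representation: one output unit per vowel, only one unit active."""
    return [1.0 if v == vowel else 0.0 for v in VOWELS]


def quadrilateral_target(vowel):
    """Quadrilateral representation: two real-valued outputs (scaled F1, F2)."""
    f1, f2 = FORMANTS_HZ[vowel]
    s1 = (f1 - F1_RANGE[0]) / (F1_RANGE[1] - F1_RANGE[0])
    s2 = (f2 - F2_RANGE[0]) / (F2_RANGE[1] - F2_RANGE[0])
    return (s1, s2)


def decode_local(outputs):
    """Pick the vowel whose output unit is most active."""
    best = max(range(len(VOWELS)), key=lambda k: outputs[k])
    return VOWELS[best]


def decode_quadrilateral(outputs):
    """Pick the vowel whose scaled (F1, F2) point is nearest to the outputs."""
    return min(VOWELS, key=lambda v: math.dist(outputs, quadrilateral_target(v)))


if __name__ == "__main__":
    for v in VOWELS:
        print(v, local_target(v), quadrilateral_target(v))
```

With the local scheme the network needs as many output units as there are vowels, whereas the quadrilateral scheme always uses two units and places similar vowels close together in the output space; which of the two works better is the kind of question the cited experiments address, and this sketch does not answer it.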


References

1. McCulloch N. and Ainsworth W. A.: ‘Speaker Independent Vowel Recognition Using a Multi-Layer Perceptron’, Technical report, Research Initiative in Pattern Recognition, RSRE (1988).
2. Flanagan J. L.: ‘Speech Analysis, Synthesis and Perception’, 2nd edition, Springer-Verlag, Berlin (1972).
3. Harrison T. D.: ‘A Connectionist Framework for Continuous Speech Recognition’, dissertation, Sidney Sussex College, Cambridge University (1988).
4. Hancock P. J.: ‘Data Representation in Neural Nets: an Empirical Study’, Proceedings of the Connectionist Summer School at Carnegie Mellon University (1988).
5. Watson I.: Personal Communication, Oxford University (1989).
6. Peterson G. E. and Barney H. L.: ‘Control Methods Used in a Study of the Vowels’, Journal of the Acoustical Society of America, 24, 175–184 (1952).
7. Pearlmutter B. A.: ‘Learning State-space Trajectories in Recurrent Nets’, Neural Computation, 1, 2 (Summer 1989).
8. Waibel A.: ‘Modular Construction of Time-Delay Neural Networks for Speech Recognition’, Neural Computation, 1, 1 (Spring 1989).
9. Kohonen T.: ‘Self-Organization and Associative Memory’, Springer-Verlag, Berlin (1984).


Author information

Authors and Affiliations

Centre for Cognitive and Computational Neuroscience, University of Stirling, UK
L. S. Smith & C. Tang


Editor information

Editors and Affiliations

School of Information Systems, University of East Anglia, Norwich, UK
R. Linggard (Professor)
Broadband and Visual Networks, BT Laboratories, Martlesham Heath, UK
D. J. Myers & C. Nightingale


About this chapter

Cite this chapter

Smith, L.S., Tang, C. (1992). Speaker Independent Vowel Recognition. In: Linggard, R., Myers, D.J., Nightingale, C. (eds) Neural Networks for Vision, Speech and Natural Language. BT Telecommunications Series, vol 1. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-2360-0_11


DOI: https://doi.org/10.1007/978-94-011-2360-0_11
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-5041-8
Online ISBN: 978-94-011-2360-0
eBook Packages: Springer Book Archive
