Abstract
This paper presents a summary of several spectrogram reading experiments designed mainly to determine how much phonetic information is contained in the speech signal. The task involved identifying the phonetic content of an utterance solely from visual examination of its spectrogram. The results generally support the notion that the speech signal contains a great deal of phonetic information, which can be extracted through the proper application of phonetic rules. On the basis of these results, it is argued that phonetic recognition in speech recognition systems can be improved substantially, and that improved phonetic recognition will lead to speech recognition systems of greatly increased complexity and sophistication.
Copyright information
© 1982 D. Reidel Publishing Company, Dordrecht, Holland
About this paper
Cite this paper
Zue, V.W. (1982). Acoustic-Phonetic Knowledge Representation: Implications from Spectrogram Reading Experiments. In: Haton, J.P. (ed.) Automatic Speech Analysis and Recognition. NATO Advanced Study Institutes Series, vol 88. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-7879-9_5
DOI: https://doi.org/10.1007/978-94-009-7879-9_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-009-7881-2
Online ISBN: 978-94-009-7879-9
eBook Packages: Springer Book Archive