Acoustic-Phonetic Knowledge Representation: Implications from Spectrogram Reading Experiments

Zue, Victor W.

doi:10.1007/978-94-009-7879-9_5

Part of the book series: NATO Advanced Study Institutes Series (ASIC, volume 88)

Abstract

This paper presents a summary of several spectrogram reading experiments designed mainly to uncover the amount of phonetic information that is contained in the speech signal. The task involved identifying the phonetic contents of an utterance only from a visual examination of the spectrogram. The results generally support the notion that there is a great deal of phonetic information in the speech signal that can be extracted by the proper application of phonetic rules. From these results, it is argued that phonetic recognition in speech recognition systems can be improved substantially, and that improved phonetic recognition will lead to speech recognition systems of greatly increased complexity and sophistication.

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, USA
Victor W. Zue

Editor information

Editors and Affiliations

CRIN, Université de Nancy I, France
Jean-Paul Haton

Cite this paper

Zue, V.W. (1982). Acoustic-Phonetic Knowledge Representation: Implications from Spectrogram Reading Experiments. In: Haton, J.-P. (ed.) Automatic Speech Analysis and Recognition. NATO Advanced Study Institutes Series, vol 88. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-7879-9_5

DOI: https://doi.org/10.1007/978-94-009-7879-9_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-009-7881-2
Online ISBN: 978-94-009-7879-9
eBook Packages: Springer Book Archive