Acoustic-Phonetic Knowledge Representation: Implications from Spectrogram Reading Experiments

  • Conference paper
Automatic Speech Analysis and Recognition

Part of the book series: NATO Advanced Study Institutes Series ((ASIC,volume 88))

Abstract

This paper presents a summary of several spectrogram reading experiments designed mainly to uncover the amount of phonetic information that is contained in the speech signal. The task involved identifying the phonetic contents of an utterance solely from a visual examination of the spectrogram. The results generally support the notion that there is a great deal of phonetic information in the speech signal that can be extracted by the proper application of phonetic rules. From these results, it is argued that phonetic recognition in speech recognition systems can be improved substantially, and that improved phonetic recognition will lead to speech recognition systems of greatly increased complexity and sophistication.




Copyright information

© 1982 D. Reidel Publishing Company, Dordrecht, Holland

About this paper

Cite this paper

Zue, V.W. (1982). Acoustic-Phonetic Knowledge Representation: Implications from Spectrogram Reading Experiments. In: Haton, J.-P. (ed.) Automatic Speech Analysis and Recognition. NATO Advanced Study Institutes Series, vol 88. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-7879-9_5

  • DOI: https://doi.org/10.1007/978-94-009-7879-9_5

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-009-7881-2

  • Online ISBN: 978-94-009-7879-9

  • eBook Packages: Springer Book Archive
