Nature and Perception of Speech Sounds

Robustness in Automatic Speech Recognition

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 341)

Summary

This chapter reviews the fundamentals of speech production, acoustics and phonetics of speech sounds as well as their time-frequency representation. Then, the basic structure of the auditory system and the main mechanisms influencing speech perception are briefly described. Throughout this chapter, we also emphasize the influence of noise on speech production and perception. By introducing basic characteristics of speech sounds and how they are produced and perceived, we intend to provide the essential knowledge needed to understand the following chapters.

Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Junqua, JC., Haton, JP. (1996). Nature and Perception of Speech Sounds. In: Robustness in Automatic Speech Recognition. The Kluwer International Series in Engineering and Computer Science, vol 341. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1297-0_1

  • DOI: https://doi.org/10.1007/978-1-4613-1297-0_1

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8555-7

  • Online ISBN: 978-1-4613-1297-0

  • eBook Packages: Springer Book Archive
