Nature and Perception of Speech Sounds

Junqua, Jean-Claude; Haton, Jean-Paul

doi:10.1007/978-1-4613-1297-0_1

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 341)

Summary

This chapter reviews the fundamentals of speech production, the acoustics and phonetics of speech sounds, and their time-frequency representation. It then briefly describes the basic structure of the auditory system and the main mechanisms influencing speech perception. Throughout the chapter, we also emphasize the influence of noise on speech production and perception. By introducing the basic characteristics of speech sounds and how they are produced and perceived, we intend to provide the essential knowledge needed to understand the following chapters.

Author information

Authors and Affiliations

Speech Technology Laboratory, USA
Jean-Claude Junqua
CRIN - INRIA, France
Jean-Paul Haton

About this chapter

Cite this chapter

Junqua, JC., Haton, JP. (1996). Nature and Perception of Speech Sounds. In: Robustness in Automatic Speech Recognition. The Kluwer International Series in Engineering and Computer Science, vol 341. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1297-0_1

DOI: https://doi.org/10.1007/978-1-4613-1297-0_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8555-7
Online ISBN: 978-1-4613-1297-0
eBook Packages: Springer Book Archive
