Summary
This chapter reviews the fundamentals of speech production, the acoustics and phonetics of speech sounds, and their time-frequency representation. It then briefly describes the basic structure of the auditory system and the main mechanisms that influence speech perception. Throughout the chapter, we also emphasize the influence of noise on speech production and perception. By introducing the basic characteristics of speech sounds and how they are produced and perceived, we intend to provide the essential knowledge needed to understand the following chapters.
Copyright information
© 1996 Kluwer Academic Publishers
About this chapter
Cite this chapter
Junqua, JC., Haton, JP. (1996). Nature and Perception of Speech Sounds. In: Robustness in Automatic Speech Recognition. The Kluwer International Series in Engineering and Computer Science, vol 341. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1297-0_1
DOI: https://doi.org/10.1007/978-1-4613-1297-0_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4612-8555-7
Online ISBN: 978-1-4613-1297-0
eBook Packages: Springer Book Archive