Nonlinear auditory models yield new insights into representations of vowels

  • Laurel H. Carney
  • Joyce M. McDonough
Perceptual/Cognitive Constraints on the Structure of Speech Communication: In Honor of Randy Diehl


Studies of vowel systems regularly appeal to the need to understand how the auditory system encodes and processes the information in the acoustic signal. The goal of this study is to present computational models to address this need, and to use the models to illustrate responses to vowels at two levels of the auditory pathway. Many of the models previously used to study auditory representations of speech are based on linear filter banks simulating the tuning of the inner ear. These models do not incorporate key nonlinear response properties of the inner ear that influence responses at conversational-speech sound levels. These nonlinear properties shape neural representations in ways that are important for understanding responses in the central nervous system. The model for auditory-nerve (AN) fibers used here incorporates realistic nonlinear properties associated with the basilar membrane, inner hair cells (IHCs), and the IHC-AN synapse. These nonlinearities set up profiles of f0-related fluctuations that vary in amplitude across the population of frequency-tuned AN fibers. Amplitude fluctuations in AN responses are smallest near formant peaks and largest at frequencies between formants. These f0-related fluctuations strongly excite or suppress neurons in the auditory midbrain, the first level of the auditory pathway where tuning for low-frequency fluctuations in sounds occurs. Formant-related amplitude fluctuations provide representations of the vowel spectrum in discharge rates of midbrain neurons. These representations in the midbrain are robust across a wide range of sound levels, including the entire range of conversational-speech levels, and in the presence of realistic background noise levels.
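The fluctuation profile described above can be illustrated with a toy calculation. The sketch below is not the AN or midbrain model used in the study. It assumes a harmonic complex with two hypothetical formants, a Gaussian weighting function standing in for each frequency-tuned channel, and a static tanh function standing in for saturation in the periphery. All parameter values and function names are illustrative assumptions. It compares the depth of f0-related envelope fluctuations in a channel centered on a formant peak with the depth in a channel between formants.

```python
import numpy as np

# All values below are illustrative assumptions, not parameters from the study.
F0 = 100.0                   # fundamental frequency (Hz)
FORMANTS = (500.0, 1500.0)   # hypothetical formant frequencies (Hz)
FORMANT_BW = 50.0            # hypothetical formant half-bandwidth (Hz)
CHANNEL_SIGMA = 100.0        # Gaussian channel width (Hz), a crude stand-in for tuning
GAIN = 3.0                   # drive into the saturating stage (arbitrary units)


def harmonic_amplitudes(freqs):
    """Amplitude of each harmonic under a simple two-resonance spectral envelope."""
    amps = np.zeros_like(freqs)
    for formant in FORMANTS:
        amps += 1.0 / np.sqrt(1.0 + ((freqs - formant) / FORMANT_BW) ** 2)
    return amps


def fluctuation_depth(cf, saturate=True):
    """Return std/mean of the channel envelope over one f0 period.

    cf: channel center frequency (Hz).
    saturate: if True, pass the envelope through tanh, a toy stand-in
              for saturating peripheral nonlinearities.
    """
    freqs = F0 * np.arange(1, 41)                        # harmonics up to 4 kHz
    weights = np.exp(-0.5 * ((freqs - cf) / CHANNEL_SIGMA) ** 2)
    amps = harmonic_amplitudes(freqs) * weights          # harmonics seen by this channel
    t = np.linspace(0.0, 1.0 / F0, 1000, endpoint=False)  # one period of f0
    # Sum of complex exponentials: its magnitude is the envelope of the channel output.
    analytic = (amps[:, None] * np.exp(2j * np.pi * freqs[:, None] * t[None, :])).sum(axis=0)
    envelope = np.abs(analytic)
    if saturate:
        envelope = np.tanh(GAIN * envelope)
    return envelope.std() / envelope.mean()


if __name__ == "__main__":
    for cf in (500.0, 1000.0, 1500.0):
        print(f"CF {cf:6.0f} Hz  depth (linear) {fluctuation_depth(cf, False):.3f}  "
              f"depth (saturating) {fluctuation_depth(cf, True):.3f}")
```

In this toy setting, the channel at a formant peak is dominated by a single strong harmonic, so its envelope fluctuates less than that of a channel between formants, where several harmonics of similar amplitude beat at f0. The saturating stage flattens the envelope further in the strongly driven channel, which is one simplified way to picture how nonlinearities could sharpen the contrast across channels.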


Keywords: Audition; Speech perception; Physiological psychology



Supported by National Institutes of Health Grant # NIDCD R01-001641. This project received a boost of energy from a fascinating conversation with Professor Björn Lindblom at the University of Stockholm. He also arranged for us to attend the workshop in honor of Professor Randy Diehl at the University of Texas at Austin, which further inspired this effort. Professor Kenneth Henry at the University of Rochester suggested the modification of the midbrain model for convenient BMF tuning.



Copyright information

© The Psychonomic Society, Inc. 2018

Authors and Affiliations

  1. Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, USA
  2. Department of Linguistics, University of Rochester, Rochester, USA
