Size Matters in Hearing: How the Auditory System Normalizes the Sounds of Speech and Music for Source Size
The sounds that mammals use to communicate, including the voiced parts of speech, have a very special “pulse resonance” form. In 1992, we drew attention to the fascinating time-interval patterns that these sounds produce at the output of a gammatone auditory filter bank (GT-AFB), and we described how to construct stabilized auditory images (SAIs) in which the time-interval patterns appear and evolve as distinctive auditory events. Since that time, the filter bank work has been extended to determine the “optimal” form of level-dependent AFB, and the SAI work has been extended to demonstrate that the stabilized time-interval patterns play a role in auditory perception. These two streams of research are presented as appendices in Sections 5 and 4 of this chapter, respectively.
The mathematics of the optimal AFB drew our attention to the fact that auditory perception is largely scale invariant; listeners can understand speakers regardless of the speaker's size. We describe why size invariance is important in Section 1, and show how the auditory system might construct a scale-invariant version of the SAI in Section 2. In Section 3, we describe research intended to demonstrate the value of scale invariance in the perception of speech and music, and to argue that machine processing of speech and music would be enhanced if feature extraction were based on a size-invariant SAI rather than a spectrographic representation of sound.
Keywords: Vocal Tract; Auditory Perception; Just Noticeable Difference; Auditory Event; Auditory Filter