Abstract
Recent studies have demonstrated that listeners can extract information about the length of a speaker’s vocal tract from voiced vowels (Smith et al., 2005) and syllables (Ives et al., 2005). Smith and Patterson (2005) have also reported that listeners can extract size information from unvoiced vowels. In this paper, we extend these observations to words recorded in natural conversation mode to demonstrate that size perception is robust to changes in the mode of vocal excitation. We used a high-quality vocoder, STRAIGHT, to produce scaled versions of voiced words with natural F0 contours, and scaled versions of unvoiced words that sounded as if they were whispered. Size discrimination performance was measured for five reference speakers, using a two-alternative, forced-choice (2AFC) paradigm with the method of constant stimuli. The listener was asked to choose the interval with the word, or words, spoken by the smaller person. The just-noticeable difference (JND) in speaker size was found to be about 5%, independent of the mode of vocal excitation. This value is a little greater than that reported in the previous syllable experiments and a little smaller than that reported in the vowel experiments. Moreover, neither the natural F0 contour in the voiced words nor the noise excitation in the whispered words affected the JND value. The results support the hypothesis that the auditory system segregates size and shape information at an early stage of processing (Irino and Patterson, 2002).
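As an illustration of how a JND can be read off 2AFC data collected with the method of constant stimuli, the following is a minimal sketch and not the analysis used in the chapter. It assumes a cumulative-Gaussian psychometric function that rises from 50% correct at zero size difference, a maximum-likelihood fit by grid search, and a 76%-correct criterion for the JND. The function names, the criterion, and the trial counts are all hypothetical; the counts are generated from the assumed model and are not data from the study.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def p_correct(delta_pct, sigma):
    """Assumed 2AFC model: P(correct) for a size difference of delta_pct percent."""
    return STD_NORMAL.cdf(delta_pct / sigma)


def neg_log_likelihood(sigma, deltas, n_correct, n_trials):
    """Binomial negative log-likelihood of the observed counts for a given sigma."""
    nll = 0.0
    for d, k, n in zip(deltas, n_correct, n_trials):
        # Clamp p away from 0 and 1 so that log() stays finite.
        p = min(max(p_correct(d, sigma), 1e-9), 1 - 1e-9)
        nll -= k * log(p) + (n - k) * log(1 - p)
    return nll


def fit_sigma(deltas, n_correct, n_trials, lo=0.1, hi=50.0, steps=5000):
    """Maximum-likelihood estimate of sigma by exhaustive grid search."""
    candidates = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda s: neg_log_likelihood(s, deltas, n_correct, n_trials),
    )


def jnd(sigma, criterion=0.76):
    """Size difference (percent) at which P(correct) reaches the criterion."""
    return sigma * STD_NORMAL.inv_cdf(criterion)


# Hypothetical constant-stimuli levels (percent difference in vocal-tract length).
deltas = [2.0, 4.0, 8.0, 16.0]
n_trials = [100, 100, 100, 100]

# Synthetic counts generated from the assumed model; NOT data from the study.
true_sigma = 7.0
n_correct = [round(n * p_correct(d, true_sigma)) for d, n in zip(deltas, n_trials)]

sigma_hat = fit_sigma(deltas, n_correct, n_trials)
jnd_hat = jnd(sigma_hat)
print(f"sigma = {sigma_hat:.2f}, JND = {jnd_hat:.2f} %")
```

A grid search is used here only to keep the sketch dependency-free; a full analysis would more likely use a dedicated fitting procedure with lapse-rate parameters and confidence intervals, as discussed by Wichmann and Hill (2001).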
References
Aoki Y, Irino T, Kawahara H, Patterson RD (2008a) Speaker size discrimination for acoustically scaled versions of naturally spoken words. In: Abstracts of the ARO 31st Midwinter Meeting, Phoenix, AZ, USA
Aoki Y, Irino T, Kawahara H, Patterson RD (2008b) Speaker size discrimination for acoustically scaled versions of whispered words. J Acoust Soc Am 123(5 Pt 2):3718
van Dinther R, Patterson RD (2006) Perception of acoustic scale and size in musical instrument sounds. J Acoust Soc Am 120:2158–2177
Fujimura O, Lindqvist J (1971) Sweep-tone measurements of vocal-tract characteristics. J Acoust Soc Am 49:541–558
Irino T, Patterson RD (2002) Segregating information about the size and shape of vocal tract using a time-domain auditory model: the stabilized wavelet-Mellin transform. Speech Commun 36:181–203
Irino T, Aoki Y, Hayashi Y, Kawahara H, Patterson RD (2007) Discrimination and recognition of scaled word sounds. In: Proc Interspeech 2007, Antwerp, Belgium, pp 378–381
Ives DT, Smith DRR, Patterson RD (2005) Discrimination of speaker size from syllable phrases. J Acoust Soc Am 118(6):3816–3822
Kawahara H (2006) STRAIGHT, exploitation of the other aspect of VOCODER: perceptually isomorphic decomposition of speech sounds. Acoust Sci Tech 27(6):349–353
Kawahara H, Masuda-Katsuse I, de Cheveigné A (1999) Restructuring speech representations using pitch-adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: possible role of repetitive structure in sounds. Speech Commun 27(3–4):187–207
Peterson GE, Barney HL (1952) Control methods used in the study of vowels. J Acoust Soc Am 24:175–184
Smith DRR, Patterson RD, Turner R, Kawahara H, Irino T (2005) The processing and perception of size information in speech sounds. J Acoust Soc Am 117:305–318
Smith DRR, Patterson RD (2005) The perception of scale in whispered vowels. Meeting of the British Society of Audiology, Cardiff, Wales. http://www.pdn.cam.ac.uk/groups/cnbh/research/posters_talks/BSA2005/SPbsa05.pdf
Turner RE, Al-Hames MA, Smith DRR, Kawahara H, Irino T, Patterson RD (2006) Vowel normalization: time-domain processing of the internal dynamics of speech. In: Divenyi P, Greenberg S, Meyer G (eds) Dynamics of speech production and perception, NATO science series, series A: Life sciences. IOS Press, Amsterdam, pp 153–170
Turner RE, Walters TC, Monaghan JJM, Patterson RD (2009) A statistical, formant-pattern model for segregating vowel type and vocal-tract length in developmental formant data. J Acoust Soc Am 125:2374–2386
Wichmann FA, Hill NJ (2001) The psychometric function: I. Fitting, sampling, and goodness of fit. Percept Psychophys 63:1293–1313
Acknowledgments
This research was supported by the Japan Society for the Promotion of Science Grant-in-Aid (B18300060) and the UK Medical Research Council (G0500221).
Copyright information
© 2010 Springer Science+Business Media, LLC
About this paper
Cite this paper
Irino, T., Aoki, Y., Kawahara, H., Patterson, R.D. (2010). Size Perception for Acoustically Scaled Sounds of Naturally Pronounced and Whispered Words. In: Lopez-Poveda, E., Palmer, A., Meddis, R. (eds) The Neurophysiological Bases of Auditory Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5686-6_22
DOI: https://doi.org/10.1007/978-1-4419-5686-6_22
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5685-9
Online ISBN: 978-1-4419-5686-6
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)