Size Perception for Acoustically Scaled Sounds of Naturally Pronounced and Whispered Words

Irino, Toshio; Aoki, Yoshie; Kawahara, Hideki; Patterson, Roy D.

doi:10.1007/978-1-4419-5686-6_22

Abstract

Recent studies have demonstrated that listeners can extract information about the length of a speaker’s vocal tract from voiced vowels (Smith et al. 2005) and syllables (Ives et al. 2005). Smith and Patterson (2005) have also reported that listeners can extract size information from unvoiced vowels. In this paper, we extend these observations to words recorded in natural conversation mode to demonstrate that size perception is robust to changes in the mode of vocal excitation. We used a high-quality vocoder, STRAIGHT, to produce scaled versions of voiced words with natural F0 contours, and scaled versions of unvoiced words that sounded as if they were whispered. Size discrimination performance was measured for five reference speakers, using a two-alternative, forced-choice (2AFC) paradigm with the method of constant stimuli. The listener was asked to choose the interval with the word, or words, spoken by the smaller person. The just-noticeable difference (JND) in speaker size was found to be about 5%, independent of the mode of vocal excitation. This value is a little greater than that reported in the previous syllable experiments and a little smaller than that reported in the vowel experiments. Moreover, the natural F0 contour in the voiced words and the noise excitation in the whispered words did not affect the JND value. The results support the hypothesis that the auditory system segregates size and shape information at an early stage in the processing (Irino and Patterson 2002).
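
To make the measurement concrete, the following is a minimal sketch of how a JND might be read off 2AFC data collected with the method of constant stimuli. It is not the authors' analysis. It assumes a cumulative-Gaussian psychometric function over the signed scale difference (in percent), a coarse grid-search maximum-likelihood fit, and a JND defined as the distance between the 50% and 76% points of the fitted curve. The function names `fit_cumulative_gaussian` and `jnd_from_fit` are hypothetical, and the response counts are synthetic, generated from a known curve purely for illustration.

```python
import math
from statistics import NormalDist


def fit_cumulative_gaussian(levels, n_smaller, n_trials):
    """Grid-search maximum-likelihood fit of P(x) = Phi((x - mu) / sigma).

    levels    : signed scale differences in percent (test re. reference)
    n_smaller : number of "test sounds smaller" responses at each level
    n_trials  : number of trials at each level
    """
    best_mu, best_sigma, best_ll = None, None, -math.inf
    for mu_tenths in range(-50, 51):          # mu from -5.0 to 5.0
        mu = mu_tenths / 10
        for sigma_tenths in range(5, 301):    # sigma from 0.5 to 30.0
            sigma = sigma_tenths / 10
            dist = NormalDist(mu, sigma)
            ll = 0.0
            for x, k, n in zip(levels, n_smaller, n_trials):
                # Clamp to avoid log(0) at extreme levels.
                p = min(max(dist.cdf(x), 1e-9), 1 - 1e-9)
                ll += k * math.log(p) + (n - k) * math.log(1 - p)
            if ll > best_ll:
                best_mu, best_sigma, best_ll = mu, sigma, ll
    return best_mu, best_sigma


def jnd_from_fit(sigma, criterion=0.76):
    """Distance from the 50% point to the criterion point (assumed JND)."""
    return sigma * NormalDist().inv_cdf(criterion)


# Synthetic data (NOT from the study): counts drawn from a known curve.
levels = [-12, -8, -4, 0, 4, 8, 12]
true_curve = NormalDist(0, 7)
trials = [100] * len(levels)
counts = [round(n * true_curve.cdf(x)) for x, n in zip(levels, trials)]

mu_hat, sigma_hat = fit_cumulative_gaussian(levels, counts, trials)
print(f"mu = {mu_hat:.1f} %, sigma = {sigma_hat:.1f} %, "
      f"JND = {jnd_from_fit(sigma_hat):.1f} %")
```

A grid search is used here only to keep the sketch dependency-free; a real analysis would more likely follow the fitting and goodness-of-fit procedures described by Wichmann and Hill (2001).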

References

Aoki Y, Irino T, Kawahara H, Patterson RD (2008a) Speaker size discrimination for acoustically scaled versions of naturally spoken words. In: Abstracts of ARO 31st Midwinter meeting, Phoenix, AZ, USA
Aoki Y, Irino T, Kawahara H, Patterson RD (2008b) Speaker size discrimination for acoustically scaled versions of whispered words. J Acoust Soc Am 123(5 Pt 2):3718
van Dinther R, Patterson RD (2006) Perception of acoustic scale and size in musical instrument sounds. J Acoust Soc Am 120:2158–2177
Fujimura O, Lindqvist J (1971) Sweep-tone measurements of vocal-tract characteristics. J Acoust Soc Am 49:541–558
Irino T, Patterson RD (2002) Segregating information about the size and shape of the vocal tract using a time-domain auditory model: the stabilized wavelet-Mellin transform. Speech Commun 36:181–203
Irino T, Aoki Y, Hayashi Y, Kawahara H, Patterson RD (2007) Discrimination and recognition of scaled word sounds. In: Proc Interspeech 2007, Antwerp, Belgium, pp 378–381
Ives DT, Smith DRR, Patterson RD (2005) Discrimination of speaker size from syllable phrases. J Acoust Soc Am 118(6):3816–3822
Kawahara H (2006) STRAIGHT, exploitation of the other aspect of VOCODER: perceptually isomorphic decomposition of speech sounds. Acoust Sci Tech 27(6):349–353
Kawahara H, Masuda-Katsuse I, de Cheveigné A (1999) Restructuring speech representations using pitch-adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: possible role of repetitive structure in sounds. Speech Commun 27(3–4):187–207
Peterson GE, Barney HL (1952) Control methods used in the study of vowels. J Acoust Soc Am 24:175–184
Smith DRR, Patterson RD, Turner R, Kawahara H, Irino T (2005) The processing and perception of size information in speech sounds. J Acoust Soc Am 117:305–318
Smith DRR, Patterson RD (2005) The perception of scale in whispered vowels. Meeting of the British Society of Audiology, Cardiff, Wales. http://www.pdn.cam.ac.uk/groups/cnbh/research/posters_talks/BSA2005/SPbsa05.pdf
Turner RE, Al-Hames MA, Smith DRR, Kawahara H, Irino T, Patterson RD (2006) Vowel normalization: time-domain processing of the internal dynamics of speech. In: Divenyi P, Greenberg S, Meyer G (eds) Dynamics of speech production and perception, NATO science series, series A: Life sciences. IOS Press, Amsterdam, pp 153–170
Turner RE, Walters TC, Monaghan JJM, Patterson RD (2009) A statistical, formant-pattern model for segregating vowel type and vocal-tract length in developmental formant data. J Acoust Soc Am 125:2374–2386
Wichmann FA, Hill NJ (2001) The psychometric function: I. Fitting, sampling, and goodness of fit. Percept Psychophys 63:1293–1313

Acknowledgments

This research was supported by the Japan Society for the Promotion of Science Grant-in-Aid (B18300060) and the UK Medical Research Council (G0500221).

Author information

Authors and Affiliations

Faculty of Systems Engineering, Wakayama University, 930 Sakaedani, Wakayama, 640-8510, Japan
Toshio Irino

Corresponding author

Correspondence to Toshio Irino.

Editor information

Editors and Affiliations

Inst. Neurociencias de Castilla y León, Universidad de Salamanca, Av. Alfonso X El Sabio s/n, Salamanca, 37007, Spain
Enrique A. Lopez-Poveda
MRC Inst. of Hearing Research, University Park, Nottingham, NG7 2RD, United Kingdom
Alan R. Palmer
University of Essex, Wivenhoe Park, Colchester, Essex, CO4 3SQ, United Kingdom
Ray Meddis

About this paper

Cite this paper

Irino, T., Aoki, Y., Kawahara, H., Patterson, R.D. (2010). Size Perception for Acoustically Scaled Sounds of Naturally Pronounced and Whispered Words. In: Lopez-Poveda, E., Palmer, A., Meddis, R. (eds) The Neurophysiological Bases of Auditory Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5686-6_22

DOI: https://doi.org/10.1007/978-1-4419-5686-6_22
Published: 16 February 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5685-9
Online ISBN: 978-1-4419-5686-6
eBook Packages: Biomedical and Life Sciences (R0)