Size Perception for Acoustically Scaled Sounds of Naturally Pronounced and Whispered Words

  • Conference paper
  • In: The Neurophysiological Bases of Auditory Perception

Abstract

Recent studies have demonstrated that listeners can extract information about the length of a speaker’s vocal tract from voiced vowels (Smith et al. 2005) and syllables (Ives et al. 2005). Smith and Patterson (2005) have also reported that listeners can extract size information from unvoiced vowels. In this paper, we extend these observations to words recorded in natural conversation mode to demonstrate that size perception is robust to changes in the mode of vocal excitation. We used a high-quality vocoder, STRAIGHT, to produce scaled versions of voiced words with natural F0 contours, and scaled versions of unvoiced words that sounded as if they were whispered. Size discrimination performance was measured for five reference speakers, using a two-alternative, forced-choice (2AFC) paradigm with the method of constant stimuli. The listener was asked to choose the interval with the word, or words, spoken by the smaller person. The just-noticeable difference (JND) in speaker size was found to be about 5%, independent of the mode of vocal excitation. This value is a little greater than that reported in the previous syllable experiments and a little smaller than that reported in the vowel experiments. Moreover, neither the natural F0 contour in the voiced words nor the noise excitation in the whispered words affected the JND value. The results support the hypothesis that the auditory system segregates size and shape information at an early stage of processing (Irino and Patterson 2002).
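
The abstract does not say how the JND was computed from the 2AFC responses, so the following is only a minimal sketch of one common approach, not the authors' procedure. It assumes a cumulative-Gaussian psychometric function fitted by maximum likelihood (here a plain grid search), and it assumes the JND is the distance between the 50% and 76% points of that function. The stimulus levels, trial counts, and response counts are hypothetical values made up for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def p_smaller(x, mu, sigma):
    """Cumulative-Gaussian psychometric function (an assumed model):
    probability of judging the comparison speaker as the smaller one."""
    return STD_NORMAL.cdf((x - mu) / sigma)


def neg_log_likelihood(levels, n_smaller, n_trials, mu, sigma):
    """Binomial negative log-likelihood of the response counts."""
    nll = 0.0
    for x, k, n in zip(levels, n_smaller, n_trials):
        # Clamp to avoid log(0) at extreme parameter values.
        p = min(max(p_smaller(x, mu, sigma), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def fit_psychometric(levels, n_smaller, n_trials):
    """Maximum-likelihood fit by exhaustive grid search over (mu, sigma)."""
    best = None
    for mu_tenths in range(-50, 51):         # mu from -5.0 to 5.0
        for sigma_tenths in range(5, 301):   # sigma from 0.5 to 30.0
            mu, sigma = mu_tenths / 10.0, sigma_tenths / 10.0
            nll = neg_log_likelihood(levels, n_smaller, n_trials, mu, sigma)
            if best is None or nll < best[0]:
                best = (nll, mu, sigma)
    return best[1], best[2]


# Hypothetical constant-stimuli data (not from the paper).
# levels: percent difference in vocal-tract length between comparison and
# reference; positive means the comparison has the shorter vocal tract.
levels = [-12, -8, -4, 4, 8, 12]
n_trials = [40] * len(levels)
n_smaller = [2, 5, 11, 29, 35, 38]  # "comparison is smaller" responses

mu, sigma = fit_psychometric(levels, n_smaller, n_trials)

# Assumed JND criterion: distance from the 50% point to the 76% point.
jnd = sigma * STD_NORMAL.inv_cdf(0.76)
print(f"bias = {mu:.1f} %, slope sigma = {sigma:.1f} %, JND = {jnd:.1f} %")
```

With a different criterion (for example 75% correct) or a different function shape (for example a logistic), the JND value would change somewhat, so the criterion should always be reported alongside the estimate.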

References

  • Aoki Y, Irino T, Kawahara H, Patterson RD (2008a) Speaker size discrimination for acoustically scaled versions of naturally spoken words. In: Abstracts of the ARO 31st Midwinter Meeting, Phoenix, AZ, USA

  • Aoki Y, Irino T, Kawahara H, Patterson RD (2008b) Speaker size discrimination for acoustically scaled versions of whispered words. J Acoust Soc Am 123(5 Pt 2):3718

  • van Dinther R, Patterson RD (2006) Perception of acoustic scale and size in musical instrument sounds. J Acoust Soc Am 120:2158–2177

  • Fujimura O, Lindqvist J (1971) Sweep-tone measurements of vocal-tract characteristics. J Acoust Soc Am 49:541–558

  • Irino T, Patterson RD (2002) Segregating information about the size and shape of vocal tract using a time-domain auditory model: the stabilized wavelet-Mellin transform. Speech Commun 36:181–203

  • Irino T, Aoki Y, Hayashi Y, Kawahara H, Patterson RD (2007) Discrimination and recognition of scaled word sounds. In: Proc Interspeech 2007, Antwerp, Belgium, pp 378–381

  • Ives DT, Smith DRR, Patterson RD (2005) Discrimination of speaker size from syllable phrases. J Acoust Soc Am 118(6):3816–3822

  • Kawahara H (2006) STRAIGHT, exploitation of the other aspect of VOCODER: perceptually isomorphic decomposition of speech sounds. Acoust Sci Tech 27(6):349–353

  • Kawahara H, Masuda-Katsuse I, de Cheveigné A (1999) Restructuring speech representations using pitch-adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: possible role of repetitive structure in sounds. Speech Commun 27(3–4):187–207

  • Peterson GE, Barney HL (1952) Control methods used in the study of vowels. J Acoust Soc Am 24:175–184

  • Smith DRR, Patterson RD, Turner R, Kawahara H, Irino T (2005) The processing and perception of size information in speech sounds. J Acoust Soc Am 117:305–318

  • Smith DRR, Patterson RD (2005) The perception of scale in whispered vowels. Meeting of the British Society of Audiology, Cardiff, Wales. http://www.pdn.cam.ac.uk/groups/cnbh/research/posters_talks/BSA2005/SPbsa05.pdf

  • Turner RE, Al-Hames MA, Smith DRR, Kawahara H, Irino T, Patterson RD (2006) Vowel normalization: time-domain processing of the internal dynamics of speech. In: Divenyi P, Greenberg S, Meyer G (eds) Dynamics of speech production and perception, NATO science series, series A: Life sciences. IOS Press, Amsterdam, pp 153–170

  • Turner RE, Walters TC, Monaghan JJM, Patterson RD (2009) A statistical, formant-pattern model for segregating vowel type and vocal-tract length in developmental formant data. J Acoust Soc Am 125:2374–2386

  • Wichmann FA, Hill NJ (2001) The psychometric function: I. Fitting, sampling, and goodness of fit. Percept Psychophys 63:1293–1313

Acknowledgments

This research was supported by the Japan Society for the Promotion of Science Grant-in-Aid (B18300060) and the UK Medical Research Council (G0500221).

Corresponding author

Correspondence to Toshio Irino.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this paper

Cite this paper

Irino, T., Aoki, Y., Kawahara, H., Patterson, R.D. (2010). Size Perception for Acoustically Scaled Sounds of Naturally Pronounced and Whispered Words. In: Lopez-Poveda, E., Palmer, A., Meddis, R. (eds) The Neurophysiological Bases of Auditory Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5686-6_22
