Voice Processing and Voice-Identity Recognition

Mathias, Samuel Robert; von Kriegstein, Katharina

doi:10.1007/978-3-030-14832-4_7


Part of the book series: Springer Handbook of Auditory Research (SHAR, volume 69)


Abstract

The human voice is the most important sound source in our environment, not only because it produces speech, but also because it conveys information about the speaker. In many situations, listeners understand the speech message and recognize the speaker with minimal effort. Psychophysical studies have investigated which voice qualities (such as vocal timbre) distinguish speakers and allow listeners to recognize speakers. Glottal and vocal tract characteristics strongly influence perceived similarity between speakers and serve as cues for voice-identity recognition. However, the importance of a particular voice quality for voice-identity recognition depends on the speaker and the stimulus. Voice-identity recognition relies on a network of brain regions comprising a core system of auditory regions within the temporal lobe (including regions dedicated to processing glottal and vocal tract characteristics and regions that play more abstract roles) and an extended system of nonauditory regions representing information associated with specific voice identities (e.g., faces and names). This brain network is supported by early, direct connections between the core voice system and an analogous core face system. Precisely how all these brain regions work together to accomplish voice-identity recognition remains an open question; answering it will require rigorous testing of hypotheses derived from theoretical accounts of voice processing.
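The abstract's point that glottal and vocal tract characteristics shape perceived similarity between speakers, with an importance that varies by speaker and stimulus, can be made concrete with a toy model. The sketch below is an illustrative assumption, not the authors' method: it summarizes each voice by only two numbers, glottal-pulse rate (GPR, in Hz) and vocal-tract length (VTL, in cm). It expresses their differences on a logarithmic (semitone) scale and combines them in a weighted Euclidean distance. The function names `voice_distance` and `most_similar`, the weights, and all speaker values are hypothetical.

```python
import math

# Hypothetical toy model: a voice is a (gpr_hz, vtl_cm) tuple.
# The semitone scale, the weighted Euclidean distance, and the example
# values are illustrative assumptions, not taken from the chapter.


def semitones(ratio: float) -> float:
    """Convert a frequency or length ratio to semitones (12 per doubling)."""
    return 12.0 * math.log2(ratio)


def voice_distance(a, b, gpr_weight: float = 1.0, vtl_weight: float = 1.0) -> float:
    """Weighted Euclidean distance between two voices in log-GPR/log-VTL space.

    A larger weight makes that dimension count more toward dissimilarity;
    changing the weights mimics a cue whose relevance depends on the
    speaker or the stimulus.
    """
    d_gpr = semitones(a[0] / b[0])  # GPR difference in semitones
    d_vtl = semitones(a[1] / b[1])  # VTL difference in semitones
    return math.sqrt(gpr_weight * d_gpr ** 2 + vtl_weight * d_vtl ** 2)


def most_similar(target, candidates, **weights) -> str:
    """Return the name of the candidate voice closest to `target`."""
    return min(candidates, key=lambda name: voice_distance(target, candidates[name], **weights))


if __name__ == "__main__":
    # Placeholder values chosen for illustration only.
    voices = {
        "speaker_A": (110.0, 17.0),
        "speaker_B": (220.0, 17.0),
        "speaker_C": (115.0, 14.5),
    }
    unknown = (120.0, 16.8)
    print(most_similar(unknown, voices))                   # → speaker_A
    print(most_similar(unknown, voices, vtl_weight=0.0))  # → speaker_C
```

With equal weights the unknown voice lies closest to `speaker_A`. When `vtl_weight=0.0`, only GPR counts, and `speaker_C` becomes the closest match. This shows, in a very simplified way, how the relevance assigned to one voice quality can change which speaker seems most similar.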


Abbreviations

a:: anterior
BOLD:: blood-oxygen-level-dependent
d:: distance measure
FFA:: fusiform face area
fMRI:: functional magnetic resonance imaging
FRU:: facial recognition units
GPR:: glottal-pulse rate
HG:: Heschl’s gyrus
HNR:: harmonics-to-noise ratio
IFG:: inferior frontal gyrus
IPL:: inferior parietal lobe
JND:: just noticeable difference
m:: middle
MEG:: magnetoencephalography
p:: posterior
PIN:: person-identity nodes
PT:: planum temporale
STG:: superior temporal gyrus
STS:: superior temporal sulcus
Th:: perceptual threshold
TVA:: temporal voice areas
VLPFC:: ventrolateral prefrontal cortex
VRU:: voice recognition units
VTL:: vocal-tract length



Author information

Authors and Affiliations

Neurocognition, Neurocomputation and Neurogenetics Division, Yale University School of Medicine, New Haven, CT, USA
Samuel Robert Mathias
Technische Universität Dresden, Dresden, Germany
Katharina von Kriegstein
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
Katharina von Kriegstein


Corresponding author

Correspondence to Samuel Robert Mathias.

Editor information

Editors and Affiliations

Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
Kai Siedenburg
Audio Communication Group, Technische Universität Berlin, Berlin, Germany
Charalampos Saitis
Schulich School of Music, McGill University, Montreal, QC, Canada
Stephen McAdams
Department of Biology, University of Maryland, College Park, MD, USA
Arthur N. Popper
Department of Psychology, Loyola University Chicago, Chicago, IL, USA
Richard R. Fay

Compliance with Ethics Statements

Samuel Robert Mathias declares that he has no conflict of interest.

Katharina von Kriegstein declares that she has no conflict of interest.


About this chapter

Cite this chapter

Mathias, S.R., von Kriegstein, K. (2019). Voice Processing and Voice-Identity Recognition. In: Siedenburg, K., Saitis, C., McAdams, S., Popper, A., Fay, R. (eds) Timbre: Acoustics, Perception, and Cognition. Springer Handbook of Auditory Research, vol 69. Springer, Cham. https://doi.org/10.1007/978-3-030-14832-4_7


DOI: https://doi.org/10.1007/978-3-030-14832-4_7
Published: 08 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14831-7
Online ISBN: 978-3-030-14832-4
eBook Packages: Biomedical and Life Sciences; Biomedical and Life Sciences (R0)
