Abstract
In this chapter, we review empirical data and theoretical models put forward in the affective science literature to account for the perception of emotions when this process is accomplished simultaneously by sight and hearing. The visual component is provided by the configuration of the face, which undergoes geometric changes that in turn lead to different, discrete emotional facial expressions. The auditory component is provided by the voice, whose changes in pitch, duration, and/or intensity lead to different affective tones of voice. Face–voice integration during emotion perception occurs when affective information conveyed by the two sensory modalities is integrated into a unified percept, or multisensory object. Although one may assume that the rapid and mandatory combination of multiple or complementary affective cues is adaptive (i.e., it likely reduces the effects of adverse factors such as drifts or intrinsic noise), the central nervous system must nonetheless show some selectivity regarding which inputs from separate senses may eventually combine, as compared with merely redundant emotion signals. Indeed, not all spatial or temporal coincidences or co-occurrences lead to the perception of unified objects. Interestingly, results of behavioral studies confirm this conjecture and indicate that the combination of emotional facial expressions with affective prosody leads to the creation of genuinely multisensory emotional objects, which show different properties compared to the combination of an emotional facial expression with another redundant or distracting emotional facial expression, or with a written emotion word. Hence, the findings and models reviewed in this chapter suggest that some selectivity can be found in the way visual and auditory information is actually combined during emotion perception.
The rapid and automatic pairing of an emotional face with an affective voice might represent a naturalistic situation, in the sense that it requires no mediation by higher-level cognitive, attentional, or linguistic processes, which may be necessary for the efficient decoding of other stimulus categories or multisensory objects.
Acknowledgments
The writing and elaboration of this chapter were made possible by financial support from the European Research Council (Starting Grant #200758) and Ghent University (BOF Grant #05Z01708) to G.P. The behavioral results presented in this chapter were already partly introduced and discussed in the unpublished thesis manuscript written and submitted by Gilles Pourtois (Tilburg University, April 2002).
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Pourtois, G., Dhar, M. (2013). Integration of Face and Voice During Emotion Perception: Is There Anything Gained for the Perceptual System Beyond Stimulus Modality Redundancy? In: Belin, P., Campanella, S., Ethofer, T. (eds) Integrating Face and Voice in Person Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3585-3_10
DOI: https://doi.org/10.1007/978-1-4614-3585-3_10
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-3584-6
Online ISBN: 978-1-4614-3585-3
eBook Packages: Biomedical and Life Sciences (R0)