Multisensory Integration in Speech Processing: Neural Mechanisms of Cross-Modal Aftereffects

  • Niclas Kilian-Hütten
  • Elia Formisano
  • Jean Vroomen
Part of the Innovations in Cognitive Neuroscience book series


Traditionally, perceptual neuroscience has focused on unimodal information processing. This has also been true of investigations of speech processing, where the auditory modality was the natural focus of interest. Given the complexity of neuronal processing, this was a logical step for a field still in its infancy. It is clear, however, that this restriction does not do justice to the way we perceive the world in everyday interactions. Sensory information is very rarely confined to one modality. Instead, we are constantly confronted with a stream of input to several or all senses, and already in infancy we match facial movements with their corresponding sounds (Campbell et al. 2001; Kuhl and Meltzoff 1982). Moreover, the information processed by our individual senses does not stay separated. Rather, the different channels interact and influence each other, shaping perceptual interpretations and constructions (Calvert 2001). Consequently, in the last 15–20 years, the perspective in cognitive science and perceptual neuroscience has shifted to include investigations of such multimodal integrative phenomena.

Facilitating cross-modal effects have consistently been demonstrated behaviorally (Shimojo and Shams 2001). When multisensory input is congruent (e.g., semantically and/or temporally), it typically lowers detection thresholds (Frassinetti et al. 2002), shortens reaction times (Forster et al. 2002; Schröger and Widmann 1998), and decreases saccadic eye movement latencies (Hughes et al. 1994) compared to unimodal exposure. When incongruent input is (artificially) added in a second modality, the opposite effects are usually observed (Sekuler et al. 1997).
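Such facilitation is commonly formalized as reliability-weighted cue combination, the "near-optimal bimodal integration" account of, e.g., Alais and Burr (2004) and Ernst and Banks (2002): each modality's estimate is weighted by its inverse variance, so the fused percept is both biased toward the more reliable cue and more precise than either cue alone. A minimal sketch, with purely illustrative numbers (not taken from any of the cited studies):

```python
def integrate_cues(x_a, var_a, x_v, var_v):
    """Maximum-likelihood fusion of two independent Gaussian cue estimates.

    Each cue is weighted by its reliability (inverse variance); the
    combined variance is smaller than that of either cue alone.
    """
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)  # auditory weight
    w_v = 1.0 - w_a                                    # visual weight
    x_hat = w_a * x_a + w_v * x_v                      # fused estimate
    var_hat = (var_a * var_v) / (var_a + var_v)        # fused variance
    return x_hat, var_hat

# Hypothetical spatial estimates (in degrees): a blurry auditory
# location versus a sharper visual one, as in ventriloquism setups.
x_hat, var_hat = integrate_cues(x_a=10.0, var_a=4.0, x_v=2.0, var_v=1.0)
# The fused estimate (about 3.6 deg) is pulled toward the reliable
# visual cue, and its variance (0.8) is below either single-cue variance.
```

Because the fused variance is lower than that of either modality alone, the model captures why congruent bimodal input lowers thresholds, while a discrepant cue in a second modality pulls the percept away from the veridical value.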


References

  1. Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14, 257–262.
  2. Anstis, S., Verstraten, F. A., & Mather, G. (1998). The motion aftereffect. Trends in Cognitive Sciences, 2, 111–117.
  3. Arnal, L. H., & Giraud, A. L. (2012). Cortical oscillations and sensory predictions. Trends in Cognitive Sciences, 16, 390–398.
  4. Arnal, L. H., Wyart, V., & Giraud, A.-L. (2011). Transitions in neural oscillations reflect prediction errors generated in audiovisual speech. Nature Neuroscience, 14, 797–801.
  5. Beauchamp, M. S. (2005). Statistical criteria in FMRI studies of multisensory integration. Neuroinformatics, 3, 93–113.
  6. Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H., & Martin, A. (2004a). Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience, 7, 1190–1192.
  7. Beauchamp, M. S., Lee, K. E., Argall, B. D., & Martin, A. (2004b). Integration of auditory and visual information about objects in superior temporal sulcus. Neuron, 41, 809–823.
  8. Beauchamp, M. S., Nath, A. R., & Pasalar, S. (2010). fMRI-guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect. Journal of Neuroscience, 30, 2414–2417.
  9. Bermant, R. I., & Welch, R. B. (1976). Effect of degree of separation of visual-auditory stimulus and eye position upon spatial interaction of vision and audition. Perceptual and Motor Skills, 43, 487–493.
  10. Bertelson, P., & Aschersleben, G. (1998). Automatic visual bias of perceived auditory location. Psychonomic Bulletin & Review, 5, 482–489.
  11. Bertelson, P., Frissen, I., Vroomen, J., & De Gelder, B. (2006). The aftereffects of ventriloquism: Patterns of spatial generalization. Attention, Perception, & Psychophysics, 68, 428–436.
  12. Bertelson, P., & Radeau, M. (1981). Cross-modal bias and perceptual fusion with auditory-visual spatial discordance. Attention, Perception, & Psychophysics, 29, 578–584.
  13. Bertelson, P., Vroomen, J., & De Gelder, B. (2003). Visual recalibration of auditory speech identification: A McGurk aftereffect. Psychological Science, 14, 592–597.
  14. Besle, J., Bertrand, O., & Giard, M.-H. (2009). Electrophysiological (EEG, sEEG, MEG) evidence for multiple audiovisual interactions in the human auditory cortex. Hearing Research, 258, 143–151.
  15. Besle, J., Fischer, C., Bidet-Caulet, A., Lecaignard, F., Bertrand, O., & Giard, M.-H. (2008). Visual activation and audiovisual interactions in the auditory cortex during speech perception: Intracranial recordings in humans. Journal of Neuroscience, 28, 14301–14310.
  16. Besle, J., Fort, A., Delpuech, C., & Giard, M. H. (2004). Bimodal speech: Early suppressive visual effects in human auditory cortex. European Journal of Neuroscience, 20, 2225–2234.
  17. Blank, H., Anwander, A., & von Kriegstein, K. (2011). Direct structural connections between voice- and face-recognition areas. Journal of Neuroscience, 31, 12906–12915.
  18. Callan, D. E., Callan, A. M., Kroos, C., & Vatikiotis-Bateson, E. (2001). Multimodal contribution to speech perception revealed by independent component analysis: A single-sweep EEG case study. Cognitive Brain Research, 10, 349–353.
  19. Callan, D. E., Jones, J. A., Munhall, K., Callan, A. M., Kroos, C., & Vatikiotis-Bateson, E. (2003). Neural processes underlying perceptual enhancement by visual speech gestures. Neuroreport, 14, 2213–2218.
  20. Callan, D. E., Jones, J. A., Munhall, K., Kroos, C., Callan, A. M., & Vatikiotis-Bateson, E. (2004). Multisensory integration sites identified by perception of spatial wavelet filtered visual speech gesture information. Journal of Cognitive Neuroscience, 16, 805–816.
  21. Calvert, G., Spence, C., & Stein, B. E. (Eds.) (2004). The handbook of multisensory processes. Cambridge, MA: MIT Press.
  22. Calvert, G. A. (2001). Crossmodal processing in the human brain: Insights from functional neuroimaging studies. Cerebral Cortex, 11, 1110–1123.
  23. Calvert, G. A., Brammer, M. J., Bullmore, E. T., Campbell, R., Iversen, S. D., & David, A. S. (1999). Response amplification in sensory-specific cortices during crossmodal binding. Neuroreport, 10, 2619–2623.
  24. Calvert, G. A., Campbell, R., & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10, 649–657.
  25. Calvert, G. A., Bullmore, E. T., Brammer, M. J., Campbell, R., Williams, S. C., McGuire, P. K., et al. (1997). Activation of auditory cortex during silent lipreading. Science, 276, 593–596.
  26. Campbell, R., & Capek, C. (2008). Seeing speech and seeing sign: Insights from a fMRI study. International Journal of Audiology, 47, S3–S9.
  27. Campbell, R., MacSweeney, M., Surguladze, S., Calvert, G., McGuire, P., Suckling, J., et al. (2001). Cortical substrates for the perception of face actions: An fMRI study of the specificity of activation for seen speech and for meaningless lower-face acts (gurning). Cognitive Brain Research, 12, 233–243.
  28. Capek, C. M., Bavelier, D., Corina, D., Newman, A. J., Jezzard, P., & Neville, H. J. (2004). The cortical organization of audio-visual sentence comprehension: An fMRI study at 4 Tesla. Cognitive Brain Research, 20, 111–119.
  29. Cappe, C., Rouiller, E. M., & Barone, P. (2009). Multisensory anatomical pathways. Hearing Research, 258, 28–36.
  30. Cappe, C., Thut, G., Romei, V., & Murray, M. M. (2010). Auditory–visual multisensory interactions in humans: Timing, topography, directionality, and sources. Journal of Neuroscience, 30, 12572–12580.
  31. Clavagnier, S., Falchier, A., & Kennedy, H. (2004). Long-distance feedback projections to area V1: Implications for multisensory integration, spatial awareness, and visual consciousness. Cognitive, Affective, & Behavioral Neuroscience, 4, 117–126.
  32. Clos, M., Langner, R., Meyer, M., Oechslin, M. S., Zilles, K., & Eickhoff, S. B. (2012). Effects of prior information on decoding degraded speech: An fMRI study. Human Brain Mapping, 35, 61–74.
  33. Diehl, R. L. (1981). Feature detectors for speech: A critical reappraisal. Psychological Bulletin, 89, 1.
  34. Diehl, R. L., Elman, J. L., & McCusker, S. B. (1978). Contrast effects on stop consonant identification. Journal of Experimental Psychology: Human Perception and Performance, 4, 599.
  35. Driver, J., & Noesselt, T. (2008). Multisensory interplay reveals crossmodal influences on ‘sensory-specific’ brain regions, neural responses, and judgments. Neuron, 57, 11–23.
  36. Eimas, P. D., & Corbit, J. D. (1973). Selective adaptation of linguistic feature detectors. Cognitive Psychology, 4, 99–109.
  37. Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.
  38. Falchier, A., Clavagnier, S., Barone, P., & Kennedy, H. (2002). Anatomical evidence of multimodal integration in primate striate cortex. Journal of Neuroscience, 22, 5749–5759.
  39. Falchier, A., Schroeder, C. E., Hackett, T. A., Lakatos, P., Nascimento-Silva, S., Ulbert, I., et al. (2010). Projection from visual areas V2 and prostriata to caudal auditory cortex in the monkey. Cerebral Cortex, 20, 1529–1538.
  40. Fetsch, C. R., Pouget, A., DeAngelis, G. C., & Angelaki, D. E. (2012). Neural correlates of reliability-based cue weighting during multisensory integration. Nature Neuroscience, 15, 146–154.
  41. Formisano, E., De Martino, F., Bonte, M., & Goebel, R. (2008). “Who” is saying “what”? Brain-based decoding of human voice and speech. Science, 322, 970–973.
  42. Forster, B., Cavina-Pratesi, C., Aglioti, S. M., & Berlucchi, G. (2002). Redundant target effect and intersensory facilitation from visual-tactile interactions in simple reaction time. Experimental Brain Research, 143, 480–487.
  43. Frassinetti, F., Bolognini, N., & Làdavas, E. (2002). Enhancement of visual perception by crossmodal visuo-auditory interaction. Experimental Brain Research, 147, 332–343.
  44. Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 360, 815–836.
  45. Ghazanfar, A. A. (2012). Unity of the senses for primate vocal communication. In M. M. Murray & M. T. Wallace (Eds.), The neural bases of multisensory processes (pp. 653–666). Boca Raton: CRC.
  46. Ghazanfar, A. A., Chandrasekaran, C., & Logothetis, N. K. (2008). Interactions between the superior temporal sulcus and auditory cortex mediate dynamic face/voice integration in rhesus monkeys. Journal of Neuroscience, 28, 4457–4469.
  47. Ghazanfar, A. A., Maier, J. X., Hoffman, K. L., & Logothetis, N. K. (2005). Multisensory integration of dynamic faces and voices in rhesus monkey auditory cortex. Journal of Neuroscience, 25, 5004–5012.
  48. Ghazanfar, A. A., & Schroeder, C. E. (2006). Is neocortex essentially multisensory? Trends in Cognitive Sciences, 10, 278–285.
  49. Giard, M. H., & Peronnet, F. (1999). Auditory-visual integration during multimodal object recognition in humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience, 11, 473–490.
  50. Gibson, J. J., & Radner, M. (1937). Adaptation, after-effect and contrast in the perception of tilted lines. I. Quantitative studies. Journal of Experimental Psychology, 20, 453.
  51. Gilbert, C. D., Sigman, M., & Crist, R. E. (2001). The neural basis of perceptual learning. Neuron, 31, 681–697.
  52. Goebel, R., Esposito, F., & Formisano, E. (2006). Analysis of functional image analysis contest (FIAC) data with brainvoyager QX: From single-subject to cortically aligned group general linear model analysis and self-organizing group independent component analysis. Human Brain Mapping, 27, 392–401.
  53. Gondan, M., & Röder, B. (2006). A new method for detecting interactions between the senses in event-related potentials. Brain Research, 1073, 389–397.
  54. Hackett, T. A., De La Mothe, L. A., Ulbert, I., Karmos, G., Smiley, J., & Schroeder, C. E. (2007). Multisensory convergence in auditory cortex. II. Thalamocortical connections of the caudal superior temporal plane. Journal of Comparative Neurology, 502, 924–952.
  55. Hall, D. A., Fussell, C., & Summerfield, A. Q. (2005). Reading fluent speech from talking faces: Typical brain networks and individual differences. Journal of Cognitive Neuroscience, 17, 939–953.
  56. Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293, 2425–2430.
  57. Haynes, J.-D., & Rees, G. (2005a). Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nature Neuroscience, 8, 686–691.
  58. Haynes, J.-D., & Rees, G. (2005b). Predicting the stream of consciousness from activity in human visual cortex. Current Biology, 15, 1301–1307.
  59. Hughes, H. C., Reuter-Lorenz, P. A., Nozawa, G., & Fendrich, R. (1994). Visual-auditory interactions in sensorimotor processing: Saccades versus manual responses. Journal of Experimental Psychology: Human Perception and Performance, 20, 131–153.
  60. Jiang, W., Wallace, M. T., Jiang, H., Vaughan, J. W., & Stein, B. E. (2001). Two cortical areas mediate multisensory integration in superior colliculus neurons. Journal of Neurophysiology, 85, 506–522.
  61. Kayser, C., Petkov, C. I., Augath, M., & Logothetis, N. K. (2005). Integration of touch and sound in auditory cortex. Neuron, 48, 373–384.
  62. Kayser, J., Tenke, C. E., Gates, N. A., & Bruder, G. E. (2007). Reference-independent ERP old/new effects of auditory and visual word recognition memory: Joint extraction of stimulus- and response-locked neuronal generator patterns. Psychophysiology, 44, 949–967.
  63. Kilian-Hütten, N., Valente, G., Vroomen, J., & Formisano, E. (2011a). Auditory cortex encodes the perceptual interpretation of ambiguous sound. Journal of Neuroscience, 31, 1715–1720.
  64. Kilian-Hütten, N., Vroomen, J., & Formisano, E. (2011b). Brain activation during audiovisual exposure anticipates future perception of ambiguous speech. Neuroimage, 57, 1601–1607.
  65. Kleinschmidt, D., & Jaeger, T. F. (2011). A Bayesian belief updating model of phonetic recalibration and selective adaptation. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics (pp. 10–19). Association for Computational Linguistics.
  66. Klemm, O. (1909). Lokalisation von Sinneseindrücken bei disparaten Nebenreizen. [Localization of sensory impressions with disparate distractors]. Psychologische Studien (Wundt), 5, 73–161.
  67. Kuhl, P. K., & Meltzoff, A. N. (1982). The bimodal perception of speech in infancy. Science, 218, 1138–1141.
  68. Lakatos, P., Chen, C.-M., O’Connell, M. N., Mills, A., & Schroeder, C. E. (2007). Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron, 53, 279–292.
  69. Laurienti, P. J., Perrault, T. J., Stanford, T. R., Wallace, M. T., & Stein, B. E. (2005). On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research, 166, 289–297.
  70. Ludman, C., Summerfield, A. Q., Hall, D., Elliott, M., Foster, J., Hykin, J. L., et al. (2000). Lip-reading ability and patterns of cortical activation studied using fMRI. British Journal of Audiology, 34, 225–230.
  71. Ma, W. J., Zhou, X., Ross, L. A., Foxe, J. J., & Parra, L. C. (2009). Lip-reading aids word recognition most in moderate noise: A Bayesian explanation using high-dimensional feature space. PLoS One, 4, e4638.
  72. Macleod, A., & Summerfield, Q. (1990). A procedure for measuring auditory and audiovisual speech-reception thresholds for sentences in noise: Rationale, evaluation, and recommendations for use. British Journal of Audiology, 24, 29–43.
  73. MacSweeney, M., Woll, B., Campbell, R., McGuire, P. K., David, A. S., Williams, S. C., et al. (2002). Neural systems underlying British Sign Language and audio-visual English processing in native users. Brain, 125, 1583–1593.
  74. McGettigan, C., Faulkner, A., Altarelli, I., Obleser, J., Baverstock, H., & Scott, S. K. (2012). Speech comprehension aided by multiple modalities: Behavioural and neural interactions. Neuropsychologia, 50, 762–776.
  75. McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
  76. Middelweerd, M., & Plomp, R. (1987). The effect of speechreading on the speech-reception threshold of sentences in noise. The Journal of the Acoustical Society of America, 82, 2145–2147.
  77. Miller, L. M., & D’Esposito, M. (2005). Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. Journal of Neuroscience, 25, 5884–5893.
  78. Molholm, S., Ritter, W., Murray, M. M., Javitt, D. C., Schroeder, C. E., & Foxe, J. J. (2002). Multisensory auditory–visual interactions during early sensory processing in humans: A high-density electrical mapping study. Cognitive Brain Research, 14, 115–128.
  79. Möttönen, R., Schürmann, M., & Sams, M. (2004). Time course of multisensory interactions during audiovisual speech perception in humans: A magnetoencephalographic study. Neuroscience Letters, 363, 112–115.
  80. Munhall, K. G., & Buchan, J. N. (2004). Something in the way she moves. Trends in Cognitive Sciences, 8, 51–53.
  81. Munhall, K. G., Jones, J. A., Callan, D. E., Kuratate, T., & Vatikiotis-Bateson, E. (2004). Visual prosody and speech intelligibility: Head movement improves auditory speech perception. Psychological Science, 15, 133–137.
  82. Murray, M. M., & Wallace, M. T. (Eds.) (2011). The neural bases of multisensory processes. Boca Raton: CRC.
  83. Myers, E. B., Blumstein, S. E., Walsh, E., & Eliassen, J. (2009). Inferior frontal regions underlie the perception of phonetic category invariance. Psychological Science, 20, 895–903.
  84. Nath, A. R., & Beauchamp, M. S. (2011). Dynamic changes in superior temporal sulcus connectivity during perception of noisy audiovisual speech. Journal of Neuroscience, 31, 1704–1714.
  85. Nath, A. R., Fava, E. E., & Beauchamp, M. S. (2011). Neural correlates of interindividual differences in children’s audiovisual speech perception. Journal of Neuroscience, 31, 13963–13971.
  86. Naumer, M. J., Doehrmann, O., Müller, N. G., Muckli, L., Kaiser, J., & Hein, G. (2009). Cortical plasticity of audio–visual object representations. Cerebral Cortex, 19, 1641–1653.
  87. Noesselt, T., Rieger, J. W., Schoenfeld, M. A., Kanowski, M., Hinrichs, H., Heinze, H.-J., et al. (2007). Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience, 27, 11431–11441.
  88. Noppeney, U., Josephs, O., Hocking, J., Price, C. J., & Friston, K. J. (2008). The effect of prior visual information on recognition of speech and sounds. Cerebral Cortex, 18, 598–609.
  89. Ojanen, V., Möttönen, R., Pekkola, J., Jääskeläinen, I. P., Joensuu, R., Autti, T., et al. (2005). Processing of audiovisual speech in Broca’s area. Neuroimage, 25, 333–338.
  90. Olson, I. R., Gatenby, J. C., & Gore, J. C. (2002). A comparison of bound and unbound audio–visual information processing in the human cerebral cortex. Cognitive Brain Research, 14, 129–138.
  91. Ozker, M., Schepers, I. M., Magnotti, J. F., Yoshor, D., & Beauchamp, M. S. (2017). A double dissociation between anterior and posterior superior temporal gyrus for processing audiovisual speech demonstrated by electrocorticography. Journal of Cognitive Neuroscience, 29, 1044–1060.
  92. Pekkola, J., Ojanen, V., Autti, T., Jääskeläinen, I. P., Möttönen, R., Tarkiainen, A., et al. (2005). Primary auditory cortex activation by visual speech: An fMRI study at 3 T. Neuroreport, 16, 125–128.
  93. Peelle, J. E., & Sommers, M. S. (2015). Prediction and constraint in audiovisual speech perception. Cortex, 68, 169–181.
  94. Ponton, C. W., Bernstein, L. E., & Auer, E. T. (2009). Mismatch negativity with visual-only and audiovisual speech. Brain Topography, 21, 207–215.
  95. Purkinje, J. (1820). Beiträge zur näheren Kenntnis des Schwindels. Med Jahrb kuk Staates (Wien), 6, 23–35.
  96. Radeau, M., & Bertelson, P. (1974). The after-effects of ventriloquism. The Quarterly Journal of Experimental Psychology, 26, 63–71.
  97. Radeau, M., & Bertelson, P. (1977). Adaptation to auditory-visual discordance and ventriloquism in semirealistic situations. Perception & Psychophysics, 22, 137–146.
  98. Raizada, R. D., & Poldrack, R. A. (2007). Selective amplification of stimulus differences during categorical processing of speech. Neuron, 56, 726–740.
  99. Reisberg, D., McLean, J., & Goldfield, A. (1987). Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 97–113). London: Erlbaum.
  100. Remez, R. E. (2012). Three puzzles of multimodal speech perception. In G. Bailly, P. Perrier, & E. Vatikiotis-Bateson (Eds.), Audiovisual speech processing (pp. 4–20). Cambridge: Cambridge University Press.
  101. Roberts, M., & Summerfield, Q. (1981). Audiovisual presentation demonstrates that selective adaptation in speech perception is purely auditory. Attention, Perception, & Psychophysics, 30, 309–314.
  102. Rosenblum, L. D., Pisoni, D., & Remez, R. (2005). Primacy of multimodal speech perception. In D. Pisoni & R. Remez (Eds.), Handbook of speech perception (pp. 51–78). Malden: Blackwell.
  103. Ross, L. A., Saint-Amour, D., Leavitt, V. M., Javitt, D. C., & Foxe, J. J. (2007). Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex, 17, 1147–1153.
  104. Samuel, A. G. (1986). Red herring detectors and speech perception: In defense of selective adaptation. Cognitive Psychology, 18, 452–499.
  105. Schroeder, C. E., Lakatos, P., Kajikawa, Y., Partan, S., & Puce, A. (2008). Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences, 12, 106–113.
  106. Schroeder, C. E., Smiley, J., Fu, K. G., McGinnis, T., O’Connell, M. N., & Hackett, T. A. (2003). Anatomical mechanisms and functional implications of multisensory convergence in early cortical processing. International Journal of Psychophysiology, 50, 5–17.
  107. Schröger, E., & Widmann, A. (1998). Speeded responses to audiovisual signal changes result from bimodal integration. Psychophysiology, 35, 755–759.
  108. Schwiedrzik, C. M., Ruff, C. C., Lazar, A., Leitner, F. C., Singer, W., & Melloni, L. (2014). Untangling perceptual memory: Hysteresis and adaptation map into separate cortical networks. Cerebral Cortex, 24, 1152–1164.
  109. Sekuler, R., Sekuler, A. B., & Lau, R. (1997). Sound alters visual motion perception. Nature, 385, 308.
  110. Senkowski, D., Talsma, D., Grigutsch, M., Herrmann, C. S., & Woldorff, M. G. (2007). Good times for multisensory integration: Effects of the precision of temporal synchrony as revealed by gamma-band oscillations. Neuropsychologia, 45, 561–571.
  111. Sheppard, J. P., Raposo, D., & Churchland, A. K. (2013). Dynamic weighting of multisensory stimuli shapes decision-making in rats and humans. Journal of Vision, 13, 4.
  112. Shimojo, S., & Shams, L. (2001). Sensory modalities are not separate modalities: Plasticity and interactions. Current Opinion in Neurobiology, 11, 505–509.
  113. Skipper, J. I., Nusbaum, H. C., & Small, S. L. (2005). Listening to talking faces: Motor cortical activation during speech perception. Neuroimage, 25, 76–89.
  114. Sohoglu, E., Peelle, J. E., Carlyon, R. P., & Davis, M. H. (2012). Predictive top-down integration of prior knowledge during speech perception. Journal of Neuroscience, 32, 8443–8453.
  115. Staeren, N., Renvall, H., De Martino, F., Goebel, R., & Formisano, E. (2009). Sound categories are represented as distributed patterns in the human auditory cortex. Current Biology, 19, 498–502.
  116. Stein, B., & Meredith, M. (1990). Multimodal integration: Neural and behavioral solutions for dealing with stimuli from different modalities. Annals of the New York Academy of Sciences, 606, 51–70.
  117. Stein, B. E., Huneycutt, W. S., & Meredith, M. A. (1988). Neurons and behavior: The same rules of multisensory integration apply. Brain Research, 448, 355–358.
  118. Stein, B. E., Stanford, T. R., Ramachandran, R., Perrault, T. J., & Rowland, B. A. (2009). Challenges in quantifying multisensory integration: Alternative criteria, models, and inverse effectiveness. Experimental Brain Research, 198, 113.
  119. Stekelenburg, J. J., & Vroomen, J. (2007). Neural correlates of multisensory integration of ecologically valid audiovisual events. Journal of Cognitive Neuroscience, 19, 1964–1973.
  120. Stevenson, R. A., Geoghegan, M. L., & James, T. W. (2007). Superadditive BOLD activation in superior temporal sulcus with threshold non-speech objects. Experimental Brain Research, 179, 85–95.
  121. Stevenson, R. A., & James, T. W. (2009). Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. Neuroimage, 44, 1210–1223.
  122. Stratton, G. M. (1897). Vision without inversion of the retinal image. Psychological Review, 4, 341.
  123. Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. The Journal of the Acoustical Society of America, 26, 212–215.
  124. Teder-Sälejärvi, W., McDonald, J., Di Russo, F., & Hillyard, S. (2002). An analysis of audio-visual crossmodal integration by means of event-related potential (ERP) recordings. Cognitive Brain Research, 14, 106–114.
  125. Van Wassenhove, V., Grant, K. W., & Poeppel, D. (2005). Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences of the United States of America, 102, 1181–1186.
  126. von Kriegstein, K. (2012). A multisensory perspective on human auditory communication. In M. M. Murray & M. T. Wallace (Eds.), The neural bases of multisensory processes (pp. 683–702). Boca Raton: CRC.
  127. Vroomen, J., & Baart, M. (2009). Phonetic recalibration only occurs in speech mode. Cognition, 110, 254–259.
  128. Vroomen, J., & Baart, M. (2012). Phonetic recalibration in audiovisual speech. In M. M. Murray & M. T. Wallace (Eds.), The neural bases of multisensory processes (pp. 363–380). Boca Raton: CRC.
  129. Vroomen, J., & de Gelder, B. (2004). Temporal ventriloquism: Sound modulates the flash-lag effect. Journal of Experimental Psychology: Human Perception and Performance, 30, 513–518.
  130. Vroomen, J., & Stekelenburg, J. J. (2010). Visual anticipatory information modulates multisensory interactions of artificial audiovisual stimuli. Journal of Cognitive Neuroscience, 22, 1583–1596.
  131. Vroomen, J., van Linden, S., De Gelder, B., & Bertelson, P. (2007). Visual recalibration and selective adaptation in auditory–visual speech perception: Contrasting build-up courses. Neuropsychologia, 45, 572–577.
  132. Vroomen, J., van Linden, S., Keetels, M., De Gelder, B., & Bertelson, P. (2004). Selective adaptation and recalibration of auditory speech by lipread information: Dissipation. Speech Communication, 44, 55–61.
  133. Watkins, K. E., Strafella, A. P., & Paus, T. (2003). Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia, 41, 989–994.
  134. Wright, T. M., Pelphrey, K. A., Allison, T., McKeown, M. J., & McCarthy, G. (2003). Polysensory interactions along lateral temporal regions evoked by audiovisual speech. Cerebral Cortex, 13, 1034–1043.
  135. Yehia, H. C., Kuratate, T., & Vatikiotis-Bateson, E. (2002). Linking facial animation, head motion and speech acoustics. Journal of Phonetics, 30, 555–568.

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  • Niclas Kilian-Hütten (1, 2)
  • Elia Formisano (3)
  • Jean Vroomen (4)

  1. Department of Psychiatry, Columbia University College of Physicians and Surgeons, New York, USA
  2. Department of Cognitive Neuroscience, Maastricht University, Maastricht, The Netherlands
  3. Maastricht Brain Imaging Center, Maastricht University, Maastricht, The Netherlands
  4. Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
