“Paying” attention to audiovisual speech: Do incongruent stimuli incur greater costs?

  • Violet A. Brown
  • Julia F. Strand
Registered Reports and Replications


The McGurk effect is a multisensory phenomenon in which discrepant auditory and visual speech signals typically result in an illusory percept. McGurk stimuli are often used in studies assessing the attentional requirements of audiovisual integration, but no study has directly compared the costs associated with integrating congruent versus incongruent audiovisual speech. Some evidence suggests that the McGurk effect may not be representative of naturalistic audiovisual speech processing – susceptibility to the McGurk effect is not associated with the ability to derive benefit from the addition of the visual signal, and distinct cortical regions are recruited when processing congruent versus incongruent speech. In two experiments, one using response times to identify congruent and incongruent syllables and one using a dual-task paradigm, we assessed whether congruent and incongruent audiovisual speech incur different attentional costs. We demonstrated that response times to both the speech task (Experiment 1) and a secondary vibrotactile task (Experiment 2) were indistinguishable for congruent compared to incongruent syllables, but McGurk fusions were responded to more quickly than McGurk non-fusions. These results suggest that despite documented differences in how congruent and incongruent stimuli are processed, they do not appear to differ in terms of processing time or effort, at least in the open-set speech task used here. However, responses that result in McGurk fusions are processed more quickly than those that result in non-fusions, though attentional cost is comparable for the two response types.


McGurk effect · Audiovisual integration · Dual-task · Listening effort · Response time



The authors thank Kristin Van Engen for helpful feedback on an earlier draft of the paper and the research assistants at Carleton College and Washington University in St. Louis who assisted with data collection and transcription. Carleton College supported this work.

Compliance with ethical standards

Open Practices Statement

This Registered Report was approved in principle prior to data collection. All data, code, and stimuli are available online.



Copyright information

© The Psychonomic Society, Inc. 2019

Authors and Affiliations

  1. Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, USA
  2. Carleton College, Northfield, USA
