Attentional resources contribute to the perceptual learning of talker idiosyncrasies in audiovisual speech

  • Alexandra Jesse
  • Elina Kaplan
Perceptual/Cognitive Constraints on the Structure of Speech Communication: In Honor of Randy Diehl


To recognize audiovisual speech, listeners evaluate and combine information obtained from the auditory and visual modalities. Listeners also use information from one modality to adjust their phonetic categories to a talker’s idiosyncrasy encountered in the other modality. In this study, we examined whether the outcome of this cross-modal recalibration relies on attentional resources. In a standard recalibration experiment in Experiment 1, participants heard an ambiguous sound, disambiguated by the accompanying visual speech as either /p/ or /t/. Participants’ primary task was to attend to the audiovisual speech while either monitoring a tone sequence for a target tone or ignoring the tones. Listeners subsequently categorized the steps of an auditory /p/–/t/ continuum more often in line with their exposure. The aftereffect of phonetic recalibration was reduced, but not eliminated, by attentional load during exposure. In Experiment 2, participants saw an ambiguous visual speech gesture that was disambiguated auditorily as either /p/ or /t/. At test, listeners categorized the steps of a visual /p/–/t/ continuum more often in line with the prior exposure. Imposing load in the auditory modality during exposure did not reduce the aftereffect of this type of cross-modal phonetic recalibration. Together, these results suggest that auditory attentional resources are needed for the processing of auditory speech and/or for the shifting of auditory phonetic category boundaries. Listeners thus need to dedicate attentional resources in order to accommodate talker idiosyncrasies in audiovisual speech.


Keywords: Speech perception · Perceptual learning · Multisensory processing




Copyright information

© The Psychonomic Society, Inc. 2019

Authors and Affiliations

  1. Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, USA
