
Attentional resources contribute to the perceptual learning of talker idiosyncrasies in audiovisual speech

  • Alexandra Jesse
  • Elina Kaplan
Perceptual/Cognitive Constraints on the Structure of Speech Communication: In Honor of Randy Diehl

Abstract

To recognize audiovisual speech, listeners evaluate and combine information obtained from the auditory and visual modalities. Listeners also use information from one modality to adjust their phonetic categories to a talker’s idiosyncrasy encountered in the other modality. In this study, we examined whether the outcome of this cross-modal recalibration relies on attentional resources. In a standard recalibration experiment in Experiment 1, participants heard an ambiguous sound, disambiguated by the accompanying visual speech as either /p/ or /t/. Participants’ primary task was to attend to the audiovisual speech while either monitoring a tone sequence for a target tone or ignoring the tones. Listeners subsequently categorized the steps of an auditory /p/–/t/ continuum more often in line with their exposure. The aftereffect of phonetic recalibration was reduced, but not eliminated, by attentional load during exposure. In Experiment 2, participants saw an ambiguous visual speech gesture that was disambiguated auditorily as either /p/ or /t/. At test, listeners categorized the steps of a visual /p/–/t/ continuum more often in line with the prior exposure. Imposing load in the auditory modality during exposure did not reduce the aftereffect of this type of cross-modal phonetic recalibration. Together, these results suggest that auditory attentional resources are needed for the processing of auditory speech and/or for the shifting of auditory phonetic category boundaries. Listeners thus need to dedicate attentional resources in order to accommodate talker idiosyncrasies in audiovisual speech.
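
To illustrate how a recalibration aftereffect of the kind reported above can be quantified, here is a minimal sketch in Python using simulated data. It is not the authors' analysis; the condition labels, the size of the boundary shift, and the trial counts are hypothetical, chosen only to show the logic of comparing categorization responses after /t/-biased versus /p/-biased exposure.

```python
# Minimal sketch (simulated data, hypothetical parameters): quantify a
# phonetic recalibration aftereffect as the difference in /t/ responses
# after /t/-biased vs. /p/-biased audiovisual exposure.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

steps = np.arange(1, 6)                       # 5-step /p/-/t/ test continuum
conditions = ["p-biased", "t-biased"]         # exposure condition
shift = {"p-biased": -0.4, "t-biased": +0.4}  # assumed category-boundary shift

rows = []
for cond in conditions:
    for step in steps:
        # Probability of a /t/ response: logistic function of continuum step,
        # shifted by exposure condition (the simulated aftereffect).
        p_t = 1 / (1 + np.exp(-(step - 3 + shift[cond])))
        for _ in range(20):                   # 20 simulated trials per cell
            rows.append({"condition": cond,
                         "step": step,
                         "resp_t": rng.random() < p_t})
data = pd.DataFrame(rows)

# Aftereffect: proportion of /t/ responses after /t/-biased exposure
# minus the proportion after /p/-biased exposure.
by_cond = data.groupby("condition")["resp_t"].mean()
aftereffect = by_cond["t-biased"] - by_cond["p-biased"]
print(by_cond)
print(f"Recalibration aftereffect: {aftereffect:.2f}")
```

A positive difference indicates that categorization shifted in line with the disambiguating information received during exposure; in practice such response data are typically modeled trial by trial (e.g., with logistic mixed-effects models) rather than with cell means as in this sketch.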

Keywords

Speech perception · Perceptual learning · Multisensory processing


Copyright information

© The Psychonomic Society, Inc. 2019

Authors and Affiliations

  1. Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, USA
