Experimental Brain Research, Volume 237, Issue 12, pp 3143–3153

The impact of when, what and how predictions on auditory speech perception

  • Serge Pinto
  • Pascale Tremblay
  • Anahita Basirat
  • Marc Sato (corresponding author)
Research Article


Abstract

An impressive number of theoretical proposals and neurobiological studies argue that perceptual processing is not strictly feedforward but rather operates through an interplay between bottom-up sensory and top-down predictive mechanisms. The present EEG study aimed to further determine how prior knowledge about auditory syllables may impact speech perception. Prior knowledge was manipulated by presenting participants with visual information indicative of the syllable onset (when), its phonetic content (what) and/or its articulatory features (how). While when and what predictions consisted of unnatural visual cues (a visual timeline and a visuo-orthographic cue, respectively), how prediction consisted of the visual movements of a speaker. During auditory speech perception, when and what predictions both attenuated the amplitude of N1/P2 auditory evoked potentials. Regarding how prediction, both an amplitude decrease and a latency facilitation of N1/P2 auditory evoked potentials were observed during audiovisual compared to unimodal speech perception. In this audiovisual context, however, when and what predictability effects were reduced or abolished, with only what prediction reducing P2 amplitude while increasing its latency. Altogether, these results demonstrate that visually induced when, what and how predictions influence cortical auditory speech processing at an early stage. Crucially, they indicate a preponderant predictive role of the speaker’s articulatory gestures during audiovisual speech perception, likely driven by attentional load and focus.


Keywords: Auditory speech perception · Audiovisual speech perception · Predictive coding · Predictive timing · EEG



Acknowledgements

The authors thank Avril Treille and Coriandre Vilain for their help with the stimuli. We also thank all the participants.

Compliance with ethical standards

Conflict of interest

The authors declare no competing financial interests.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Serge Pinto (1)
  • Pascale Tremblay (2, 3)
  • Anahita Basirat (4)
  • Marc Sato (1) (corresponding author)

  1. Laboratoire Parole et Langage, UMR 7309, CNRS, LPL, Aix Marseille Université, Aix-en-Provence, France
  2. Département de Réadaptation, Faculté de Médecine, Université Laval, Quebec City, Canada
  3. Cervo Brain Research Centre, Quebec City, Canada
  4. Univ. Lille, CNRS, CHU Lille, UMR 9193, SCALab, Sciences Cognitives et Sciences Affectives, Lille, France
