An Object-Based Interpretation of Audiovisual Processing

  • Adrian K. C. Lee
  • Ross K. Maddox
  • Jennifer K. Bizley
Part of the Springer Handbook of Auditory Research book series (SHAR, volume 68)


Visual cues help listeners follow conversation in a complex acoustic environment. Many audiovisual studies focus on how sensory cues are combined to optimize perception, either by minimizing the uncertainty in the sensory estimate or by maximizing intelligibility, particularly in speech understanding. From an auditory perception perspective, a fundamental question that has not been fully addressed is how visual information aids the ability to select and focus on one auditory object in the presence of competing sounds in a busy auditory scene. In this chapter, audiovisual integration is presented from an object-based attention viewpoint. In particular, it is argued that a stricter delineation of the concepts of multisensory integration versus binding would facilitate a deeper understanding of how information is combined across senses. Furthermore, using an object-based theoretical framework to distinguish binding as a distinct form of multisensory integration generates testable hypotheses with behavioral predictions that can account for different aspects of multisensory interactions. Classic multisensory illusion paradigms are revisited and discussed in the context of multisensory binding. The chapter also describes multisensory experiments that address how visual stimuli help listeners parse complex auditory scenes. Finally, it concludes with a discussion of the potential mechanisms by which audiovisual processing might resolve competition between concurrent sounds in order to solve the cocktail party problem.


Keywords: Binding; Cross-modal; McGurk illusion; Multisensory; Object-based attention; Scene analysis; Sensory integration; Sound-induced flash illusion; Ventriloquism


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Adrian K. C. Lee (1, 2)
  • Ross K. Maddox (3, 4, 5, 6)
  • Jennifer K. Bizley (7)

  1. Department of Speech and Hearing Sciences, University of Washington, Seattle, USA
  2. Institute for Learning and Brain Sciences (I-LABS), University of Washington, Seattle, USA
  3. Department of Biomedical Engineering, University of Rochester, Rochester, USA
  4. Department of Neuroscience, University of Rochester, Rochester, USA
  5. Del Monte Institute for Neuroscience, University of Rochester, Rochester, USA
  6. Center for Visual Science, University of Rochester, Rochester, USA
  7. Ear Institute, University College London, London, UK
