An Object-Based Interpretation of Audiovisual Processing

Chapter in Multisensory Processes

Part of the book series: Springer Handbook of Auditory Research (SHAR, volume 68)

Abstract

Visual cues help listeners follow conversation in a complex acoustic environment. Many audiovisual studies focus on how sensory cues are combined to optimize perception, either by minimizing the uncertainty in the sensory estimate or by maximizing intelligibility, particularly in speech understanding. From an auditory perception perspective, a fundamental question that has not been fully addressed is how visual information aids the ability to select and focus on one auditory object in the presence of competing sounds in a busy auditory scene. In this chapter, audiovisual integration is presented from an object-based attention viewpoint. In particular, it is argued that a stricter delineation of the concepts of multisensory integration versus binding would facilitate a deeper understanding of how information is combined across the senses. Furthermore, using an object-based theoretical framework to treat binding as a distinct form of multisensory integration generates testable hypotheses with behavioral predictions that can account for different aspects of multisensory interactions. Classic multisensory illusion paradigms are then revisited and discussed in the context of multisensory binding. The chapter also describes multisensory experiments that address how visual stimuli help listeners parse complex auditory scenes. Finally, it concludes with a discussion of the potential mechanisms by which audiovisual processing might resolve competition between concurrent sounds in order to solve the cocktail party problem.
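
The phrase "minimizing the uncertainty in the sensory estimate" refers to the standard maximum-likelihood view of cue combination, in which each modality's estimate is weighted by its reliability (inverse variance). The short sketch below illustrates that computation; it is an illustrative example under common Gaussian assumptions with made-up parameter values, not code or data from the chapter.

    def fuse_av(x_a, var_a, x_v, var_v):
        # Inverse-variance (maximum-likelihood) fusion of an auditory and a visual
        # estimate of the same quantity, e.g., perceived source location in degrees.
        w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)  # auditory weight
        w_v = 1.0 - w_a                                     # visual weight
        x_av = w_a * x_a + w_v * x_v                        # fused estimate
        var_av = 1.0 / (1.0 / var_a + 1.0 / var_v)          # fused variance, never larger than either cue alone
        return x_av, var_av

    # Illustrative numbers only: a reliable visual cue (low variance) dominates the
    # fused location estimate, as in the ventriloquist effect.
    print(fuse_av(x_a=5.0, var_a=4.0, x_v=0.0, var_v=1.0))  # approximately (1.0, 0.8)

This weighted averaging describes integration of redundant estimates; the chapter's object-based argument distinguishes it from binding, that is, from the question of whether auditory and visual signals are grouped into a single perceptual object in the first place.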


Notes

  1. This example is best understood if acoustic propagation delay is ignored; modern track and field competitions use a loudspeaker mounted on each starting block, which makes that assumption a practical reality (see the illustrative calculation below).
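
For a rough sense of the delay at stake (an illustrative calculation with assumed distances, not figures from the chapter): sound in air travels at roughly 343 m/s, so a start signal from a single distant source reaches athletes in different lanes tens of milliseconds apart, whereas per-block loudspeakers remove that offset.

    speed_of_sound = 343.0   # m/s in air at roughly 20 degrees C
    extra_distance = 9.0     # assumed extra path length in metres to the farthest athlete
    delay_ms = 1000.0 * extra_distance / speed_of_sound
    print(f"extra acoustic delay is about {delay_ms:.0f} ms")  # about 26 ms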


Author information


Corresponding author

Correspondence to Adrian K. C. Lee.



Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Lee, A.K.C., Maddox, R.K., Bizley, J.K. (2019). An Object-Based Interpretation of Audiovisual Processing. In: Lee, A., Wallace, M., Coffin, A., Popper, A., Fay, R. (eds) Multisensory Processes. Springer Handbook of Auditory Research, vol 68. Springer, Cham. https://doi.org/10.1007/978-3-030-10461-0_4

