An oscillatory neural network model that demonstrates the benefits of multisensory learning
Since the world consists of objects that stimulate multiple senses, it is advantageous for a vertebrate to integrate all the sensory information available. However, the precise mechanisms governing the temporal dynamics of multisensory processing are not well understood. We develop a computational modeling approach to investigate these mechanisms. We present an oscillatory neural network model for multisensory learning based on sparse spatio-temporal encoding. Recently published results in cognitive science show that multisensory integration produces greater and more efficient learning. We apply our computational model to qualitatively replicate these results. We vary learning protocols and system dynamics, and measure the rate at which our model learns to distinguish superposed presentations of multisensory objects. We show that the use of multiple channels accelerates learning and recall by up to 80%. When a sensory channel becomes disabled, the performance degradation is less than that experienced during the presentation of non-congruent stimuli. This research furthers our understanding of fundamental brain processes, paving the way for multiple advances including the building of machines with more human-like capabilities.
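The synchronization dynamics that underlie oscillatory binding models of this kind can be illustrated with a minimal phase-oscillator sketch. The following is an illustrative Kuramoto-style simulation, not the model presented in this paper: the network size, coupling values, and the `simulate_kuramoto` helper are assumptions introduced here only to show how stronger coupling (loosely analogous to congruent multisensory input) drives a population of oscillators toward synchrony.

```python
import math
import random

def simulate_kuramoto(n=20, coupling=1.5, steps=2000, dt=0.01, seed=0):
    """Simulate n globally coupled phase oscillators (Kuramoto model)
    with Euler integration, and return the final order parameter r.
    r is near 0 for incoherent phases and approaches 1 at full synchrony."""
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    freqs = [rng.gauss(1.0, 0.1) for _ in range(n)]  # natural frequencies
    for _ in range(steps):
        updated = []
        for i in range(n):
            # Mean-field sinusoidal coupling to all other oscillators
            interaction = sum(math.sin(phases[j] - phases[i])
                              for j in range(n)) / n
            updated.append(phases[i] + dt * (freqs[i] + coupling * interaction))
        phases = updated
    # Kuramoto order parameter: magnitude of the mean phase vector
    re = sum(math.cos(p) for p in phases) / n
    im = sum(math.sin(p) for p in phases) / n
    return math.hypot(re, im)

# Weak coupling leaves the population largely incoherent; strong
# coupling pulls it into synchrony.
r_weak = simulate_kuramoto(coupling=0.1)
r_strong = simulate_kuramoto(coupling=2.0)
```

In synchronization-based binding accounts, such phase-locking among units responding to the same object is what signals that their features belong together; this sketch only demonstrates the generic mechanism, under the assumptions stated above.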
Keywords: Oscillatory neural networks · Synchronization · Binding · Multisensory processing · Learning · Audio–visual processing
The author greatly appreciates helpful comments from the reviewers, which improved this manuscript.