Computational Scene Analysis

  • DeLiang Wang
Part of the Studies in Computational Intelligence book series (SCI, volume 63)


A remarkable achievement of the perceptual system is its scene analysis capability, which involves two basic perceptual processes: the segmentation of a scene into a set of coherent patterns (objects) and the recognition of memorized ones. Although the perceptual system performs scene analysis with apparent ease, computational scene analysis remains a tremendous challenge as foreseen by Frank Rosenblatt. This chapter discusses scene analysis in the field of computational intelligence, particularly visual and auditory scene analysis. The chapter first addresses the question of the goal of computational scene analysis. A main reason why scene analysis is difficult in computational intelligence is the binding problem, which refers to how a collection of features comprising an object in a scene is represented in a neural network. In this context, temporal correlation theory is introduced as a biologically plausible representation for addressing the binding problem. The LEGION network lays a computational foundation for oscillatory correlation, which is a special form of temporal correlation. Recent results on visual and auditory scene analysis are described in the oscillatory correlation framework, with emphasis on real-world scenes. Also discussed are the issues of attention, feature-based versus model-based analysis, and representation versus learning. Finally, the chapter points out that the time dimension and David Marr's framework for understanding perception are essential for computational scene analysis.


Keywords: Feature Detector, Perceptual Organization, Scene Analysis, Stream Segregation, Global Inhibitor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • DeLiang Wang, Department of Computer Science & Engineering and Center for Cognitive Science, The Ohio State University, Columbus, USA
