Auditory Gist Perception: An Alternative to Attentional Selection of Auditory Streams?

  • Sue Harding
  • Martin Cooke
  • Peter König
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4840)


The idea that the gist of a visual scene is perceived before attention is focused on the details of a particular object is becoming increasingly popular. In the auditory system, on the other hand, it is typically assumed that the sensory signal is first broken down into streams and then attention is applied to select one of the streams. We consider evidence for an alternative: that, in close analogy with the visual system, the gist of an auditory scene is perceived and only afterwards attention is paid to relevant constituents. We find that much experimental evidence is consistent with such a proposal, and we suggest some possibilities for gist representations.


Keywords: attention, gist perception, auditory scene analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Sue Harding (1)
  • Martin Cooke (1)
  • Peter König (2)
  1. Speech and Hearing Research Group, Department of Computer Science, University of Sheffield, 211 Portobello Street, Sheffield S1 4DP, UK
  2. Neurobiopsychologie Labor, Institut für Kognitionswissenschaft, Universität Osnabrück, Albrechtstraße 28, 49069 Osnabrück, Germany
