Modeling the Cocktail Party Problem

The Auditory System at the Cocktail Party

Part of the book series: Springer Handbook of Auditory Research ((SHAR,volume 60))

Abstract

Modeling the cocktail party problem entails developing a computational framework able to describe what the auditory system does when faced with a complex auditory scene. Although this ability is intuitive and omnipresent in humans and animals alike, translating it into a quantitative model remains a challenge. This chapter discusses the difficulties facing the field in defining the theoretical principles that govern auditory scene analysis and in reconciling current knowledge about perceptual and physiological data with their formulation into computational models. The chapter reviews some of the computational theories, algorithmic strategies, and neural infrastructure proposed in the literature for developing information systems capable of processing multisource sound inputs. Because various disciplines bring divergent interests to the cocktail party problem, the body of literature modeling this effect is equally diverse and multifaceted. The chapter surveys the various approaches used in modeling auditory scene analysis, from biomimetic models to strictly engineering systems.

Acknowledgements

Dr. Elhilali’s work is supported by grants from The National Institutes of Health (NIH: R01HL133043) and the Office of Naval Research (ONR: N000141010278, N000141612045, and N000141210740).

Compliance with Ethics Requirements

Mounya Elhilali declares that she has no conflict of interest.

Corresponding author

Correspondence to Mounya Elhilali.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Elhilali, M. (2017). Modeling the Cocktail Party Problem. In: Middlebrooks, J., Simon, J., Popper, A., Fay, R. (eds) The Auditory System at the Cocktail Party. Springer Handbook of Auditory Research, vol 60. Springer, Cham. https://doi.org/10.1007/978-3-319-51662-2_5
