Speech Enhancement and Segregation Based on Human Auditory Mechanisms

  • Masato Akagi
  • Mitsunori Mizumachi
  • Yuichi Ishimoto
  • Masashi Unoki


Humans can perceive specific desired sounds without difficulty, even in noisy environments. This useful ability, which many animals also possess, is referred to as the ‘cocktail party effect’. We believe that by modeling this mechanism we can build tools for speech enhancement and segregation, and address related problems in speech recognition and analysis.
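As a purely illustrative aside, the sketch below shows single-channel magnitude spectral subtraction, a common noise-reduction baseline of the kind this line of work builds on. It is not the authors' method; the frame length, overlap, oversubtraction factor, and spectral floor are assumptions chosen only for the example.

```python
# A minimal, illustrative sketch of spectral-subtraction noise reduction.
# NOT the chapter authors' method: every parameter (frame length, hop,
# oversubtraction factor, spectral floor) is an assumption for this example.
import numpy as np

def spectral_subtraction(noisy, noise_estimate, frame_len=512, hop=256,
                         oversubtract=2.0, floor=0.01):
    """Enhance a 1-D float signal `noisy` given a separate noise-only
    segment `noise_estimate`, using overlap-add spectral subtraction."""
    window = np.hanning(frame_len)

    # Average magnitude spectrum of the noise-only segment.
    noise_frames = [np.abs(np.fft.rfft(window * noise_estimate[i:i + frame_len]))
                    for i in range(0, len(noise_estimate) - frame_len, hop)]
    noise_mag = np.mean(noise_frames, axis=0)

    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame_len, hop):
        frame = window * noisy[i:i + frame_len]
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Subtract the scaled noise magnitude; clamp to a spectral floor
        # to avoid negative magnitudes and excessive musical noise.
        clean_mag = np.maximum(mag - oversubtract * noise_mag, floor * mag)
        # Resynthesize with the noisy phase and overlap-add.
        out[i:i + frame_len] += np.fft.irfft(clean_mag * np.exp(1j * phase), frame_len)
    return out
```

In a typical use, `noise_estimate` would be a short noise-only stretch of the recording, for example the samples captured before speech onset.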


Keywords: Speech Signal, Automatic Speech Recognition, Speech Enhancement, Instantaneous Amplitude, Sound Data





Copyright information

© Springer Japan 2002

Authors and Affiliations

  • Masato Akagi, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
  • Mitsunori Mizumachi, ATR Spoken Language Translation Research Laboratories, Kyoto, Japan
  • Yuichi Ishimoto, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
  • Masashi Unoki, CNBH, Physiology Department, University of Cambridge, UK
