Abstract
Humans can attend to a specific desired sound without difficulty, even in noisy environments. This useful ability, which many animals share, is known as the 'cocktail party effect'. We believe that by modeling this mechanism we can produce tools for speech enhancement and segregation, and for other problems in speech recognition and analysis.
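As a point of reference for what "speech enhancement" involves computationally, the sketch below shows classic single-channel spectral subtraction: subtract an estimated noise magnitude spectrum from the noisy spectrum and resynthesize with the noisy phase. This is a minimal illustrative sketch of the general technique, not the authors' method; the function name, the spectral `floor` parameter, and the toy signal are hypothetical choices made here.

```python
# Minimal spectral-subtraction sketch (illustrative only).
# Assumes additive, roughly stationary noise whose magnitude spectrum
# can be estimated from a speech-free segment.
import numpy as np

def spectral_subtraction(noisy, noise_mag, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from one noisy frame.

    noisy:     1-D time-domain frame (already windowed)
    noise_mag: estimated noise magnitude spectrum (same FFT length)
    floor:     spectral floor to avoid negative magnitudes
    """
    spec = np.fft.rfft(noisy)
    mag = np.abs(spec)
    phase = np.angle(spec)
    # Subtract the noise estimate; clamp at a small fraction of the
    # noisy magnitude so no bin goes negative.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    # Resynthesize using the noisy phase (standard in spectral subtraction).
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy))

# Toy usage: a sinusoid standing in for speech, plus white noise.
rng = np.random.default_rng(0)
n = 512
t = np.arange(n)
speech = np.sin(2 * np.pi * 0.05 * t)
noisy = speech + 0.3 * rng.standard_normal(n)
# Noise estimate taken from an independent noise-only segment.
noise_mag = np.abs(np.fft.rfft(0.3 * rng.standard_normal(n)))
enhanced = spectral_subtraction(noisy, noise_mag)
```

The phase is left untouched because the ear is far less sensitive to phase distortion than to magnitude distortion, which is why magnitude-only subtraction is workable despite its known musical-noise artifacts.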
© 2002 Springer Japan
Cite this paper
Akagi, M., Mizumachi, M., Ishimoto, Y., Unoki, M. (2002). Speech Enhancement and Segregation Based on Human Auditory Mechanisms. In: Jin, Q., Li, J., Zhang, N., Cheng, J., Yu, C., Noguchi, S. (eds) Enabling Society with Information Technology. Springer, Tokyo. https://doi.org/10.1007/978-4-431-66979-1_18
DOI: https://doi.org/10.1007/978-4-431-66979-1_18
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-66981-4
Online ISBN: 978-4-431-66979-1