Spectral difference for statistical model-based speech enhancement in speech recognition

Lee, Soojeong; Chang, Joon-Hyuk

doi:10.1007/s11042-016-4122-7

Published: 18 November 2016

Multimedia Tools and Applications, Volume 76, pages 24917–24929 (2017)

Abstract

In this paper, we propose a statistical model-based speech enhancement technique that uses a spectral difference scheme for speech recognition in virtual reality. In the analysis step, two principal parameters, the weighting parameter of the decision-directed (DD) method and the long-term smoothing parameter used in noise estimation, are each determined as optimal operating points according to the spectral difference under various noise conditions. These operating points, which differ from one spectral difference value to another, are selected using the composite measure, a criterion relevant to speech quality. An efficient mapping function is also presented; it provides an index into the metric table associated with the spectral difference, so that the operating points can be looked up under various noise conditions in the on-line step. In the on-line speech enhancement step, the parameters are then chosen on a frame-by-frame basis from the metric table according to the spectral difference. The performance of the proposed method is evaluated using objective and subjective speech quality measures in various noise environments. Our experimental results show that the proposed algorithm performs better than conventional algorithms.
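
The on-line step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the spectral difference measure, the bin edges, and the table values below are all hypothetical placeholders, since the abstract does not define them. The sketch only shows the general pattern of mapping a per-frame spectral difference to an index, looking up a pair (DD weighting parameter `alpha`, noise smoothing parameter `beta`), and applying the standard decision-directed a priori SNR update and a recursive noise power update.

```python
import math
from bisect import bisect_right

# Hypothetical metric table (placeholder values, NOT taken from the paper):
# bin edges over the spectral difference and one (alpha, beta) pair per bin.
SD_EDGES = [0.5, 1.0, 2.0]
METRIC_TABLE = [(0.98, 0.98), (0.95, 0.95), (0.92, 0.90), (0.90, 0.85)]
EPS = 1e-12


def spectral_difference(noisy_power, noise_power):
    """Assumed measure: mean absolute log10 power ratio across frequency bins."""
    gaps = [abs(math.log10((y + EPS) / (d + EPS)))
            for y, d in zip(noisy_power, noise_power)]
    return sum(gaps) / len(gaps)


def lookup_parameters(sd):
    """Mapping function: spectral difference -> table index -> (alpha, beta)."""
    return METRIC_TABLE[bisect_right(SD_EDGES, sd)]


def process_frame(noisy_power, noise_power, prev_clean_power, speech_absent):
    """One frame: pick (alpha, beta), estimate a priori SNR, update the noise."""
    alpha, beta = lookup_parameters(spectral_difference(noisy_power, noise_power))
    xi = []
    for y, d, a_prev in zip(noisy_power, noise_power, prev_clean_power):
        d = max(d, EPS)
        gamma = y / d  # a posteriori SNR
        # Decision-directed estimate of the a priori SNR.
        xi.append(alpha * a_prev / d + (1.0 - alpha) * max(gamma - 1.0, 0.0))
    if speech_absent:
        # Long-term recursive smoothing of the noise power estimate.
        noise_power = [beta * d + (1.0 - beta) * y
                       for d, y in zip(noise_power, noisy_power)]
    return xi, noise_power, (alpha, beta)
```

In this sketch the `speech_absent` flag is assumed to come from an external voice activity decision, and the noise estimate is simply held during speech; the paper's actual noise estimator may differ.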

Acknowledgments

This work was also supported by a National Research Foundation (NRF) of Korea grant (2014R1A2A1A10049735).

Author information

Authors and Affiliations

Department of Electronic Engineering, Hanyang University, Seoul, 133-791, Korea
Soojeong Lee & Joon-Hyuk Chang

Corresponding author

Correspondence to Joon-Hyuk Chang.

About this article

Cite this article

Lee, S., Chang, JH. Spectral difference for statistical model-based speech enhancement in speech recognition. Multimed Tools Appl 76, 24917–24929 (2017). https://doi.org/10.1007/s11042-016-4122-7

Received: 28 June 2016
Revised: 12 October 2016
Accepted: 02 November 2016
Published: 18 November 2016
Issue Date: December 2017
DOI: https://doi.org/10.1007/s11042-016-4122-7
