Abstract
In this paper, we propose a statistical model-based speech enhancement technique that uses the spectral difference scheme for speech recognition in virtual reality. In the analysis step, two principal parameters, the weighting parameter of the decision-directed (DD) method and the long-term smoothing parameter of the noise estimator, are uniquely determined as optimal operating points according to the spectral difference under various noise conditions. These optimal operating points, which differ for different values of the spectral difference, are estimated based on the composite measure, a relevant criterion of speech quality. An efficient mapping function is also presented that converts the spectral difference into an index of the metric table, so that the operating points can be selected for various noise conditions in the on-line step. In the on-line speech enhancement step, the parameters are chosen frame by frame from the metric table according to the spectral difference. The performance of the proposed method is evaluated using objective and subjective speech quality measures in various noise environments. Our experimental results show that the proposed algorithm yields better performance than conventional algorithms.
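The on-line step described above (measure the spectral difference, map it to a table index, look up the DD weighting and noise smoothing parameters, then enhance the frame) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the metric table values, the spectral-difference range, the definition of `spectral_difference` (mean absolute log-spectral distance between the noisy power spectrum and the noise estimate) and the uniform-quantization `table_index` mapping are all hypothetical. The noise estimate is updated by unconditional recursive averaging, whereas practical systems gate this update with a voice activity detector or speech presence probability, and a Wiener gain stands in for the paper's spectral gain estimator. The DD rule itself follows the standard form $\hat{\xi}(t) = \alpha\,|\hat{A}(t-1)|^2/\lambda_d(t) + (1-\alpha)\max(\gamma(t)-1, 0)$.

```python
import numpy as np

# Hypothetical metric table: one (alpha_dd, zeta_noise) pair per spectral-
# difference bin. Placeholder values, NOT the operating points from the paper.
METRIC_TABLE = np.array([
    [0.98, 0.99],
    [0.97, 0.98],
    [0.95, 0.96],
    [0.92, 0.94],
    [0.90, 0.90],
])
SD_MIN, SD_MAX = 0.0, 20.0  # assumed range of the spectral difference (dB)


def spectral_difference(noisy_power, noise_power, eps=1e-12):
    """Assumed definition: mean absolute log-spectral distance (dB) between
    the noisy power spectrum and the current noise power estimate."""
    return float(np.mean(np.abs(10 * np.log10(noisy_power + eps)
                                - 10 * np.log10(noise_power + eps))))


def table_index(sd, n_rows=len(METRIC_TABLE)):
    """Hypothetical mapping function: uniformly quantize sd into a row index."""
    pos = (sd - SD_MIN) / (SD_MAX - SD_MIN) * n_rows
    return int(np.clip(np.floor(pos), 0, n_rows - 1))


def enhance(frames_power, init_noise_frames=5, xi_min=1e-3):
    """Frame-by-frame enhancement of a (frames x bins) power spectrogram.
    Returns the per-frame, per-bin spectral gains."""
    frames_power = np.asarray(frames_power, dtype=float)
    # Initial noise estimate from the leading frames (assumed speech-free).
    noise = np.maximum(frames_power[:init_noise_frames].mean(axis=0), 1e-12)
    prev_clean_power = np.zeros_like(noise)  # |A_hat(t-1)|^2
    gains = np.empty_like(frames_power)
    for t, y_pow in enumerate(frames_power):
        # 1) Select the operating point from this frame's spectral difference.
        sd = spectral_difference(y_pow, noise)
        alpha_dd, zeta = METRIC_TABLE[table_index(sd)]
        # 2) Long-term (recursive) smoothing of the noise power estimate.
        noise = np.maximum(zeta * noise + (1.0 - zeta) * y_pow, 1e-12)
        # 3) Decision-directed a priori SNR with a floor.
        gamma = y_pow / noise  # a posteriori SNR
        xi = (alpha_dd * prev_clean_power / noise
              + (1.0 - alpha_dd) * np.maximum(gamma - 1.0, 0.0))
        xi = np.maximum(xi, xi_min)
        # 4) Wiener gain (stand-in for the paper's estimator).
        g = xi / (1.0 + xi)
        gains[t] = g
        prev_clean_power = (g ** 2) * y_pow
    return gains
```

For example, `enhance(np.abs(stft_frames) ** 2)` would return one gain per time-frequency bin, to be multiplied with the noisy STFT before resynthesis; `stft_frames` here is a hypothetical complex STFT array. Keeping the table lookup separate from the estimator makes it easy to swap in measured operating points without touching the enhancement loop.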
Acknowledgments
This work was also supported by a National Research Foundation of Korea (NRF) grant funded by (2014R1A2A1A10049735).
About this article
Cite this article
Lee, S., Chang, JH. Spectral difference for statistical model-based speech enhancement in speech recognition. Multimed Tools Appl 76, 24917–24929 (2017). https://doi.org/10.1007/s11042-016-4122-7
DOI: https://doi.org/10.1007/s11042-016-4122-7