Abstract
A robust voice activity detector (VAD) is expected to increase the accuracy of automatic speech recognition (ASR) in noisy environments. This study focuses on how to extract robust information for designing such a VAD. To do so, we construct a noise eigenspace by principal component analysis of the noise covariance matrix. When noisy speech is projected onto this eigenspace, the components with higher SNR are generally found in the channels with smaller eigenvalues. Based on this finding, the useful speech components are obtained by sorting the noise eigenspace by eigenvalue, and we propose a robust VAD that operates on the extracted high-SNR components. The threshold for selecting the useful channels is determined using a histogram method, and a probability-weighted speech presence measure is used to increase the reliability of the VAD. The proposed VAD is evaluated on the TIMIT database mixed with a number of noise types. Experiments show that our algorithm outperforms traditional VAD algorithms.
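The core idea of the abstract (noise PCA, projection, and preferring small-eigenvalue channels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the frame features (generic D-dimensional vectors), the synthetic Gaussian noise and "speech", the fixed number of selected channels, and the fixed decision threshold are all assumptions, and the function names are hypothetical. The fixed threshold stands in for the paper's histogram-based channel threshold and probability-weighted speech presence, which the abstract does not specify.

```python
import numpy as np


def noise_eigenspace(noise_frames):
    """PCA of the noise covariance matrix.

    noise_frames: (T, D) array, one feature vector per noise-only frame.
    Returns (mean, eigvals, eigvecs); np.linalg.eigh sorts eigenvalues in
    ascending order, so column 0 of eigvecs is the lowest-noise-energy channel.
    """
    mean = noise_frames.mean(axis=0)
    cov = np.cov(noise_frames, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return mean, eigvals, eigvecs


def frame_scores(frames, mean, eigvals, eigvecs, channels):
    """Noise-normalized energy of each frame in the selected eigen-channels.

    On noise-only frames each normalized channel has roughly unit variance,
    so the score averages about 1; speech raises it most where eigvals are small.
    """
    y = (frames - mean) @ eigvecs[:, channels]  # projection onto the eigenspace
    return np.mean(y ** 2 / eigvals[channels], axis=1)


def detect(frames, mean, eigvals, eigvecs, channels, threshold=3.0):
    """Hard per-frame decision (True = speech). Hypothetical fixed threshold,
    standing in for the paper's histogram method and probability weighting."""
    return frame_scores(frames, mean, eigvals, eigvecs, channels) > threshold


# ---- Synthetic demo (hypothetical data, not the paper's TIMIT setup) ----
rng = np.random.default_rng(0)
D = 8
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))  # random orthogonal rotation
noise_var = np.array([100.0, 50.0, 20.0, 10.0, 5.0, 2.0, 1.0, 0.5])


def make_noise(n):
    """Colored Gaussian noise with covariance Q diag(noise_var) Q^T."""
    return (rng.standard_normal((n, D)) * np.sqrt(noise_var)) @ Q.T


mean, eigvals, eigvecs = noise_eigenspace(make_noise(2000))

speech = 3.0 * rng.standard_normal((500, D))  # white "speech" with variance 9
frames = np.vstack([make_noise(500), speech + make_noise(500)])
labels = np.r_[np.zeros(500, bool), np.ones(500, bool)]

# Per-channel SNR after projection: speech variance over noise eigenvalue.
snr = (speech @ eigvecs).var(axis=0) / eigvals

low = np.arange(3)          # three smallest-eigenvalue channels
high = np.arange(D - 3, D)  # three largest-eigenvalue channels
acc_low = np.mean(detect(frames, mean, eigvals, eigvecs, low) == labels)
acc_high = np.mean(detect(frames, mean, eigvals, eigvecs, high) == labels)
print(f"SNR per channel: {np.round(snr, 2)}")
print(f"accuracy, small-eigenvalue channels: {acc_low:.2f}")
print(f"accuracy, large-eigenvalue channels: {acc_high:.2f}")
```

Because the synthetic speech spreads its energy evenly over all directions while the noise does not, the per-channel SNR after projection is roughly the speech variance divided by the eigenvalue, so the small-eigenvalue channels separate speech from noise far better than the large-eigenvalue ones. Real speech is not white, so the gap on real data will differ.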
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ying, D., Shi, Y., Soong, F., Dang, J., Lu, X. (2006). A Robust Voice Activity Detection Based on Noise Eigenspace Projection. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_12
DOI: https://doi.org/10.1007/11939993_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science (R0)