A Robust Voice Activity Detection Based on Noise Eigenspace Projection

Ying, Dongwen; Shi, Yu; Soong, Frank; Dang, Jianwu; Lu, Xugang

doi:10.1007/11939993_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Included in the following conference series:

International Symposium on Chinese Spoken Language Processing

Abstract

A robust voice activity detector (VAD) is expected to increase the accuracy of automatic speech recognition (ASR) in noisy environments. This study focuses on how to extract robust information for designing such a VAD. To do so, we construct a noise eigenspace by principal component analysis of the noise covariance matrix. When noisy speech is projected onto this eigenspace, the usable information with higher SNR is found to lie generally in the channels with smaller eigenvalues. Following this finding, the usable components of the speech are obtained by sorting the noise eigenspace. Based on the extracted high-SNR components, we propose a robust voice activity detector. The threshold for selecting the usable channels is determined using a histogram method. A probability-weighted speech presence is used to increase the reliability of the VAD. The proposed VAD is evaluated on the TIMIT database mixed with a number of noise types. Experiments show that our algorithm performs better than traditional VAD algorithms.
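
As a rough illustration of the projection step described in the abstract, the following Python sketch builds a noise eigenspace from noise-only feature frames and projects noisy frames onto the channels with the smallest noise eigenvalues. It is not the authors' implementation. The function names, the synthetic data, and the fixed channel count `n_keep` are assumptions made for illustration; the paper selects the channels with a histogram-based threshold whose details are not given in the abstract.

```python
import numpy as np


def noise_eigenspace(noise_frames):
    """Eigen-decompose the covariance of noise-only frames.

    noise_frames: array of shape (n_frames, n_dims).
    Returns (mean, eigvals, eigvecs); eigenvalues are in ascending order,
    and eigvecs[:, i] is the eigenvector for eigvals[i].
    """
    mean = noise_frames.mean(axis=0)
    cov = np.cov(noise_frames, rowvar=False)  # (n_dims, n_dims)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    return mean, eigvals, eigvecs


def project_low_noise_channels(frames, mean, eigvecs, n_keep):
    """Project frames onto the n_keep channels with the smallest noise eigenvalues.

    n_keep is a hypothetical fixed count used only in this sketch.
    """
    return (frames - mean) @ eigvecs[:, :n_keep]


# Synthetic example (assumed data): noise whose power differs strongly
# across dimensions, plus a white "speech" component.
rng = np.random.default_rng(0)
n_dims = 8
scales = np.linspace(0.1, 3.0, n_dims)
noise_only = rng.normal(size=(2000, n_dims)) * scales
speech = rng.normal(size=(500, n_dims))
noisy = speech + rng.normal(size=(500, n_dims)) * scales

mean, eigvals, eigvecs = noise_eigenspace(noise_only)
low_noise = project_low_noise_channels(noisy, mean, eigvecs, n_keep=3)

# Per-channel SNR estimate: speech variance along each eigenvector
# divided by the noise eigenvalue of that channel.
speech_var = ((speech @ eigvecs) ** 2).mean(axis=0)
snr_db = 10.0 * np.log10(speech_var / eigvals)
print("per-channel SNR (dB), smallest eigenvalue first:", np.round(snr_db, 1))
```

In this synthetic setting the speech power is spread evenly over all channels, so the channels with small noise eigenvalues carry the highest SNR, which mirrors the finding stated in the abstract. Real speech features need not behave this way in every channel.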

Author information

Authors and Affiliations

Japan Advanced Institute of Science and Technology, Nomi city, Ishikawa, 923-1292, Japan
Dongwen Ying, Jianwu Dang & Xugang Lu
Microsoft Research Asia, Beijing
Yu Shi & Frank Soong

Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Qiang Huo
Human Language Technology Department, Institute for Infocomm Research (I2R), 119613, Singapore
Bin Ma
School of Computer Engineering, Nanyang Technological University (NTU), 639798, Singapore
Eng-Siong Chng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Haizhou Li

About this paper

Cite this paper

Ying, D., Shi, Y., Soong, F., Dang, J., Lu, X. (2006). A Robust Voice Activity Detection Based on Noise Eigenspace Projection. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_12

DOI: https://doi.org/10.1007/11939993_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)
