A Robust Voice Activity Detection Based on Noise Eigenspace Projection

  • Conference paper
Chinese Spoken Language Processing (ISCSLP 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

A robust voice activity detector (VAD) is expected to increase the accuracy of automatic speech recognition (ASR) in noisy environments. This study focuses on how to extract robust information for designing such a VAD. To do so, we construct a noise eigenspace by principal component analysis of the noise covariance matrix. When noisy speech is projected onto this eigenspace, the information with higher SNR is found to lie generally in the channels with smaller eigenvalues. Based on this finding, the available components of the speech are obtained by sorting the noise eigenspace. Using the extracted high-SNR components, we propose a robust voice activity detector. The threshold for deciding the available channels is determined using a histogram method. A probability-weighted speech presence is used to increase the reliability of the VAD. The proposed VAD is evaluated on the TIMIT database mixed with several types of noise. Experiments show that our algorithm performs better than traditional VAD algorithms.
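
The projection idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the fixed number of retained channels (`n_channels`), and the fixed energy threshold (`factor`) are assumptions made for illustration. The paper instead selects channels with a histogram method and applies a probability-weighted speech presence, neither of which is modelled here. Frames are assumed to be zero-mean feature vectors.

```python
import numpy as np


def noise_eigenspace(noise_frames):
    """Eigen-decompose the noise covariance matrix (PCA).

    noise_frames: array of shape (n_frames, dim) holding noise-only
    feature vectors, assumed zero-mean. Returns the eigenvalues in
    ascending order and the matching eigenvectors as columns.
    """
    cov = noise_frames.T @ noise_frames / len(noise_frames)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    return eigvals, eigvecs


def low_noise_energy(frames, eigvecs, n_channels):
    """Energy of each frame in the n_channels smallest-eigenvalue channels."""
    projected = frames @ eigvecs[:, :n_channels]
    return np.sum(projected ** 2, axis=1)


def detect_speech(frames, eigvals, eigvecs, n_channels=4, factor=3.0):
    """Flag frames whose low-noise-channel energy exceeds the noise floor.

    The expected noise energy in the retained channels is the sum of
    their eigenvalues; `factor` is a hypothetical safety margin.
    """
    noise_floor = eigvals[:n_channels].sum()
    return low_noise_energy(frames, eigvecs, n_channels) > factor * noise_floor
```

Because the retained channels carry little noise energy, even weak speech components stand out in them, which is the intuition behind keeping the small-eigenvalue channels rather than the dominant ones.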

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ying, D., Shi, Y., Soong, F., Dang, J., Lu, X. (2006). A Robust Voice Activity Detection Based on Noise Eigenspace Projection. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_12

  • DOI: https://doi.org/10.1007/11939993_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49665-6

  • Online ISBN: 978-3-540-49666-3

  • eBook Packages: Computer Science, Computer Science (R0)
