Voice Activity Detection Using Generalized Gamma Distribution

Almpanidis, George; Kotropoulos, Constantine

doi:10.1007/11752912_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3955)

Included in the following conference series:

Hellenic Conference on Artificial Intelligence

Abstract

In this work, we model speech samples with a two-sided generalized Gamma distribution and evaluate its efficiency for voice activity detection. Using a computationally inexpensive maximum likelihood approach, we employ the Bayesian Information Criterion for identifying the phoneme boundaries in noisy speech.
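
The following is a minimal sketch of the idea described in the abstract, not the authors' implementation. The abstract does not state the density, so the code assumes the two-sided generalized Gamma form f(x) = γ β^η / (2 Γ(η)) · |x|^(ηγ−1) · exp(−β |x|^γ), which reduces to a Laplacian for γ = 1, η = 1. To keep the example short, the shape parameters γ and η are held fixed and only the scale β is fitted in closed form, so each model has one free parameter; this is a simplification. The function names, the `floor` guard, and the `penalty` weight are all hypothetical choices made for illustration.

```python
import math


def ggd_log_likelihood(samples, gamma, eta, beta, floor=1e-12):
    """Log-likelihood of samples under the assumed two-sided generalized Gamma density.

    Assumed density: f(x) = gamma * beta**eta / (2 * Gamma(eta))
                            * |x|**(eta*gamma - 1) * exp(-beta * |x|**gamma)
    `floor` is an illustrative guard that keeps log|x| finite for exact zeros.
    """
    n = len(samples)
    mags = [max(abs(x), floor) for x in samples]
    log_norm = (math.log(gamma) + eta * math.log(beta)
                - math.log(2.0) - math.lgamma(eta))
    sum_log = sum(math.log(m) for m in mags)
    sum_pow = sum(m ** gamma for m in mags)
    return n * log_norm + (eta * gamma - 1.0) * sum_log - beta * sum_pow


def fit_beta(samples, gamma, eta, floor=1e-12):
    """Closed-form ML estimate of beta when gamma and eta are held fixed.

    Setting d/d(beta) of the log-likelihood to zero gives
    beta = N * eta / sum(|x|**gamma).
    """
    sum_pow = sum(max(abs(x), floor) ** gamma for x in samples)
    return len(samples) * eta / sum_pow


def delta_bic(samples, split, gamma=1.0, eta=1.0, penalty=1.0):
    """BIC difference between a two-segment model and a one-segment model.

    Each segment gets its own beta, so the two-segment model has one extra
    free parameter; the penalty term is 0.5 * penalty * log(N).
    """
    left, right = samples[:split], samples[split:]

    def best_ll(segment):
        beta = fit_beta(segment, gamma, eta)
        return ggd_log_likelihood(segment, gamma, eta, beta)

    gain = best_ll(left) + best_ll(right) - best_ll(samples)
    return gain - 0.5 * penalty * math.log(len(samples))


if __name__ == "__main__":
    # Synthetic window: a low-amplitude half followed by a high-amplitude half.
    window = [0.1, -0.1] * 100 + [1.0, -1.0] * 100
    print("delta BIC at the true change point:", delta_bic(window, 200))
```

Under this sketch, a positive value of `delta_bic` at a candidate split favours placing a boundary there, while a negative value favours treating the window as a single segment. Estimating γ and η as well would add free parameters to the penalty term.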

Author information

Authors and Affiliations

Department of Informatics, Aristotle University of Thessaloniki, Box 451, Thessaloniki, GR-54124, Greece
George Almpanidis & Constantine Kotropoulos

Editor information

Editors and Affiliations

Computer Science Department of University of Crete, Greece
Grigoris Antoniou
Institute of Computer Science, Foundation for Research & Technology – Hellas (FORTH), Vassilika Vouton, P.O. Box 1385, 71110, Heraklion, Greece
George Potamias
Institute of Informatics and Telecommunications, NCSR "Demokritos", 15310 A., Paraskevi Attikis, Greece
Costas Spyropoulos
Institute of Computer Science, FO.R.T.H., Vassilika Vouton, P.O. Box 1385, GR 71110, Heraklion, Greece
Dimitris Plexousakis

About this paper

Cite this paper

Almpanidis, G., Kotropoulos, C. (2006). Voice Activity Detection Using Generalized Gamma Distribution. In: Antoniou, G., Potamias, G., Spyropoulos, C., Plexousakis, D. (eds) Advances in Artificial Intelligence. SETN 2006. Lecture Notes in Computer Science, vol 3955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752912_3

DOI: https://doi.org/10.1007/11752912_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34117-8
Online ISBN: 978-3-540-34118-5
eBook Packages: Computer Science (R0)
