Abstract
This work presents an effective speech event detector for improving the performance of speech processing systems operating in noisy environments. The proposed method is based on a trained support vector machine (SVM) that defines an optimized non-linear decision rule over the subband SNRs of the input speech. The classification rule is analyzed in the input space, together with the ability of the SVM model to learn how the signal is masked by the background noise. The algorithm also incorporates a noise reduction block working in tandem with the voice activity detector (VAD) that has been shown to be very effective in high-noise environments. Experiments on the Spanish SpeechDat-Car database show clear improvements over standard VADs, including ITU G.729, ETSI AMR and ETSI AFE for distributed speech recognition (DSR), as well as over other recently reported VADs.
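As a rough illustration of the idea of a non-linear SVM decision rule over subband SNRs, the following minimal sketch computes per-band SNRs for one frame and evaluates a kernel decision function. It is not the authors' implementation: the RBF kernel, the two-band split, and the support vectors, coefficients, bias and `gamma` values are all hypothetical placeholders chosen for the example, whereas a real detector would obtain them by training on labelled speech and noise frames.

```python
import math


def subband_snrs(frame_power, noise_power, n_bands):
    """Average two power spectra into equal-width subbands; return SNRs in dB."""
    width = len(frame_power) // n_bands
    snrs = []
    for k in range(n_bands):
        lo, hi = k * width, (k + 1) * width
        signal = sum(frame_power[lo:hi]) / width
        noise = sum(noise_power[lo:hi]) / width
        # Floor both terms to avoid log of zero or division by zero.
        snrs.append(10.0 * math.log10(max(signal, 1e-12) / max(noise, 1e-12)))
    return snrs


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel; the kernel choice is an assumption of this sketch."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = sum_i c_i * K(x_i, x) + b, with c_i = alpha_i * y_i."""
    return sum(
        c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)
    ) + bias


# Hypothetical "trained" model: one non-speech and one speech support vector.
SUPPORT_VECTORS = [[0.0, 0.0], [12.0, 9.0]]
DUAL_COEFS = [-1.0, 1.0]
BIAS = 0.0
GAMMA = 0.01


def is_speech(frame_power, noise_power):
    """Label a frame as speech when the SVM decision value is positive."""
    x = subband_snrs(frame_power, noise_power, n_bands=2)
    return svm_decision(x, SUPPORT_VECTORS, DUAL_COEFS, BIAS, GAMMA) > 0
```

With a flat noise estimate such as `[1.0] * 8`, a frame whose power is well above the noise in both bands is labelled speech, while a frame equal to the noise estimate is not. In practice the noise spectrum would be estimated and updated during the non-speech periods flagged by the detector itself.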
Keywords
- Support Vector Machine
- False Alarm Rate
- Support Vector Machine Model
- Voice Activity Detector
- Sequential Minimal Optimization
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yélamos, P., Ramírez, J., Górriz, J.M., Puntonet, C.G., Segura, J.C. (2006). Speech Event Detection Using Support Vector Machines. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3991. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758501_50
DOI: https://doi.org/10.1007/11758501_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34379-0
Online ISBN: 978-3-540-34380-6
eBook Packages: Computer Science, Computer Science (R0)