SVMs for Automatic Speech Recognition: A Survey

Solera-Ureña, R.; Padrell-Sendra, J.; Martín-Iglesias, D.; Gallardo-Antolín, A.; Peláez-Moreno, C.; Díaz-de-María, F.

doi:10.1007/978-3-540-71505-4_11

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4391)

Abstract

Hidden Markov Models (HMMs) are undoubtedly the most widely used core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. Despite some achievements, however, Markov models remain predominant today.

During the last decade, however, a new tool has appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: Support Vector Machines (SVMs). SVMs are effective discriminative classifiers with several outstanding characteristics: their solution is the one with maximum margin; they can deal with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed.
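For reference, the three properties listed above can be read off the standard soft-margin SVM training problem. The formulation below is the usual textbook one, not quoted from the chapter; the symbols ($\mathbf{w}$, $b$, $\xi_i$, $C$, $\phi$, $K$) are notation assumed here.

```latex
% Primal problem: maximising the margin 2/\|\mathbf{w}\| is equivalent to
% minimising \|\mathbf{w}\|^2, with slack variables \xi_i for violations.
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
  \frac{1}{2}\,\|\mathbf{w}\|^{2} + C \sum_{i=1}^{N} \xi_i
\qquad \text{s.t.} \qquad
  y_i \left( \mathbf{w}^{\top} \phi(\mathbf{x}_i) + b \right) \ge 1 - \xi_i,
  \quad \xi_i \ge 0, \quad i = 1, \dots, N

% Dual problem: the data enter only through K(\mathbf{x}_i, \mathbf{x}_j)
% = \phi(\mathbf{x}_i)^{\top} \phi(\mathbf{x}_j).
\max_{\boldsymbol{\alpha}} \quad
  \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j\, y_i y_j\, K(\mathbf{x}_i, \mathbf{x}_j)
\qquad \text{s.t.} \qquad
  0 \le \alpha_i \le C, \quad \sum_{i=1}^{N} \alpha_i y_i = 0
```

Both problems are convex quadratic programs, so any minimum the optimiser reaches is the global one. Because the dual depends on the samples only through kernel evaluations, the dimensionality of $\phi(\mathbf{x})$ does not enter the optimisation directly.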

These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. In the first part we review several techniques for producing the fixed-dimension vectors required by the original SVM formulation. We then explore more sophisticated techniques based on kernels capable of dealing with sequences of different lengths. Among them is the DTAK kernel, simple and effective, which revives an old speech recognition technique: Dynamic Time Warping (DTW). In the second part, we describe some recent approaches that use SVMs to tackle more complex tasks such as connected digit recognition or continuous speech recognition. Finally, we draw some conclusions and outline several ongoing lines of research.
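To make the idea of a DTW-based sequence kernel concrete, here is a minimal sketch of how two feature sequences of different lengths can be compared by dynamic programming. It is an illustration only, not the chapter's or the DTAK authors' exact formulation: the function name `dtw_alignment_score`, the linear frame-level kernel, the (1, 2, 1) step weights, and the normalisation by the summed lengths are all assumptions made here.

```python
import numpy as np


def dtw_alignment_score(x, y):
    """Similarity between sequences x (Tx, d) and y (Ty, d) of feature frames.

    Assumed recipe: maximise the accumulated frame-level inner product over
    monotonic alignment paths, then divide by Tx + Ty.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    tx, ty = len(x), len(y)

    # Frame-level linear kernel between every pair of frames, shape (Tx, Ty).
    local = x @ y.T

    # acc[i, j] holds the best accumulated score of any path ending at (i, j).
    acc = np.full((tx, ty), -np.inf)
    acc[0, 0] = 2.0 * local[0, 0]
    for i in range(tx):
        for j in range(ty):
            if i == 0 and j == 0:
                continue
            best = -np.inf
            if i > 0:  # advance in x only
                best = max(best, acc[i - 1, j] + local[i, j])
            if j > 0:  # advance in y only
                best = max(best, acc[i, j - 1] + local[i, j])
            if i > 0 and j > 0:  # advance in both, weighted twice
                best = max(best, acc[i - 1, j - 1] + 2.0 * local[i, j])
            acc[i, j] = best

    # Normalise so that sequence length does not dominate the score.
    return acc[-1, -1] / (tx + ty)


# Toy example: y is x with its first frame repeated (a time-stretched copy).
x = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
score_xy = dtw_alignment_score(x, y)
score_xx = dtw_alignment_score(x, x)
```

In this toy example the time-stretched copy scores the same as the sequence against itself, which is the behaviour the alignment is meant to provide. A matrix of such pairwise scores could be handed to an SVM trainer that accepts precomputed kernels, although this sketch does not guarantee that the resulting matrix is positive semidefinite.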

Author information

Authors and Affiliations

Signal Theory and Communications Department, EPS-Universidad Carlos III de Madrid, Avda. de la Universidad, 30, 28911-Leganés (Madrid), Spain
R. Solera-Ureña, J. Padrell-Sendra, D. Martín-Iglesias, A. Gallardo-Antolín, C. Peláez-Moreno & F. Díaz-de-María

Editor information

Yannis Stylianou, Marcos Faundez-Zanuy, Anna Esposito

About this chapter

Cite this chapter

Solera-Ureña, R., Padrell-Sendra, J., Martín-Iglesias, D., Gallardo-Antolín, A., Peláez-Moreno, C., Díaz-de-María, F. (2007). SVMs for Automatic Speech Recognition: A Survey. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_11

DOI: https://doi.org/10.1007/978-3-540-71505-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71503-0
Online ISBN: 978-3-540-71505-4
eBook Packages: Computer Science, Computer Science (R0)
