SVMs for Automatic Speech Recognition: A Survey

Chapter in: Progress in Nonlinear Speech Processing

Abstract

Hidden Markov Models (HMMs) are undoubtedly the most widely used core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, Markov Models remain predominant today.

During the last decade, however, a new tool has appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: Support Vector Machines (SVMs). SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is the one with maximum margin; they can deal with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed.
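For orientation, the maximum-margin and guaranteed-convergence properties can be read off the standard soft-margin SVM training problem, sketched below. The notation is an assumption for illustration and is not taken from the chapter: training pairs $(\mathbf{x}_i, y_i)$ with $y_i \in \{-1, +1\}$, a feature map $\phi$, slack variables $\xi_i$, and a regularization constant $C > 0$.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
  \frac{1}{2}\,\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\qquad \text{subject to} \qquad
  y_i \left( \mathbf{w}^{\top} \phi(\mathbf{x}_i) + b \right) \ge 1 - \xi_i,
  \quad \xi_i \ge 0, \quad i = 1, \dots, N .
```

Minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin $2 / \lVert \mathbf{w} \rVert$ between the two classes, and because the objective is convex with linear constraints, any minimum found is the global one.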

These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. In the first part we review several techniques for producing the fixed-dimension vectors required by the original SVMs. We then explore more sophisticated techniques based on kernels capable of dealing with sequences of different lengths. Among them is the DTAK kernel, simple and effective, which revives an old speech recognition technique: Dynamic Time Warping (DTW). In the second part, we describe some recent approaches that use SVMs to tackle more complex tasks such as connected digit recognition and continuous speech recognition. Finally, we draw some conclusions and outline several ongoing lines of research.
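The two ideas mentioned for isolated-word recognition can be illustrated with a minimal Python sketch. It is not the chapter's method: `segment_means` is a hypothetical helper showing one simple way to turn a variable-length sequence of feature frames into a fixed-dimension vector, and `dtw_distance` is plain textbook DTW with a Euclidean local cost. The DTAK kernel itself is not reproduced here.

```python
import math


def segment_means(frames, n_segments):
    """Map a variable-length frame sequence to a fixed-dimension vector.

    Illustrative assumption, not the chapter's technique: the sequence is
    split into n_segments nearly equal parts, each part is averaged, and
    the per-segment means are concatenated.
    """
    n = len(frames)
    if n < n_segments:
        raise ValueError("sequence shorter than the number of segments")
    dim = len(frames[0])
    out = []
    for s in range(n_segments):
        start = s * n // n_segments
        end = (s + 1) * n // n_segments  # end > start because n >= n_segments
        for d in range(dim):
            out.append(sum(f[d] for f in frames[start:end]) / (end - start))
    return out


def dtw_distance(a, b):
    """Classic DTW cost between two frame sequences of possibly different length."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat frame b[j-1]
                cost[i][j - 1],      # repeat frame a[i-1]
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


if __name__ == "__main__":
    short = [[0.0], [1.0], [2.0]]
    long = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
    print(segment_means(short, 3), segment_means(long, 3))
    print(dtw_distance(short, long))
```

The first function yields inputs that a standard SVM can consume directly, at the price of discarding timing detail inside each segment. The second compares sequences of different lengths without resampling, which is the property that sequence kernels built on time alignment aim to exploit.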

Editor information

Yannis Stylianou, Marcos Faundez-Zanuy, Anna Esposito

Copyright information

© 2007 Springer Berlin Heidelberg

About this chapter

Cite this chapter

Solera-Ureña, R., Padrell-Sendra, J., Martín-Iglesias, D., Gallardo-Antolín, A., Peláez-Moreno, C., Díaz-de-María, F. (2007). SVMs for Automatic Speech Recognition: A Survey. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_11

  • DOI: https://doi.org/10.1007/978-3-540-71505-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71503-0

  • Online ISBN: 978-3-540-71505-4

  • eBook Packages: Computer Science (R0)
