Multimedia Tools and Applications, Volume 78, Issue 22, pp 31709–31731

Automatic speech patterns recognition of commands using SVM and PSO

  • Gracieth Cavalcanti Batista
  • Washington Luis Santos Silva
  • Duarte Lopes de Oliveira
  • Osamu Saotome


This paper proposes an Automatic Speech Recognition (ASR) process that extracts Mel-Frequency Cepstral Coefficients (MFCCs) from voice command signals, applies the Discrete Cosine Transform (DCT) to these coefficients, trains Support Vector Machines (SVMs) with the Particle Swarm Optimization (PSO) technique to speed up the whole process, and classifies with a One-Against-All (OAA) multiclass SVM. The main contribution lies in the training phase, which combines SVM with the PSO algorithm and thereby reduces computational load and processing time. This novel algorithm is called here PSO-SVM hybrid training, and its performance is demonstrated through experimental results on voice commands in Brazilian Portuguese. The commands comprise 10 isolated digits (zero to nine) and 20 action commands such as “go ahead”, “finish”, and “pause”; that is, there are 30 different pattern types (classes) to be separated (recognized). The process is speaker independent, that is, the voice bank used in training is different from the one used in testing. The classifier using the RBF kernel function achieved success rates of 92% to 99% in the tests. In addition, the comparison section shows that this technique is 25 times faster than recognition without optimization and improves the recognition success rate by 10% compared with the well-known Gaussian Mixture Models (GMM) algorithm. The proposed algorithm can also be applied on any data processing board for voice signals (DSP, FPGA, DSPIC, ...).
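Two of the building blocks named in the abstract, particle swarm search and the One-Against-All decision rule, can be illustrated with a minimal sketch. It is not the authors' PSO-SVM hybrid training: the abstract does not say which quantities the swarm searches over, so the sketch assumes a generic cost function, and `toy_cost` is a hypothetical stand-in for an SVM training or validation error. The function names, the inertia and acceleration coefficients, and the example scores are all assumptions made for illustration.

```python
import random


def pso_minimize(cost, bounds, n_particles=20, n_iters=100,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimize cost(position) inside a box with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term plus pulls toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


def one_against_all(scores):
    """Return the index of the class whose binary classifier scored highest."""
    return max(range(len(scores)), key=lambda k: scores[k])


def toy_cost(p):
    # Hypothetical stand-in for an SVM error surface; minimum at (1, -2).
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


best, best_cost = pso_minimize(toy_cost, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
# Four made-up decision scores, one per binary "class k vs. the rest" SVM.
predicted = one_against_all([-0.8, 0.3, 1.7, -1.2])
```

In a setting like the one described, each of the 30 classes would have its own binary classifier, and `one_against_all` would be given 30 decision scores instead of four. The fixed `seed` only makes the sketch reproducible.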


Keywords: PSO-SVM hybrid training; PSO algorithm; SVM multiclass; Speech patterns recognition




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Technological Institute of Aeronautics, Sao Jose dos Campos, Brazil
  2. Federal Institute of Maranhao, Sao Luis, Brazil
