Comparative Evaluation of Speech Recognition Systems Based on Different Toolkits

  • Fatima Barkani
  • Hassan Satori
  • Mohamed Hamidi
  • Ouissam Zealouk
  • Naouar Laaidi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1076)


Speech recognition allows machines to convert incoming speech signals into text commands. This paper presents a brief survey of automatic speech recognition systems based on HTK, Julius, MATLAB, Sphinx and Kaldi. It describes each of these systems and presents their structure and performance.


Keywords: Speech recognition, HMMs, CMU Sphinx, HTK, Julius, Kaldi



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Fatima Barkani (1)
  • Hassan Satori (1)
  • Mohamed Hamidi (1)
  • Ouissam Zealouk (1)
  • Naouar Laaidi (1)
  1. LIIAN Laboratory, Faculty of Sciences Dhar Mahraz, Sidi Mohammed Ben Abdellah University, Fez, Morocco
