Speech recognition with reference to Assamese language using novel fusion technique

Abstract

This paper describes the implementation of a speech recognition system for the Assamese language. The database for this work consists of a vocabulary of ten Assamese words. Recognition models were trained using three techniques: the Hidden Markov Model (HMM), Vector Quantization (VQ), and the I-vector technique. Two new fusion methods that combine these three techniques are proposed.
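
The abstract does not say how the three techniques are combined. One common way to combine several recognizers is score-level fusion, and the sketch below illustrates that idea only; it is an assumption for illustration, not the fusion methods proposed in the paper. The function names, the weights, and all score values are hypothetical placeholders.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Rescale one model's per-word scores to zero mean and unit variance."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values()) or 1.0  # guard: all scores identical
    return {word: (s - mu) / sigma for word, s in scores.items()}


def fuse_scores(per_model_scores, weights):
    """Weighted sum of z-normalized scores across models.

    per_model_scores: {model_name: {word: score}}, higher score = better match.
    weights: {model_name: weight}; hypothetical values that would need tuning.
    """
    fused = {}
    for model, scores in per_model_scores.items():
        for word, z in z_normalize(scores).items():
            fused[word] = fused.get(word, 0.0) + weights[model] * z
    return fused


def recognize(per_model_scores, weights):
    """Return the vocabulary word with the highest fused score."""
    fused = fuse_scores(per_model_scores, weights)
    return max(fused, key=fused.get)


# Made-up scores for a three-word vocabulary (placeholders, not from the paper).
example_scores = {
    "hmm": {"word_a": -120.0, "word_b": -95.0, "word_c": -130.0},  # log-likelihoods
    "vq": {"word_a": -4.1, "word_b": -3.9, "word_c": -2.5},        # negated distortion
    "ivector": {"word_a": 0.20, "word_b": 0.65, "word_c": 0.30},   # cosine similarity
}
example_weights = {"hmm": 1.0, "vq": 1.0, "ivector": 1.0}  # assumed equal weights

best = recognize(example_scores, example_weights)
print(best)
```

Normalizing each model's scores before summing matters because HMM log-likelihoods, VQ distortions, and I-vector similarities live on very different scales; without it the model with the largest raw range would dominate the decision.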

Keywords

Speech recognition, Hidden Markov Model, Vector Quantization, I-vector, Fusion technique, Assamese

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Gauhati University, Guwahati, India
