
Speech Signal Analysis for Language Identification Using Tensors

  • Shubham Jain
  • Bhagath Parabattina
  • Pradip Kumar Das
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1241)

Abstract

Language detection is the first step in speech recognition systems, because it lets them apply the grammar and semantics of the identified language more effectively. For this reason, language identification is an active research area. Every language has characteristic sound patterns, rhythm, tone, nasal features, and so on. We propose a tensor-based approach that uses Mel-frequency cepstral coefficients (MFCCs) to determine the characteristic features of a language, which can then be used to identify a spoken language. Tensor-based algorithms perform well in higher dimensions and scale well compared to the classic maximum likelihood estimation (MLE) used in latent variable modeling. These approaches also do not suffer from slow convergence and require fewer data points for learning. We conducted language identification experiments on native Indian English and Hindi for selected speakers and observed an accuracy of around 70%.
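
The abstract does not say which tensor algorithm is used, so the following is only an illustrative sketch of one common tensor-based alternative to iterative likelihood maximization: power iteration with deflation on a symmetric third-order tensor. It assumes the tensor has an orthogonal decomposition with positive weights, which in practice would require estimating and whitening moment tensors from features such as MFCCs. The function names (`build_symmetric_tensor`, `tensor_power_iteration`, `decompose`) are hypothetical and not taken from the paper.

```python
import numpy as np


def build_symmetric_tensor(weights, components):
    """Return T = sum_i w_i * (v_i x v_i x v_i), with v_i the columns of `components`."""
    return np.einsum("i,ai,bi,ci->abc", weights, components, components, components)


def tensor_power_iteration(T, n_iter=100, seed=0):
    """Find one (eigenvalue, eigenvector) pair of a symmetric third-order tensor."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        # Contract T with u along two modes, i.e. compute T(I, u, u), then renormalize.
        u = np.einsum("abc,b,c->a", T, u, u)
        u /= np.linalg.norm(u)
    eigenvalue = np.einsum("abc,a,b,c->", T, u, u, u)
    return eigenvalue, u


def decompose(T, rank, n_iter=100):
    """Extract `rank` components by repeated power iteration and deflation."""
    T = T.copy()
    eigenvalues, eigenvectors = [], []
    for k in range(rank):
        lam, u = tensor_power_iteration(T, n_iter=n_iter, seed=k)
        eigenvalues.append(lam)
        eigenvectors.append(u)
        # Deflate: subtract the recovered rank-one term before the next pass.
        T -= lam * np.einsum("a,b,c->abc", u, u, u)
    return np.array(eigenvalues), np.column_stack(eigenvectors)


if __name__ == "__main__":
    # Synthetic example (assumed data): three orthonormal components in four dimensions.
    rng = np.random.default_rng(42)
    components, _ = np.linalg.qr(rng.normal(size=(4, 3)))
    weights = np.array([0.5, 0.3, 0.2])
    T = build_symmetric_tensor(weights, components)
    recovered_weights, recovered_components = decompose(T, rank=3)
    print(np.sort(recovered_weights))
```

In this idealized setting each pass converges in a handful of iterations, which illustrates the convergence contrast with iterative likelihood-based fitting that the abstract draws. Real speech features would add noise, and the orthogonality assumption would only hold approximately.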

Keywords

Language identification, Tensor analysis, MFCC

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. IIT Guwahati, Guwahati, India
