Abstract
Language detection is the first step in speech recognition systems. It helps these systems make better use of the grammar and semantics of a language. For these reasons, language identification is an active area of research. Every language has specific sound patterns, rhythm, tone, nasal features, and so on. We propose a tensor-based approach that uses Mel-frequency cepstral coefficients (MFCCs) to determine the characteristic features of a language, which can then be used to identify a spoken language. Tensor-based algorithms perform well in higher dimensions and scale better than the classic maximum likelihood estimation (MLE) used in latent variable modeling. This approach also does not suffer from slow convergence and requires fewer data points for learning. We conducted language identification experiments on native Indian English and Hindi for a chosen set of speakers and observed an accuracy of around 70%.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jain, S., Parabattina, B., Das, P.K. (2020). Speech Signal Analysis for Language Identification Using Tensors. In: Bhattacharjee, A., Borgohain, S., Soni, B., Verma, G., Gao, XZ. (eds) Machine Learning, Image Processing, Network Security and Data Sciences. MIND 2020. Communications in Computer and Information Science, vol 1241. Springer, Singapore. https://doi.org/10.1007/978-981-15-6318-8_25
DOI: https://doi.org/10.1007/978-981-15-6318-8_25
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6317-1
Online ISBN: 978-981-15-6318-8
eBook Packages: Computer Science, Computer Science (R0)