Speech Signal Analysis for Language Identification Using Tensors

Jain, Shubham; Parabattina, Bhagath; Das, Pradip Kumar

doi:10.1007/978-981-15-6318-8_25

Conference paper
First Online: 15 June 2020

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1241)

Abstract

Language detection is the first step in speech recognition systems, since it lets these systems make better use of the grammar and semantics of the spoken language. For this reason, language identification is an active research area. Every language has characteristic sound patterns, rhythm, tone, nasal features, and so on. We propose a tensor-based approach that uses MFCCs to determine the characteristic features of a language, which can then be used to identify a spoken language. Tensor-based algorithms perform well in higher dimensions and scale well compared with the classic maximum likelihood estimation (MLE) used in latent variable modeling. These approaches also do not suffer from slow convergence and require fewer data points for learning. We conducted language identification experiments on native Indian English and Hindi for a chosen set of speakers and observed an accuracy of around 70%.
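The abstract does not spell out how the tensors are built. As a rough illustration only, and assuming a method-of-moments style of tensor approach to latent variable models, the sketch below computes the empirical first-, second-, and third-order raw moments of a set of feature frames; the third-order moment is the tensor that such methods typically decompose. The function name `empirical_moments`, the 13-dimensional frame size, and the random stand-in data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def empirical_moments(frames):
    """Return empirical raw moments of order 1, 2, and 3.

    frames: array-like of shape (n_frames, dim), e.g. one MFCC vector per row.
    """
    x = np.asarray(frames, dtype=float)
    n = x.shape[0]
    m1 = x.mean(axis=0)                               # shape (dim,)
    m2 = np.einsum("ni,nj->ij", x, x) / n             # shape (dim, dim)
    m3 = np.einsum("ni,nj,nk->ijk", x, x, x) / n      # shape (dim, dim, dim)
    return m1, m2, m3


# Hypothetical stand-in for real MFCC frames: random 13-dimensional vectors.
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 13))
m1, m2, m3 = empirical_moments(frames)
print(m1.shape, m2.shape, m3.shape)
```

In practice the frames would come from an MFCC front end rather than a random generator, and a moment-based estimator would then apply a decomposition step to the third-order tensor, which this sketch leaves out.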

Notes

1. http://bit.ly/lowResourceSpeechDataset

Author information

Authors and Affiliations

IIT Guwahati, North Amingaon, Guwahati, 781039, Assam, India
Shubham Jain, Bhagath Parabattina & Pradip Kumar Das

Corresponding author

Correspondence to Bhagath Parabattina.

Editor information

Editors and Affiliations

National Institute of Technology Silchar, Silchar, India
Arup Bhattacharjee
National Institute of Technology Silchar, Silchar, India
Samir Kr. Borgohain
National Institute of Technology Silchar, Silchar, India
Badal Soni
National Institute of Technology Kurukshetra, Kurukshetra, India
Gyanendra Verma
University of Eastern Finland, Kuopio, Finland
Xiao-Zhi Gao

About this paper

Cite this paper

Jain, S., Parabattina, B., Das, P.K. (2020). Speech Signal Analysis for Language Identification Using Tensors. In: Bhattacharjee, A., Borgohain, S., Soni, B., Verma, G., Gao, XZ. (eds) Machine Learning, Image Processing, Network Security and Data Sciences. MIND 2020. Communications in Computer and Information Science, vol 1241. Springer, Singapore. https://doi.org/10.1007/978-981-15-6318-8_25

DOI: https://doi.org/10.1007/978-981-15-6318-8_25
Published: 15 June 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6317-1
Online ISBN: 978-981-15-6318-8
eBook Packages: Computer Science, Computer Science (R0)
