Kurdish Spoken Dialect Recognition Using X-Vector Speaker Embedding

Amani, Arash; Mohammadamini, Mohammad; Veisi, Hadi

doi:10.1007/978-3-030-87802-3_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This paper presents a dialect recognition system for the Kurdish language based on speaker embeddings. The research has two main goals. First, we investigate whether dialect information is available in speaker embeddings, and then use this information for spoken dialect recognition in Kurdish. Second, we introduce Zar, a public dataset for Kurdish spoken dialect recognition. The Zar dataset comprises 16,385 utterances totaling 49 h 36 min across five dialects of the Kurdish language (Northern Kurdish, Central Kurdish, Southern Kurdish, Hawrami, and Zazaki). Dialect recognition is performed with x-vector speaker embeddings from a model trained for speaker recognition on the VoxCeleb1 and VoxCeleb2 datasets. The extracted x-vectors are then used to train support vector machine (SVM) and decision tree classifiers for dialect recognition. The results are compared with an i-vector system trained specifically for Kurdish spoken dialect recognition. In both systems (i-vector and x-vector), the SVM classifier, with 87% precision, gives the better performance. Our results show that the information preserved in speaker embeddings can be used for automatic dialect recognition.
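The abstract reports classifier quality as precision over five dialect classes but does not say how the figure is averaged. As a minimal sketch, assuming a macro average (the unweighted mean of per-class precision), the computation could look like the following. The function names and the toy labels are hypothetical and are not taken from the paper or the Zar dataset.

```python
from collections import Counter

# The five dialects named in the abstract.
DIALECTS = [
    "Northern Kurdish",
    "Central Kurdish",
    "Southern Kurdish",
    "Hawrami",
    "Zazaki",
]


def per_class_precision(y_true, y_pred, labels):
    """Return {label: TP / (TP + FP)}; None when a label is never predicted."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    predicted = Counter(y_pred)  # TP + FP for each label
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)  # TP
    return {
        lab: (correct[lab] / predicted[lab]) if predicted[lab] else None
        for lab in labels
    }


def macro_precision(y_true, y_pred, labels):
    """Unweighted mean of per-class precision over labels predicted at least once."""
    scores = [
        s for s in per_class_precision(y_true, y_pred, labels).values()
        if s is not None
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy labels for illustration only (not results from the paper).
y_true = ["Hawrami", "Hawrami", "Zazaki", "Zazaki",
          "Central Kurdish", "Northern Kurdish",
          "Southern Kurdish", "Southern Kurdish"]
y_pred = ["Hawrami", "Zazaki", "Zazaki", "Zazaki",
          "Central Kurdish", "Northern Kurdish",
          "Southern Kurdish", "Central Kurdish"]

if __name__ == "__main__":
    print(per_class_precision(y_true, y_pred, DIALECTS))
    print(macro_precision(y_true, y_pred, DIALECTS))
```

A macro average weights every dialect equally, which matters if the classes differ in size; a support-weighted average would favor the larger classes instead.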

Notes

1. https://github.com/ArashAmani/Kurdish-Dialect-Recognition


Author information

Authors and Affiliations

Asosoft Research Group, Tehran, Iran
Arash Amani
Avignon University LIA (Laboratoire Informatique d’Avignon), Avignon, France
Mohammad Mohammadamini
Faculty of New Sciences and Technologies, University of Tehran, Tehran, Iran
Hadi Veisi

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Amani, A., Mohammadamini, M., Veisi, H. (2021). Kurdish Spoken Dialect Recognition Using X-Vector Speaker Embedding. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_5

DOI: https://doi.org/10.1007/978-3-030-87802-3_5
Published: 22 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)
