Emirati-accented speaker identification in neutral and shouted talking environments
This work presents an Emirati-accented speech database (Arabic United Arab Emirates database) captured in both neutral and shouted talking environments, collected in order to study and enhance text-independent Emirati-accented speaker identification in the shouted environment based on first-order, second-order, and third-order circular suprasegmental hidden Markov models (CSPHMM1s, CSPHMM2s, and CSPHMM3s) as classifiers. The database was collected from 50 native Emirati speakers (25 per gender), each uttering eight common Emirati sentences in both neutral and shouted talking environments. Mel-frequency cepstral coefficients (MFCCs) were extracted from the collected speech as features. Our results show that the average Emirati-accented speaker identification performance in the neutral environment is 94.0%, 95.2%, and 95.9% based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively, while the average performance in the shouted environment is 51.3%, 55.5%, and 59.3%, respectively. The average speaker identification performance achieved in the shouted environment based on CSPHMM3s is very close to that obtained in a subjective assessment by human listeners.
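The abstract names MFCCs as the extracted features but does not describe the extraction pipeline. The following is a minimal NumPy sketch of a standard MFCC front end (framing, Hamming window, power spectrum, mel filterbank, log, DCT-II); the frame length, hop size, filterbank count, and cepstral order are typical defaults assumed for illustration, not values reported by the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=12):
    """Return an (n_frames, n_ceps) array of MFCCs for a mono signal."""
    # Slice the signal into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular mel filterbank, evenly spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    # Log filterbank energies, then DCT-II to decorrelate;
    # keep the lowest n_ceps coefficients (c1..c12, excluding c0)
    log_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct_basis = np.cos(np.pi * np.arange(1, n_ceps + 1)[:, None]
                       * (2 * n + 1) / (2 * n_mels))
    return log_energy @ dct_basis.T
```

With a 16 kHz signal, the defaults above correspond to 25 ms frames at a 10 ms hop, so one second of speech yields 98 twelve-dimensional feature vectors, which would then be fed to the CSPHMM classifiers.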
Keywords: Emirati-accented speech database · Hidden Markov models · Neutral talking environment · Shouted talking environment · Speaker identification · Suprasegmental hidden Markov models
The authors wish to thank the University of Sharjah for funding this work through the competitive research project entitled “Capturing, Studying, and Analyzing Arabic Emirati-Accented Speech Database in Stressful and Emotional Talking Environments for Different Applications”, No. 1602040349-P. The authors also wish to thank engineers Merah Al Suwaidi, Deema Al Rais, and Hannah Saud for capturing the Emirati-accented speech database.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no competing interests.
Research involving animal rights
This study does not involve any animal participants.