Circuits, Systems, and Signal Processing, Volume 37, Issue 8, pp 3589–3604

Acoustic Feature Analysis and Discriminative Modeling for Language Identification of Closely Related South-Asian Languages

  • Farah Adeeba
  • Sarmad Hussain


With advances in technology, communication among people from different linguistic backgrounds around the world is steadily increasing, creating a need for language identification services. Language identification techniques extract distinguishing information from speech corpora as features of a language in order to differentiate one language from another. Without publicly available speech corpora, comparisons between different techniques are not very reliable. This paper investigates state-of-the-art features and techniques for identifying under-resourced and closely related languages, namely Pashto, Punjabi, Sindhi, and Urdu. For this purpose, a speech corpus is designed and collected for these languages. The dataset consists of read speech collected over the telephone network (mobile and landline) from different regions of Pakistan. The speech corpus is annotated at the sentence level using X-SAMPA, an orthographic transcription is also provided, and the verified data are divided into training and evaluation sets. Mel-frequency cepstral coefficients (MFCCs) and their shifted delta cepstral (SDC) features are used to develop a language identification system for the target languages. Language identification approaches based on a Gaussian mixture model with a universal background model (GMM-UBM) and on i-vectors are investigated. The results show that GMM-UBM is more effective than the i-vector approach for language identification of short-duration test utterances.
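The abstract states that MFCCs and their shifted delta cepstra are used as features, but it does not give the SDC configuration. The sketch below computes SDC under the widely used N-d-P-k parameterization, defaulting to the common 7-1-3-7 setting. For each frame t and block i = 0, ..., k-1 it takes the delta c(t + iP + d) - c(t + iP - d) and concatenates the k blocks, so N input coefficients become N·k SDC values. N is simply the number of input columns. The default values, the function name `shifted_delta_cepstra`, and the edge-padding scheme are assumptions for illustration, not the paper's reported setup.

```python
import numpy as np


def shifted_delta_cepstra(cep: np.ndarray, d: int = 1, p: int = 3, k: int = 7) -> np.ndarray:
    """Compute shifted delta cepstra from a (T, N) matrix of cepstral frames.

    Block i (0 <= i < k) of frame t is c(t + i*p + d) - c(t + i*p - d).
    The k blocks are concatenated, giving a (T, N*k) matrix.
    Assumption: frames outside the utterance repeat the first/last frame.
    """
    num_frames = cep.shape[0]
    pad_front = d                   # lets c(t - d) exist for t = 0
    pad_back = (k - 1) * p + d      # lets c(t + (k-1)*p + d) exist for t = T-1
    padded = np.pad(cep, ((pad_front, pad_back), (0, 0)), mode="edge")

    blocks = []
    for i in range(k):
        # Original frame index t maps to padded index t + pad_front.
        start_plus = pad_front + i * p + d
        start_minus = pad_front + i * p - d
        plus = padded[start_plus:start_plus + num_frames]
        minus = padded[start_minus:start_minus + num_frames]
        blocks.append(plus - minus)
    return np.hstack(blocks)
```

With 7 MFCCs per frame, the default setting yields 49 SDC values per frame. Systems often append the static coefficients as well, for example with `np.hstack([mfcc, sdc])`, which gives 56-dimensional vectors. Whether the paper does this is not stated in the abstract.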
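The abstract compares GMM-UBM and i-vector systems without detailing either one. As a rough illustration of the GMM-UBM side only, the sketch below follows the classic adapted-GMM recipe popularized by Reynolds et al. for speaker verification. Each language model is derived from a shared UBM by mean-only MAP adaptation. A test utterance is then assigned to the language whose model gives the highest average per-frame log-likelihood ratio against the UBM. The following are all assumptions for this sketch, not details confirmed by the paper:

* diagonal covariances
* a relevance factor of 16
* an already-trained UBM passed in as a `(weights, means, variances)` tuple
* every function name
* the toy 2-D data in the demo

```python
import numpy as np


def component_log_likelihoods(x, weights, means, variances):
    """Return log(w_c * N(x_t; mu_c, diag(var_c))) with shape (T, C).

    x: (T, D) feature frames; weights: (C,); means, variances: (C, D).
    """
    dim = x.shape[1]
    # Log normalising constant of each diagonal Gaussian, shape (C,).
    log_norm = -0.5 * (dim * np.log(2.0 * np.pi) + np.sum(np.log(variances), axis=1))
    # Variance-weighted squared distance of every frame to every mean, shape (T, C).
    diff = x[:, None, :] - means[None, :, :]
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    return np.log(weights)[None, :] + log_norm[None, :] - 0.5 * maha


def log_sum_exp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    peak = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(peak, axis=axis) + np.log(np.sum(np.exp(a - peak), axis=axis))


def avg_log_likelihood(x, weights, means, variances):
    """Average per-frame log-likelihood of x under a diagonal-covariance GMM."""
    joint = component_log_likelihoods(x, weights, means, variances)
    return float(np.mean(log_sum_exp(joint, axis=1)))


def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the UBM means towards the frames in x."""
    joint = component_log_likelihoods(x, weights, means, variances)
    # Component posteriors (responsibilities) for every frame, shape (T, C).
    post = np.exp(joint - log_sum_exp(joint, axis=1)[:, None])
    n_c = post.sum(axis=0)                       # zeroth-order statistics, (C,)
    f_c = post.T @ x                             # first-order statistics, (C, D)
    e_c = f_c / np.maximum(n_c, 1e-10)[:, None]  # data mean seen by each component
    alpha = (n_c / (n_c + relevance))[:, None]   # adaptation weight in [0, 1)
    return alpha * e_c + (1.0 - alpha) * means


def identify_language(x, ubm, language_means):
    """Return the language with the highest average LLR against the UBM, plus all scores."""
    weights, _, variances = ubm
    ubm_score = avg_log_likelihood(x, *ubm)
    scores = {
        lang: avg_log_likelihood(x, weights, mu, variances) - ubm_score
        for lang, mu in language_means.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy two-component UBM in 2-D (illustrative numbers, not real MFCC statistics).
    ubm = (np.array([0.5, 0.5]), np.array([[-2.0, 0.0], [2.0, 0.0]]), np.ones((2, 2)))

    def toy_frames(offset, n):
        centres = np.array([[-2.0, offset], [2.0, offset]])
        return centres[rng.integers(0, 2, size=n)] + rng.normal(size=(n, 2))

    models = {"lang_a": map_adapt_means(toy_frames(1.0, 1000), *ubm),
              "lang_b": map_adapt_means(toy_frames(-1.0, 1000), *ubm)}
    best, scores = identify_language(toy_frames(1.0, 300), ubm, models)
    print(best, scores)
```

All language models share the UBM's weights and variances, so only the adapted means need to be stored per language. Subtracting the UBM log-likelihood does not change which language wins. It does keep the scores in the usual log-likelihood-ratio form.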


Keywords: Speech corpus; Language identification; Urdu; Sindhi; Pashto; Punjabi



The authors would like to acknowledge Ashok Kumar Khatri, Asad Mustafa, and Inaam-ullah Torwali of the Center for Language Engineering for their assistance in developing the phonetic lexicons of the Sindhi, Punjabi, and Pashto languages.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Center for Language Engineering (CLE), Al-Khawarizmi Institute of Computer Science (KICS), University of Engineering and Technology, Lahore, Pakistan
