
Artificial Intelligence Review

Volume 51, Issue 4, pp 647–672

Empirical analysis of linguistic and paralinguistic information for automatic dialect classification

  • Shweta Sinha
  • Aruna Jain
  • Shyam S. Agrawal

Abstract

Current research in automatic speech recognition is primarily concerned with correctly evaluating the linguistic information carried by the speech signal and with identifying the variations naturally present in speech. These variations may stem from a speaker's age, gender, or a speaking style influenced by dialect. The focus of research in this field is to further strengthen the reliability and accuracy of the techniques developed so far. This paper concentrates on the analysis and modelling of linguistic and paralinguistic information embedded in the speech signal, with the aim of discovering similarities and dissimilarities among the acoustic characteristics of different dialects. It investigates the influence of dialectal variation by measuring and analysing acoustic features of vowel sounds, such as formant frequencies, pitch, pitch slope, duration and intensity. These differences are then exploited to identify a speaker's native dialect automatically from a sample of that speaker's speech. Support vector machines, together with dialect-specific Gaussian mixture models, were used to classify the dialect of spoken utterances. System performance is compared with human perception of dialects. The study focuses on dialects of Hindi, one of the world's major languages.
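The idea of dialect-specific models can be illustrated with a minimal sketch: one model is fitted per dialect, and an utterance is assigned to the dialect whose model gives it the highest log-likelihood. This is not the authors' implementation. It uses a single diagonal-covariance Gaussian per dialect, which is the one-component special case of a Gaussian mixture model, and it omits the support vector machine stage entirely. The dialect labels, the three-value feature layout (F1 in Hz, F2 in Hz, vowel duration in ms) and all numbers are made up for illustration.

```python
import math

def fit_diag_gaussian(vectors):
    """Fit one diagonal-covariance Gaussian: per-dimension means and variances."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    # A small floor keeps every variance strictly positive.
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances

def log_likelihood(x, model):
    """Log-density of feature vector x under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        for xi, mu, var in zip(x, means, variances)
    )

def classify(x, models):
    """Return the label whose model scores x highest."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))

# Hypothetical training data: (F1 Hz, F2 Hz, vowel duration ms) per vowel token.
training = {
    "dialect_A": [(300.0, 2300.0, 90.0), (320.0, 2250.0, 95.0), (310.0, 2280.0, 88.0)],
    "dialect_B": [(420.0, 1900.0, 130.0), (400.0, 1950.0, 125.0), (410.0, 1920.0, 135.0)],
}

# One model per dialect, mirroring the "dialect-specific" setup.
models = {label: fit_diag_gaussian(vecs) for label, vecs in training.items()}

predicted = classify((315.0, 2270.0, 92.0), models)
print(predicted)
```

A full mixture model would replace each single Gaussian with several weighted components, typically fitted by expectation-maximisation, while keeping the same maximum-likelihood decision rule.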

Keywords

Support vector machine · Gaussian mixture model · Dialect identification · Acoustic feature · Human perception


Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  1. Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India
  2. KIIT College of Engineering, Gurgaon, India
