Literature Review

  • K. Sreenivasa Rao
  • V. Ramu Reddy
  • Sudhamay Maity
Chapter
Part of the SpringerBriefs in Electrical and Computer Engineering book series

Abstract

This chapter provides an overview of existing language identification (LID) systems. It highlights the language-specific features used in earlier LID studies, explains why implicit LID systems have attracted interest, and discusses the motivation for the present work.

Keywords

Language identification; Review of implicit LID systems; Review of explicit LID systems; Identification of Indian languages

Copyright information

© The Author(s) 2015

Authors and Affiliations

  • K. Sreenivasa Rao (1)
  • V. Ramu Reddy (2)
  • Sudhamay Maity (3)

  1. Indian Institute of Technology Kharagpur, Kharagpur, India
  2. Innovation Lab Kolkata, Kolkata, India
  3. Indian Institute of Technology Kharagpur, Kharagpur, India
