International Journal of Speech Technology, Volume 19, Issue 1, pp 75–85

Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise

  • Phani Kumar Polasi
  • Kalva Sri Rama Krishna


Language identification has gained significant importance in recent years, in both research and the commercial marketplace, demanding an improvement in the ability of machines to distinguish between languages. Although methods such as Gaussian mixture models, hidden Markov models and neural networks are used for identifying languages, the problem of language identification in noisy environments has not been addressed so far. This paper examines the performance of an automatic language identification system in noisy environments. A comparative performance analysis of speech enhancement techniques such as minimum mean square error (MMSE) estimation, spectral subtraction (SS) and temporal processing (TP) is presented for different types of noise at different SNRs. Although these individual enhancement techniques may not perform well across all noise types and SNRs, it is proposed to combine the evidence from all of them to improve the overall performance of the system significantly. The language identification studies are performed using the IITKGP-MLILSC (IIT Kharagpur-Multilingual Indian Language Speech Corpus) database, which consists of 27 languages.
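The idea of combining evidence from several enhancement front ends can be illustrated with a minimal score-level fusion sketch. The abstract does not state the combination rule, so the min-max normalisation and equal-weight sum below are assumptions, as are the function names `normalise` and `fuse`. The language names and score values are made up for illustration and are not results from the paper.

```python
# Hypothetical sketch of score-level fusion across three enhancement front ends.
# Assumption: each front end (MMSE, SS, TP) feeds a language-ID back end that
# returns one score per language (for example an average GMM log-likelihood).
# The fusion rule (weighted sum of min-max normalised scores) is assumed, not
# taken from the paper.

def normalise(scores):
    """Min-max normalise a dict of language -> score to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All languages tie, so this system carries no evidence.
        return {lang: 0.0 for lang in scores}
    return {lang: (s - lo) / (hi - lo) for lang, s in scores.items()}


def fuse(system_scores, weights=None):
    """Combine per-system scores with a weighted sum; return (best language, fused scores)."""
    names = list(system_scores)
    if weights is None:
        # Equal weights by default (an assumption).
        weights = {name: 1.0 / len(names) for name in names}
    fused = {}
    for name in names:
        for lang, s in normalise(system_scores[name]).items():
            fused[lang] = fused.get(lang, 0.0) + weights[name] * s
    best = max(fused, key=fused.get)
    return best, fused


# Made-up scores for one test utterance, for illustration only.
example = {
    "MMSE": {"Telugu": -41.0, "Hindi": -40.0, "Tamil": -45.0},
    "SS": {"Telugu": -38.0, "Hindi": -42.0, "Tamil": -44.0},
    "TP": {"Telugu": -39.0, "Hindi": -41.5, "Tamil": -43.0},
}
best_language, fused_scores = fuse(example)
print(best_language, fused_scores)
```

In this toy input the MMSE-based system alone would pick a different language from the other two, while the fused score follows the majority of the evidence. In practice the weights would be tuned on held-out data rather than fixed to equal values.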


Keywords: Language identification, Noise, Indian languages, MFCC, GMM, MMSE, SS, TP



The authors are grateful to Dr K Sreenivasa Rao, Associate Professor, and his team at the School of Information Technology (SIT), IIT Kharagpur, for providing the IITKGP-MLILSC (IIT Kharagpur-Multilingual Indian Language Speech Corpus) database, which consists of 27 languages. We would also like to thank them for their suggestions and helpful discussions.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. ECE Department, V R Siddhartha Engineering College, Vijayawada, India
