Combining the evidences of temporal and spectral enhancement techniques for improving the performance of Indian language identification system in the presence of background noise
Language identification has gained significant importance in recent years, both in research and in the commercial marketplace, demanding an improvement in the ability of machines to distinguish between languages. Although methods such as Gaussian mixture models, hidden Markov models and neural networks are used for identifying languages, the problem of language identification in noisy environments has not been adequately addressed so far. This paper studies the performance of an automatic language identification system in noisy environments. A comparative performance analysis of speech enhancement techniques, namely minimum mean squared error estimation (MMSE), spectral subtraction (SS) and temporal processing (TP), is presented for different types of noise at different SNRs. Because no single enhancement technique may perform well across all noise types and SNRs, it is proposed to combine the evidence from all of these techniques to improve the overall performance of the system significantly. The language identification studies are performed using the IITKGP-MLILSC (IIT Kharagpur-Multilingual Indian Language Speech Corpus) database, which consists of 27 languages.
Keywords: Language identification, Noise, Indian languages, MFCC, GMM, MMSE, SS, TP
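The abstract does not state how the evidence from the three enhancement techniques is combined. One common option is score-level fusion, sketched below under that assumption: each enhanced version of an utterance (MMSE, SS, TP) is scored against every language model, and the per-language scores are merged by a weighted sum. The function names, the equal default weights, the language labels and the score values are all hypothetical and serve only as an illustration, not as the authors' method or results.

```python
from typing import Dict, Optional

Scores = Dict[str, float]  # language name -> score (e.g. average log-likelihood)


def fuse_scores(
    scores_by_technique: Dict[str, Scores],
    weights: Optional[Dict[str, float]] = None,
) -> Scores:
    """Weighted sum of per-language scores across enhancement front ends.

    Assumption: every technique provides a score for the same set of languages.
    """
    techniques = list(scores_by_technique)
    if weights is None:
        # Assumed default: give every technique the same weight.
        weights = {t: 1.0 / len(techniques) for t in techniques}
    languages = scores_by_technique[techniques[0]].keys()
    return {
        lang: sum(weights[t] * scores_by_technique[t][lang] for t in techniques)
        for lang in languages
    }


def identify(
    scores_by_technique: Dict[str, Scores],
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Return the language with the highest fused score."""
    fused = fuse_scores(scores_by_technique, weights)
    return max(fused, key=fused.get)


# Made-up scores for one noisy utterance (illustrative only).
example_scores = {
    "MMSE": {"Hindi": -41.0, "Telugu": -40.0, "Bengali": -45.0},
    "SS": {"Hindi": -39.0, "Telugu": -43.0, "Bengali": -44.0},
    "TP": {"Hindi": -40.0, "Telugu": -42.0, "Bengali": -46.0},
}

if __name__ == "__main__":
    print(identify(example_scores))
```

In this made-up example the MMSE front end alone would rank Telugu first, while the fused decision follows the two front ends that agree on Hindi. Unequal weights, for instance tuned per noise type on held-out data, can be passed through the `weights` argument.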
The authors are grateful to Dr K Sreenivasa Rao, Associate Professor, and his team at the School of Information Technology (SIT), IIT Kharagpur, for providing the IIT Kharagpur-Multilingual Indian Language Speech Corpus (IITKGP-MLILSC) database, which consists of 27 languages. We would also like to thank them for their suggestions and helpful discussions.