Abstract
In this work, we propose new features for a GMM-based spoken language identification (LID) system. The proposed features are extracted in two stages. MFCCs and formants are first extracted from a large corpus covering all languages under consideration. In the first stage, the MFCCs and formants are concatenated to form a feature vector; K clusters are formed from these feature vectors, and one Gaussian is designed for each cluster. In the second stage, each feature vector is evaluated against each of the K Gaussians, and the K returned probabilities are taken as the elements of a new K-element feature vector. This procedure for deriving the new feature vector is common to both the training and testing phases. In the training phase, K-element feature vectors are generated from the language-specific speech corpus and language-specific GMMs are trained. In the testing phase, the same procedure is used to extract K-element feature vectors from an unknown speech utterance, which are then evaluated against the language-specific GMMs. Further, language-specific a priori knowledge is used to improve recognition performance. The experiments are carried out on the OGI database, and the LID performance is nearly 100%.
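The two-stage extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random 2-D toy vectors stand in for the concatenated MFCC and formant vectors, plain k-means is assumed as the clustering method (the abstract does not name one), each cluster Gaussian is assumed to have a diagonal covariance, and all function names are hypothetical. Training the language-specific GMMs on the resulting K-element vectors is not shown.

```python
import math
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (an assumed clustering choice); returns one label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fit_diag_gaussian(members, var_floor=1e-3):
    """Fit one diagonal-covariance Gaussian (assumption) to a cluster's vectors."""
    n = len(members)
    columns = list(zip(*members))
    mean = [sum(col) / n for col in columns]
    var = [max(sum((x - m) ** 2 for x in col) / n, var_floor)
           for col, m in zip(columns, mean)]
    return mean, var


def log_density(v, gaussian):
    """Log of the diagonal Gaussian density evaluated at v."""
    mean, var = gaussian
    return sum(-0.5 * (math.log(2 * math.pi * s) + (x - m) ** 2 / s)
               for x, m, s in zip(v, mean, var))


def second_stage_features(v, gaussians):
    """Second stage: score v against all K Gaussians to get a K-element vector."""
    return [math.exp(log_density(v, g)) for g in gaussians]


# Toy stand-in for first-stage vectors (MFCCs concatenated with formants).
rng = random.Random(1)
stage1 = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)]
          for c in (0.0, 10.0) for _ in range(50)]

K = 2  # number of clusters, hence the length of the new feature vector
labels = kmeans(stage1, K)
gaussians = [
    fit_diag_gaussian([v for v, lab in zip(stage1, labels) if lab == c])
    for c in range(K)
]
new_features = [second_stage_features(v, gaussians) for v in stage1]
print(len(new_features), len(new_features[0]))
```

The same `second_stage_features` call would be applied to training and test vectors alike, which mirrors the statement that the derivation is common to both phases. In practice the values are densities that can underflow for high-dimensional inputs, so keeping them in the log domain may be preferable.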
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Manchala, S., Prasad, V.K. (2013). GMM Based Language Identification System Using Robust Features. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_21
DOI: https://doi.org/10.1007/978-3-319-01931-4_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer Science, Computer Science (R0)