Covariance Matrix Enhancement Approach to Train Robust Gaussian Mixture Models of Speech Data

Vaněk, Jan; Machlica, Lukáš; Psutka, Josef V.; Psutka, Josef

doi:10.1007/978-3-319-01931-4_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8113)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Estimation of the parameters of a multivariate Gaussian Mixture Model is usually based on a criterion (e.g. Maximum Likelihood) that focuses mostly on the training data. Testing data that were not seen during training may therefore cause problems. Moreover, numerical instabilities can occur, e.g. for Gaussians with low occupancy, especially when working with full covariance matrices in high-dimensional spaces. Another question concerns the number of Gaussians to be trained for a specific data set. The approach proposed in this paper can handle all of these issues. It is based on the assumption that the training and testing data were generated from the same source distribution. The key part of the approach is to use a criterion based on the source distribution rather than on the training data itself. It is shown how to modify the estimation procedure to fit the source distribution better (despite the fact that it is unknown), and subsequently a new estimation algorithm for diagonal as well as full covariance matrices is derived and tested.
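
The mismatch between training and unseen data described in the abstract can be illustrated with a small numerical experiment. The sketch below is not the algorithm proposed in the paper (which is not reproduced on this page); it only shows, under assumed settings, that a Maximum Likelihood full-covariance Gaussian fitted to few samples scores its own training data noticeably better than fresh data drawn from the same source distribution. The function names, the dimension, the sample sizes, and the standard-normal source are all illustrative assumptions.

```python
import numpy as np


def gaussian_avg_loglik(x, mean, cov):
    """Average log-likelihood of the rows of x under N(mean, cov)."""
    dim = x.shape[1]
    diff = x - mean
    # slogdet is numerically safer than log(det(cov)) for small determinants.
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each row, without forming cov^{-1}.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return float(np.mean(-0.5 * (dim * np.log(2.0 * np.pi) + logdet + maha)))


def ml_fit(x):
    """Maximum Likelihood mean and full covariance (divides by N, not N - 1)."""
    mean = x.mean(axis=0)
    diff = x - mean
    cov = diff.T @ diff / x.shape[0]
    return mean, cov


def train_test_gap(dim=10, n_train=30, n_test=5000, seed=0):
    """Fit on a small training set, then score training and unseen data.

    Assumption for illustration: the source distribution is a standard
    normal in `dim` dimensions, shared by the training and testing data.
    """
    rng = np.random.default_rng(seed)
    train = rng.standard_normal((n_train, dim))
    test = rng.standard_normal((n_test, dim))
    mean, cov = ml_fit(train)
    return (gaussian_avg_loglik(train, mean, cov),
            gaussian_avg_loglik(test, mean, cov))


train_ll, test_ll = train_test_gap()
print(f"average log-likelihood on training data: {train_ll:.3f}")
print(f"average log-likelihood on unseen data:   {test_ll:.3f}")
```

With only 30 samples in 10 dimensions the fitted covariance is adapted to the particular training points, so the score on unseen data is lower than the training score. Increasing `n_train` narrows the gap, which matches the abstract's remark that Gaussians with low occupancy are the problematic case.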

Author information

Authors and Affiliations

Faculty of Applied Sciences, Department of Cybernetics, University of West Bohemia in Pilsen, Univerzitní 22, Pilsen, 306 14, Czech Republic
Jan Vaněk, Lukáš Machlica, Josef V. Psutka & Josef Psutka

Editor information

Editors and Affiliations

Faculty of Applied Sciences, Department of Cybernetics, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Miloš Železný
University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal
Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation for the Russian Academy of Sciences, 14-th line, 39, 199178, St. Petersburg, Russia
Andrey Ronzhin

Cite this paper

Vaněk, J., Machlica, L., Psutka, J.V., Psutka, J. (2013). Covariance Matrix Enhancement Approach to Train Robust Gaussian Mixture Models of Speech Data. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_13

DOI: https://doi.org/10.1007/978-3-319-01931-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer Science, Computer Science (R0)
