Abstract
Efficient similarity search for data with complex structures is a challenging task in many modern data mining applications, such as image retrieval, speaker recognition and stock market analysis. A common way to model these data objects is to use Gaussian Mixture Models, which can approximate arbitrary distributions concisely. Indexes are essential for answering queries efficiently. However, because Gaussian Mixture Models differ in their numbers of components, existing index methods tend to break down in performance. In this paper we propose a novel technique, Normalized Transformation, that reorganizes the index structure to account for different numbers of components in Gaussian Mixture Models. In addition, Normalized Transformation enables us to derive a set of similarity measures from existing ones that have closed-form expressions. Extensive experiments demonstrate the effectiveness of the proposed technique for Gaussian component-based indexing and the performance of the novel similarity measures for clustering and classification.
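To make the notion of a closed-form similarity measure concrete, the sketch below computes the squared L2 distance between two Gaussian Mixture Models with different numbers of components. It is only an illustration of one well-known existing measure, not the Normalized Transformation or the measures proposed in the paper. It assumes diagonal covariances, and the representation of a mixture as a list of (weight, means, variances) tuples and all function names are hypothetical choices made for this example.

```python
import math

# Assumption: a GMM is a list of (weight, means, variances) tuples,
# where each component has a diagonal covariance matrix.

def gaussian_overlap(mean_a, var_a, mean_b, var_b):
    """Closed-form integral of the product of two diagonal Gaussians.

    Per dimension the integral equals a normal density with variance
    var_a + var_b evaluated at the difference of the two means.
    """
    value = 1.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        s = va + vb
        value *= math.exp(-0.5 * (ma - mb) ** 2 / s) / math.sqrt(2.0 * math.pi * s)
    return value

def gmm_inner(f, g):
    """Integral of f(x) * g(x): a weighted sum over all component pairs."""
    return sum(wf * wg * gaussian_overlap(mf, vf, mg, vg)
               for wf, mf, vf in f
               for wg, mg, vg in g)

def gmm_squared_l2(f, g):
    """Squared L2 distance between two mixture densities."""
    return gmm_inner(f, f) - 2.0 * gmm_inner(f, g) + gmm_inner(g, g)

# Two mixtures with different numbers of components (2 versus 1).
gmm_a = [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [3.0, 3.0], [0.5, 0.5])]
gmm_b = [(1.0, [0.5, 0.5], [1.0, 2.0])]

print(gmm_squared_l2(gmm_a, gmm_b))
```

The measure needs no sampling or numerical integration, and it is defined even when the two mixtures have different numbers of components, because it sums over all component pairs.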
Notes
1. To the best of our knowledge.
2. Implementation provided by WEKA at http://weka.sourceforge.net/doc.dev/weka/clusterers/EM.html.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhou, L., Ye, W., Wackersreuther, B., Plant, C., Böhm, C. (2017). Novel Indexing Strategy and Similarity Measures for Gaussian Mixture Models. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10439. Springer, Cham. https://doi.org/10.1007/978-3-319-64471-4_14
DOI: https://doi.org/10.1007/978-3-319-64471-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64470-7
Online ISBN: 978-3-319-64471-4
eBook Packages: Computer Science, Computer Science (R0)