Abstract
With the explosive growth of data quantity and variety, the representation and analysis of complex data becomes a more and more challenging task in many modern applications. As a general class of probabilistic distribution functions, Gaussian Mixture Models have the ability to approximate arbitrary distributions in a concise way, making them very suitable for the representation of complex data. To facilitate efficient queries and following analysis, we generalize Euclidean distance to Gaussian Mixture Models and derive the closed-form expression called Infinite Euclidean Distance. Our metric enables efficient and accurate similarity calculations. For the analysis of complex data, we model two real-world data sets, NBA player statistic and the weather data of airports, into Gaussian Mixture Models, and we compare the performance of Infinite Euclidean Distance to previous similarity measures on both classification and clustering tasks. Experimental evaluations demonstrate the efficiency and effectiveness of Infinite Euclidean Distance and Gaussian Mixture Models on the analysis of complex data.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
- 2.
- 3.
- 4.
With the given parameter, the query accuracies of GQFD using VP-tree is guaranteed for the synthetic data.
- 5.
- 6.
References
Campbell, W.M., Sturim, D.E., Reynolds, D.A.: Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process. Lett. 13(5), 308–311 (2006)
Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker verification using adapted gaussian mixture models. Digit. Signal Proc. 10(1–3), 19–41 (2000)
KaewTraKulPong, P., Bowden, R.: An improved adaptive background mixture model for real-time tracking with shadow detection. In: Remagnino, P., Jones, G.A., Paragios, N., Regazzoni, C.S. (eds.) Video-Based Surveillance Systems, pp. 135–144. Springer, Boston (2002)
Zivkovic, Z.: Improved adaptive gaussian mixture model for background subtraction. In: ICPR, pp. 28–31 (2004)
STATS description. https://www.stats.com/sportvu-basketball-media/. Accessed 25 Feb 2017
Cheplygina, V., Tax, D.M.J., Loog, M.: Dissimilarity-based ensembles for multiple instance learning. IEEE Trans. Neural Netw. Learn. Syst. 27(6), 1379–1391 (2016)
Sivic, J., Zisserman, A.: Video google: a text retrieval approach to object matching in videos. In: ICCV, pp. 1470–1477 (2003)
Kriegel, H.-P., Pryakhin, A., Schubert, M.: An EM-approach for clustering multi-instance objects. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS, vol. 3918, pp. 139–148. Springer, Heidelberg (2006). doi:10.1007/11731139_18
Wei, X., Wu, J., Zhou, Z.: Scalable multi-instance learning. In: ICDM, pp. 1037–1042 (2014)
Zhou, Z., Sun, Y., Li, Y.: Multi-instance learning by treating instances as non-I.I.D. samples. In: ICML, pp. 1249–1256 (2009)
Ciaccia, P., Patella, M., Zezula, P.: M-tree: an efficient access method for similarity search in metric spaces. In: VLDB, pp. 426–435 (1997)
Yianilos, P.N.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: ACM/SIGACT-SIAM SODA, pp. 311–321 (1993)
Dietterich, T.G., Lathrop, R.H., Lozano-Pérez, T.: Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 89(1–2), 31–71 (1997)
Amores, J.: Multiple instance classification: Review, taxonomy and comparative study. Artif. Intell. 201, 81–105 (2013)
Weidmann, N., Frank, E., Pfahringer, B.: A two-level learning method for generalized multi-instance problems. In: Lavrač, N., Gamberger, D., Blockeel, H., Todorovski, L. (eds.) ECML 2003. LNCS, vol. 2837, pp. 468–479. Springer, Heidelberg (2003). doi:10.1007/978-3-540-39857-8_42
Chen, Y., Wang, J.Z.: Image categorization by learning and reasoning with regions. J. Mach. Learn. Res. 5, 913–939 (2004)
Chen, Y., Bi, J., Wang, J.Z.: MILES: multiple-instance learning via embedded instance selection. Pattern Anal. Mach. Intell. 28(12), 1931–1947 (2006)
Wang, H., Yang, Q., Zha, H.: Adaptive p-posterior mixture-model kernels for multiple instance learning. In: ICML, pp. 1136–1143 (2008)
Vatsavai, R.R.: Gaussian multiple instance learning approach for mapping the slums of the world using very high resolution imagery. In: SIGKDD, pp. 1419–1426 (2013)
Sikka, K., Giri, R., Bartlett, M.S.: Joint clustering and classification for multiple instance learning. In: BMVC, p. 71.1–71.12 (2015)
Reynolds, D.: Gaussian mixture models. In: Li, S.Z., Jain, A. (eds.) Encyclopedia of Biometrics, pp. 827–832. Springer, New York (2015)
Kullback, S.: Information Theory and Statistics. Courier Dover Publications, Mineola (2012)
Hershey, J.R., Olsen, P.A.: Approximating the kullback leibler divergence between gaussian mixture models. In: ICASSP, pp. 317–320 (2007)
Goldberger, J., Gordon, S., Greenspan, H.: An efficient image similarity measure based on approximations of KL-divergence between two gaussian mixtures. In: ICCV, pp. 487–493 (2003)
Cui, S., Datcu, M.: Comparison of kullback-leibler divergence approximation methods between gaussian mixture models for satellite image retrieval. In: IGARSS, pp. 3719–3722 (2015)
Helén, M.L., Virtanen, T.: Query by example of audio signals using euclidean distance between gaussian mixture models. In: ICASSP, vol. 1, pp. 225–228 (2007)
Sfikas, G., Constantinopoulos, C., Likas, A., Galatsanos, N.P.: An analytic distance metric for gaussian mixture models with application in image retrieval. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds.) ICANN 2005. LNCS, vol. 3697, pp. 835–840. Springer, Heidelberg (2005). doi:10.1007/11550907_132
Jensen, J.H., Ellis, D.P.W., Christensen, M.G., Jensen, S.H.: Evaluation of distance measures between gaussian mixture models of MFCCs. In: ISMIR, pp. 107–108 (2007)
Beecks, C., Ivanescu, A.M., Kirchhoff, S., Seidl, T.: Modeling image similarity by gaussian mixture models and the signature quadratic form distance. In: ICCV, pp. 1754–1761 (2011)
Tao, Y., Cheng, R., Xiao, X., Ngai, W.K., Kao, B., Prabhakar, S.: Indexing multi-dimensional uncertain data with arbitrary probability density functions. In: VLDB, pp. 922–933 (2005)
Rougui, J.E., Gelgon, M., Aboutajdine, D., Mouaddib, N., Rziza, M.: Organizing Gaussian mixture models into a tree for scaling up speaker retrieval. Pattern Recogn. Lett. 28(11), 1314–1319 (2007)
Böhm, C., Kunath, P., Pryakhin, A., Schubert, M.: Querying objects modeled by arbitrary probability distributions. In: Papadias, D., Zhang, D., Kollios, G. (eds.) SSTD 2007. LNCS, vol. 4605, pp. 294–311. Springer, Heidelberg (2007). doi:10.1007/978-3-540-73540-3_17
Zhou, L., Wackersreuther, B., Fiedler, F., Plant, C., Böhm, C.: Gaussian component based index for GMMs. In: ICDM, pp. 1365–1370 (2016)
Ester, M., Kriegel, H., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: SIGKDD, pp. 226–231 (1996)
Peel, M.C., Finlayson, B.L., McMahon, T.A.: Updated world map of the köppen-geiger climate classification. Hydrol. Earth Syst. Sci. Discuss. 4(2), 439–473 (2007)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhou, L., Ye, W., Plant, C., Böhm, C. (2017). Knowledge Discovery of Complex Data Using Gaussian Mixture Models. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science(), vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_30
Download citation
DOI: https://doi.org/10.1007/978-3-319-64283-3_30
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64282-6
Online ISBN: 978-3-319-64283-3
eBook Packages: Computer ScienceComputer Science (R0)