Knowledge Discovery of Complex Data Using Gaussian Mixture Models

Zhou, Linfei; Ye, Wei; Plant, Claudia; Böhm, Christian

doi:10.1007/978-3-319-64283-3_30

Conference paper
First Online: 03 August 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10440)

Abstract

With the explosive growth in the quantity and variety of data, representing and analyzing complex data has become an increasingly challenging task in many modern applications. As a general class of probability density functions, Gaussian Mixture Models can approximate arbitrary distributions concisely, which makes them well suited to representing complex data. To facilitate efficient queries and subsequent analysis, we generalize Euclidean distance to Gaussian Mixture Models and derive a closed-form expression called Infinite Euclidean Distance. Our metric enables efficient and accurate similarity calculations. To analyze complex data, we model two real-world data sets, NBA player statistics and airport weather data, as Gaussian Mixture Models, and we compare the performance of Infinite Euclidean Distance with that of previous similarity measures on both classification and clustering tasks. Experimental evaluations demonstrate the efficiency and effectiveness of Infinite Euclidean Distance and Gaussian Mixture Models for the analysis of complex data.
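The abstract does not state the formula for Infinite Euclidean Distance, so the following is only a minimal sketch of the general idea of a closed-form, Euclidean-style distance between Gaussian Mixture Models, not the paper's definition. It computes the standard L2 distance between two univariate mixture densities, using the known identity that the integral of the product of two Gaussians N(x; m1, v1) and N(x; m2, v2) equals N(m1; m2, v1 + v2). The function names, the (weight, mean, variance) representation, and the toy mixtures are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

# Assumed representation: a univariate GMM is a list of
# (weight, mean, variance) components whose weights sum to 1.
GMM = List[Tuple[float, float, float]]


def gaussian_overlap(m1: float, v1: float, m2: float, v2: float) -> float:
    """Integral over x of N(x; m1, v1) * N(x; m2, v2) = N(m1; m2, v1 + v2)."""
    v = v1 + v2
    return math.exp(-((m1 - m2) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def gmm_inner(f: GMM, g: GMM) -> float:
    """Integral of f(x) * g(x) dx, summed over all pairs of components."""
    return sum(
        wf * wg * gaussian_overlap(mf, vf, mg, vg)
        for wf, mf, vf in f
        for wg, mg, vg in g
    )


def gmm_l2_distance(f: GMM, g: GMM) -> float:
    """Square root of the integral of (f(x) - g(x))^2 dx, in closed form."""
    squared = gmm_inner(f, f) - 2.0 * gmm_inner(f, g) + gmm_inner(g, g)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding error


# Made-up toy mixtures (not taken from the paper's data sets).
gmm_a = [(0.6, 10.0, 4.0), (0.4, 25.0, 9.0)]
gmm_b = [(0.5, 12.0, 4.0), (0.5, 24.0, 9.0)]  # similar shape to gmm_a
gmm_c = [(1.0, 40.0, 1.0)]                    # far away from gmm_a

d_ab = gmm_l2_distance(gmm_a, gmm_b)
d_ac = gmm_l2_distance(gmm_a, gmm_c)
```

With these toy inputs, `d_ab` comes out smaller than `d_ac`, matching the intuition that similar mixtures should be closer. The cost is one Gaussian evaluation per pair of components, with no sampling or numerical integration, which is the kind of property that makes closed-form measures attractive for similarity queries.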

Notes

1. https://drive.google.com/open?id=0B3LRCuPdnX1BSTU3UjBCVDJSLWs.
2. https://drive.google.com/open?id=0B3LRCuPdnX1BUW5TbzNSdDBoaVk.
3. http://stats.nba.com/help/glossary/.
4. With the given parameter, the query accuracy of GQFD using the VP-tree is guaranteed for the synthetic data.
5. http://bleacherreport.com/articles/537852-michael-jordan-and-his-nba-heirs-the-10-most-like-mike-players-in-the-league.
6. http://www.rantsports.com/nba/2015/07/12/10-current-nba-players-who-emulate-michael-jordans-competitiveness/.

Author information

Authors and Affiliations

Ludwig-Maximilians-Universität München, Munich, Germany
Linfei Zhou, Wei Ye & Christian Böhm
University of Vienna, Vienna, Austria
Claudia Plant

Corresponding author

Correspondence to Christian Böhm.

Editor information

Editors and Affiliations

LIAS/ISAE-ENSMA, Chasseneuil, France
Ladjel Bellatreche
University of Texas at Arlington, Arlington, Texas, USA
Sharma Chakravarthy

About this paper

Cite this paper

Zhou, L., Ye, W., Plant, C., Böhm, C. (2017). Knowledge Discovery of Complex Data Using Gaussian Mixture Models. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_30

DOI: https://doi.org/10.1007/978-3-319-64283-3_30
Published: 03 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64282-6
Online ISBN: 978-3-319-64283-3
eBook Packages: Computer Science, Computer Science (R0)
