Knowledge Discovery of Complex Data Using Gaussian Mixture Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10440)

Abstract

With the explosive growth in the quantity and variety of data, representing and analyzing complex data has become increasingly challenging in many modern applications. As a general class of probability distribution functions, Gaussian Mixture Models can approximate arbitrary distributions concisely, which makes them well suited to representing complex data. To support efficient queries and subsequent analysis, we generalize Euclidean distance to Gaussian Mixture Models and derive a closed-form expression called Infinite Euclidean Distance. Our metric enables efficient and accurate similarity calculations. For the analysis of complex data, we model two real-world data sets, NBA player statistics and airport weather data, as Gaussian Mixture Models, and we compare the performance of Infinite Euclidean Distance with previous similarity measures on both classification and clustering tasks. Experimental evaluations demonstrate the efficiency and effectiveness of Infinite Euclidean Distance and Gaussian Mixture Models for the analysis of complex data.
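
The abstract does not state the closed form itself. As a rough illustration of the kind of computation involved, the sketch below evaluates the standard closed-form L2 (Euclidean) distance between two Gaussian mixture densities, using the identity ∫ N(x; μ1, Σ1) N(x; μ2, Σ2) dx = N(μ1; μ2, Σ1 + Σ2). It assumes diagonal covariances and represents a mixture as a list of (weight, means, variances) tuples. The function names, the representation, and the toy values are illustrative assumptions, and the paper's Infinite Euclidean Distance may differ in normalization or other details.

```python
import math


def gauss_overlap(mu1, var1, mu2, var2):
    """Integral over R^d of the product of two diagonal-covariance Gaussians.

    Uses the identity  int N(x; mu1, S1) N(x; mu2, S2) dx = N(mu1; mu2, S1 + S2),
    which factorizes per dimension when both covariances are diagonal.
    """
    result = 1.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        s = v1 + v2  # variance of the combined Gaussian in this dimension
        result *= math.exp(-0.5 * (m1 - m2) ** 2 / s) / math.sqrt(2.0 * math.pi * s)
    return result


def gmm_inner(g1, g2):
    """Inner product <g1, g2> of two mixture densities (sum over component pairs)."""
    return sum(
        w1 * w2 * gauss_overlap(m1, v1, m2, v2)
        for w1, m1, v1 in g1
        for w2, m2, v2 in g2
    )


def gmm_l2_distance(g1, g2):
    """L2 distance sqrt(<g1,g1> - 2<g1,g2> + <g2,g2>) between two mixtures."""
    sq = gmm_inner(g1, g1) - 2.0 * gmm_inner(g1, g2) + gmm_inner(g2, g2)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


if __name__ == "__main__":
    # Hypothetical two-dimensional mixtures: (weight, means, variances).
    gmm_a = [(0.6, [0.0, 1.0], [1.0, 0.5]), (0.4, [3.0, 2.0], [0.5, 0.5])]
    gmm_b = [(1.0, [0.5, 1.5], [1.0, 1.0])]
    print(gmm_l2_distance(gmm_a, gmm_b))
```

Because every term is a sum over pairs of components, the cost is quadratic in the number of components and needs no sampling or numerical integration, which is the practical appeal of a closed-form distance.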

Notes

  1. https://drive.google.com/open?id=0B3LRCuPdnX1BSTU3UjBCVDJSLWs

  2. https://drive.google.com/open?id=0B3LRCuPdnX1BUW5TbzNSdDBoaVk

  3. http://stats.nba.com/help/glossary/

  4. With the given parameter, the query accuracy of GQFD using a VP-tree is guaranteed on the synthetic data.

  5. http://bleacherreport.com/articles/537852-michael-jordan-and-his-nba-heirs-the-10-most-like-mike-players-in-the-league

  6. http://www.rantsports.com/nba/2015/07/12/10-current-nba-players-who-emulate-michael-jordans-competitiveness/

Author information

Corresponding author

Correspondence to Christian Böhm.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhou, L., Ye, W., Plant, C., Böhm, C. (2017). Knowledge Discovery of Complex Data Using Gaussian Mixture Models. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_30

  • DOI: https://doi.org/10.1007/978-3-319-64283-3_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64282-6

  • Online ISBN: 978-3-319-64283-3

  • eBook Packages: Computer Science, Computer Science (R0)
